We live in a connected society in which no major event, from music concerts to terrorist attacks, happens without being recorded by a smartphone and shared instantly with the world. This flow of generated data is often unstructured and carries no reliable information about when each piece of media was captured. Consequently, subsequent reconstruction, understanding, and fact-checking of the event are hindered if the data is not properly organized. In this work, we train a data-driven method to chronologically sort images originating from a real event, the Notre-Dame Cathedral fire, which broke out on April 15th, 2019. Our network leverages visual clues, such as the destruction of the cathedral's structure or the evolution of the fire, to position an image in time. We investigate several occlusion strategies to improve the classification accuracy, generalization, and explainability of our method. Besides comparing the performance of each strategy, we evaluate their activation maps, i.e., the image regions each method considers important for classification.