With the diffusion of smartphones and social media, modern events, from social gatherings to terrorist attacks, can now be captured and shared by any citizen with unprecedented speed and reach. This grants the power of content generation to virtually all of us, spawning a huge flow of information that, combined with other multimedia sources, can potentially aid the forensic investigation of such events. Although many of these images may carry timestamps embedded in their metadata, those timestamps are not always correct, due to misregistration or even deliberate alteration, and therefore cannot be trusted to infer time information. Nonetheless, in a forensic investigation, mining temporal information remains indispensable for organizing this flood of data, allowing us to fully understand, reconstruct, and fact-check the event. Even though time has a single direction, its passage can be perceived in different ways. In this research, we investigate the visual evidence used to identify the flow of time between two moments of the same scene, and how that evidence is affected by the scene's semantics, the elements present in it, and the amount of time elapsed between the two moments. We propose a deep-learning method to sort images and videos in time and apply it to a real-world event: the Grenfell Tower fire, an unprecedented fire that broke out in a 24-story residential building in London. By analyzing changes in illumination (due to the Sun's movement across the sky) as well as the evolving appearance of the building's facade throughout the fire, we are able to estimate a one-hour window in which an image was captured and to create a timeline of the event.
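As a rough illustration of how pairwise temporal judgments can be aggregated into a global ordering, the sketch below assumes a hypothetical learned predictor `is_before(a, b)` (a stand-in for a trained model, not the paper's actual architecture) and sorts items by how many pairwise comparisons each one "wins", a Copeland-style aggregation:

```python
from itertools import combinations

def order_by_pairwise_votes(items, is_before):
    """Order `items` chronologically given a pairwise predictor.

    `is_before(a, b)` is a hypothetical stand-in for a learned model
    that returns True when frame `a` appears to precede frame `b`.
    Each item is scored by the number of pairwise comparisons it wins,
    then the items are sorted by that score (earliest first).
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        if is_before(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    # The item winning the most "comes before" votes is placed first.
    return sorted(items, key=lambda x: -wins[x])

# Toy usage: fake "frames" labeled by their true capture minute;
# the mock predictor simply compares those labels.
frames = [30, 5, 60, 15]
print(order_by_pairwise_votes(frames, lambda a, b: a < b))
# → [5, 15, 30, 60]
```

In practice the predictor would be a neural network comparing visual cues (illumination, scene changes) between two images, and noisy predictions would make a voting-style aggregation like this one preferable to a naive comparison sort.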