Through our devices, we can record and share content, generating vast amounts of information about what happens around us. One example is the massive blaze that struck Notre Dame Cathedral in April 2019, during which millions of images and videos were recorded and simultaneously shared on the Internet, both by the media at large and by everyday citizens. All of this generated content is potentially important for documenting our history and as a source of information for investigations. However, within this vast pool of information, we do not readily know how the different items connect to each other to unravel the whole story. This chapter presents how to achieve X-coherence for an event, i.e., how to sort an unstructured collection of images and videos in space and time, enabling the analysis of the corresponding event as a whole. To this end, we present machine learning techniques that automatically position each media item in space and time, using the Notre Dame fire as the backdrop. We employ convolutional neural networks to capture visual clues in the Cathedral's structure and its surroundings, as well as how the scene's appearance changes as time progresses. Finally, this chapter also presents a comprehensive way of visualizing and exploring the organized data.