Multimedia forensics
How can one assess the validity of an image as proof of its content? Photographic images used to be considered the most reliable evidence possible, as they were difficult to modify realistically. With the proliferation of digital photography and the development of sophisticated image editing tools, this status of absolute proof is unfortunately long gone. It is increasingly easy to alter an image, not only to make it more aesthetically appealing, but also to change its semantic content and give it a different meaning than the truth.
In the fight against disinformation, the role of image forensics has thus been to analyse whether an image is authentic or has been maliciously and locally altered to hide or distort the truth. However, a new source of disinformation has now appeared. With the advent of diffusion models, it is now easy to generate images from scratch simply by describing the intended result. This progress brings the risk that people pass off the synthetic images they create as actual photographs of real scenes, for instance to incriminate or ridicule someone or, more broadly, to spread disinformation. In this context, being able to detect whether an image has been generated or modified by AI is more important than ever.
Context
My team and I carry out our forensics work as part of several projects, in collaboration with other universities and institutes:
- We participated in the ANR/DGA Defals (fr) challenge (2017–2021) on the detection of forgeries in images, and collaborated with several of the participants to write a book (Bammey, Quentin, Miguel Colom, Thibaud Ehret, Marina Gardella, Rafael Grompone, Jean‐Michel Morel, Tina Nikoukhah, and Denis Perraud. “How to Reconstruct the History of a Digital Image, and of Its Alterations.” Multimedia Security 1: Authentication and Data Hiding (2022): 1–40. Translated from its original French version: Quentin Bammey, Miguel Colom, Thibaud Ehret, Marina Gardella, Rafael Grompone, Jean‐Michel Morel, Tina Nikoukhah, and Denis Perraud. “Comment reconstruire l’histoire d’une image digitale, et de ses altérations ?” Sécurité multimédia 1 (2021).).
- As part of the Envisu4 project (2021), financed by the Fact-Checking Innovation Initiative (Facebook and the International Fact-Checking Network), we collaborated with Agence France-Presse and the Centre for Research and Technology Hellas (CERTH) to develop the InVID-WeVerify verification plugin for fact-checkers, which the Poynter Institute (home of the International Fact-Checking Network) called “one of the most powerful tools for spotting misinformation online”.
- We participate in the Horizon Europe Vera.AI (2022–2025) project against disinformation, for which I am the ENS vice-coordinator, where we collaborate with many academic institutions and media agencies throughout Europe.
- We coordinate the ANR Apate (2022–2025) project, where we collaborate with the French scientific police as well as academic and industrial partners in France. This project targets the automatic detection of deepfakes and AI-generated images.
- We are also members of the external advisory board of the European Identity Theft Observation System (EITHOS).
Important principles of my image forensics research
A contrario analysis
Many forgery detection methods output not a binary decision but a heatmap showing which regions are more likely than others to be forged. Interpreting their results requires human expertise, to filter out regions that are detected by the method but not significantly, and are thus likely to be false positives. Similarly, AI-generated-image detectors often output a score between 0 and 1, which says little about the actual probability that an image is fake.
In contrast, I strive to rely on a contrario analysis in my methods as much as possible. Under this paradigm, the intermediate results of a method are statistically validated, so as to have mathematical control over the rate of false positives under a background hypothesis. With a contrario analysis, there is thus no need for interpretation, and the binary outputs of our methods have proof value: it is possible to know how rarely a detection as significant as the observed one could happen by chance. Thresholding on this value thus makes it possible to mathematically limit the rate of false positives.
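To make this concrete, here is a minimal, illustrative a contrario test (not taken from any specific published method; the block size, background probability p0 and threshold eps are hypothetical choices). Each block of a binary anomaly map is validated only if its Number of False Alarms (NFA), i.e. the number of tests multiplied by the probability of observing at least as many anomalous pixels under the background hypothesis, falls below eps.

```python
# Minimal a contrario sketch (illustrative only): a block is flagged only if its
# Number of False Alarms (NFA) is below eps, which bounds the expected number of
# false detections on an image satisfying the background hypothesis H0 by eps.
import numpy as np
from scipy.stats import binom

def a_contrario_blocks(anomaly_map: np.ndarray, block: int = 32,
                       p0: float = 0.01, eps: float = 1.0):
    """Return (row, col, nfa) for blocks whose NFA falls below eps."""
    h, w = anomaly_map.shape
    n_tests = (h // block) * (w // block)        # number of tested blocks
    detections = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            k = int(anomaly_map[i:i + block, j:j + block].sum())
            n = block * block
            tail = binom.sf(k - 1, n, p0)        # P(X >= k) with X ~ Binomial(n, p0)
            nfa = n_tests * tail
            if nfa < eps:
                detections.append((i, j, nfa))
    return detections

# Hypothetical usage: on a pure-noise anomaly map matching H0, the expected number
# of detections is bounded by eps = 1.
rng = np.random.default_rng(0)
noise_map = rng.random((512, 512)) < 0.01
print(len(a_contrario_blocks(noise_map)))        # typically 0
```

Setting eps = 1 means that at most one block is expected to be wrongly flagged on an image satisfying the background hypothesis; this is what gives a contrario detections their proof value.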
Reproducible research
As explained here, reproducible research is at the heart of my work. I strive to publish demos of most of my methods on the IPOL journal and demo system.
Main achievements in forensics
You can find my complete list of publications on my Google Scholar page.
- Mosaic analysis: I focused my PhD thesis on mosaic consistency analysis. Cameras sample only one colour per pixel, and the missing colours must be interpolated from neighbours sampled in different colours, in a process known as demosaicing. This interpolation leaves traces, which can be used to detect local forgeries when the traces are locally inconsistent or absent (a simplified illustration of this idea is sketched after this list). However, this task is very difficult and was considered almost impossible prior to my thesis: demosaicing traces are varied, subtle and easily destroyed by image compression. To reveal and analyse the mosaic despite these difficulties, I invented the concept of positional learning (POLAR), resulting in a CVPR paper (demo available here and code there), which I later refined with the 4Point and Mimic methods. Simply by analysing the subtle demosaicing traces, these methods establish a new state of the art in forgery detection for uncompressed or high-quality images.
- AI-generated image detection: I invented the Synthbuster method to detect synthetic images. This method, the first to use spectral artefacts to detect images generated by diffusion models, establishes the current state of the art in synthetic-image detection (a sketch of the general spectral-artefact idea also follows this list). I am also adapting my POLAR paradigm to AI-generated image detection, with very promising results.
- Datasets: I published several forensics datasets. Most notably:
- The Synthbuster dataset contains 1000 AI-generated images from each of 9 different recent models.
- The Trace database (paper and code) contains images with asemantic forgery traces, to evaluate forensic tools in an explainable way, and understand their strengths, use cases, limits, and complementarities.
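To make the mosaic-analysis idea concrete, here is a deliberately simplified sketch. It is not the POLAR, 4Point or Mimic method: it only assumes that interpolated green samples are locally smoother than sampled ones, and checks whether every block of the image agrees on which diagonal of the 2×2 Bayer cell carries the sampled green pixels. The block size and the Laplacian residual are arbitrary illustrative choices.

```python
# Illustrative sketch only (not POLAR): estimate, per block, which diagonal of the
# 2x2 Bayer cell carries the sampled green pixels, assuming interpolated green
# values are locally smoother than sampled ones. Blocks disagreeing with the
# image-wide majority are candidate mosaic inconsistencies.
import numpy as np
from scipy.ndimage import laplace

def green_diagonal_map(green: np.ndarray, block: int = 32) -> np.ndarray:
    """Return a per-block map of the estimated green diagonal (0 or 1)."""
    residual = np.abs(laplace(green.astype(np.float64)))  # high-frequency energy
    h, w = green.shape
    yy, xx = np.mgrid[0:h, 0:w]
    diag0 = (yy + xx) % 2 == 0                        # one diagonal of the 2x2 cell
    votes = np.zeros((h // block, w // block), dtype=int)
    for bi in range(h // block):
        for bj in range(w // block):
            sl = np.s_[bi * block:(bi + 1) * block, bj * block:(bj + 1) * block]
            e0 = residual[sl][diag0[sl]].mean()       # energy on diagonal 0
            e1 = residual[sl][~diag0[sl]].mean()      # energy on diagonal 1
            votes[bi, bj] = 0 if e0 > e1 else 1       # sampled pixels keep more detail
    return votes

# Hypothetical usage: blocks whose vote differs from the global majority hint at a
# local mosaic inconsistency (e.g. splicing), pending statistical validation.
# votes = green_diagonal_map(image[:, :, 1])
# suspicious = votes != int(round(votes.mean()))
```

Such a naive estimate is fragile, since compression easily destroys the traces, which is precisely why positional learning is needed in practice; the sketch only conveys what checking mosaic consistency means.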
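The spectral-artefact idea behind synthetic-image detection can be caricatured in the same spirit. This is not the Synthbuster pipeline: the cross-difference residual and the candidate peak frequencies below are hypothetical choices, and an actual detector would calibrate its decision on known real and synthetic images.

```python
# Hedged sketch of spectral-artefact inspection (not the Synthbuster pipeline):
# high-pass the image, take the Fourier transform of the residual, and measure how
# much energy concentrates on a few periodic frequencies relative to the background.
import numpy as np

def spectral_peak_score(gray: np.ndarray) -> float:
    """Ratio of energy at candidate periodic frequencies to the median spectrum."""
    g = gray.astype(np.float64)
    # Simple cross-difference residual: suppresses most natural image content.
    residual = g[:-1, :-1] - g[:-1, 1:] - g[1:, :-1] + g[1:, 1:]
    spec = np.abs(np.fft.fftshift(np.fft.fft2(residual)))
    h, w = spec.shape
    cy, cx = h // 2, w // 2
    # Candidate frequencies (hypothetical): quarter-band peaks where grid-like
    # generation or resampling artefacts tend to appear.
    peaks = [(cy, cx + w // 4), (cy + h // 4, cx), (cy + h // 4, cx + w // 4)]
    peak_energy = np.mean([spec[y, x] for y, x in peaks])
    return float(peak_energy / (np.median(spec) + 1e-12))
```

A score well above the background level hints at periodic generation artefacts, but turning it into a binary answer would still require statistical validation, in the same a contrario spirit described above.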