Multimedia forensics

How can one assess the validity of an image as proof of its content? Photographic images used to be considered the most reliable evidence possible, as they were difficult to modify realistically. With the proliferation of digital photography and the development of sophisticated image editing tools, this status of absolute proof is unfortunately long gone. It is increasingly easy to alter an image, not only to make it more aesthetically appealing, but also to change its semantic content and give it a meaning that departs from the truth.

In the fight against disinformation, the role of image forensics has thus been to analyse whether an image is authentic or has been maliciously and locally altered to hide or distort the truth. However, a new source of disinformation has now appeared. Thanks to the advent of diffusion models, it is now possible and easy to generate images from scratch by simply describing the intended target. This progress brings the risk of people claiming that synthetic images they created are actual photographs of a real scene, for instance to incriminate or ridicule someone or, more broadly, to spread disinformation. In this context, being able to detect that an image has been generated or modified by AI is more important than ever.

Context

The forensics work my team and I carry out is part of several projects in collaboration with other universities and institutes:

Important principles of my image forensics research

A contrario analysis

Many forgery detection methods output not a binary decision, but a heatmap showing which regions are more likely than others to be forged. Interpreting the results of these methods requires human expertise, to filter out regions that are flagged by the method, but not significantly so, and are likely to be false positives. Similarly, AI-generated image detectors will often output a score between 0 and 1, which says little about the actual probability of an image being fake.

In contrast, I strive to make use of a contrario analysis in my methods as much as possible. Under this paradigm, intermediate results of the method are statistically validated, so as to have mathematical control over the rate of false positives under a background hypothesis. With a contrario analysis, there is thus no need for interpretation, and the binary outputs of our methods have proof value: it is possible to know how infrequently a detection as significant as the one observed could happen by chance. Thresholding on this value thus enables one to mathematically limit the rate of false positives.
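To make the thresholding idea concrete, here is a minimal sketch of a number-of-false-alarms (NFA) computation. It is an illustration under stated assumptions, not the procedure used in any of my published methods: the binomial background model, the function names and the numbers in the example (1000 tested regions, 100 votes per region, a chance level of 0.25) are all hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def nfa(n_tests: int, n: int, k: int, p: float) -> float:
    """Number of false alarms of an observation: the number of tests
    multiplied by the probability, under the background model, of seeing
    at least k successes out of n."""
    return n_tests * binomial_tail(n, k, p)


def is_significant(n_tests: int, n: int, k: int, p: float,
                   epsilon: float = 1.0) -> bool:
    """Keep a detection only if its NFA is below epsilon, so that on average
    at most epsilon detections occur when the background model holds."""
    return nfa(n_tests, n, k, p) < epsilon


# Hypothetical setting: 1000 regions are tested; in each, 100 blocks vote,
# and a vote agrees by pure chance with probability 0.25.
print(is_significant(n_tests=1000, n=100, k=60, p=0.25))  # True
print(is_significant(n_tests=1000, n=100, k=30, p=0.25))  # False
```

Multiplying by the number of tests is what turns a per-region probability into a bound on the expected number of false detections over the whole image, which is why a single threshold on the NFA suffices.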

Reproducible research

As explained here, reproducible research is at the heart of my work. I strive to publish demos of most of my methods in the IPOL journal and on its demo system.

Main achievements in forensics

You can find my complete list of publications on my Google Scholar page.

  • Mosaic analysis: I focused my PhD thesis on mosaic consistency analysis. Cameras only sample one colour per pixel, and the missing colours must be interpolated from neighbours sampled in different colours, in a process known as demosaicing. This interpolation leaves traces, which can be detected to find local forgeries if the traces are locally inconsistent or absent. However, this task is very difficult and was considered almost impossible prior to my thesis. Indeed, demosaicing traces are varied, subtle and easily destroyed by image compression. To reveal and analyse this mosaic despite these difficulties, I invented the concept of positional learning (POLAR), resulting in a CVPR paper (demo available here and code there), which I later refined with the 4Point and Mimic methods. Simply by analysing the subtle demosaicing traces, these methods establish a new state of the art in forgery detection for uncompressed or high-quality images.
  • AI-generated image detection: I invented the Synthbuster method to detect synthetic images. This method, the first to use spectral artefacts to detect images generated by diffusion models, establishes the current state of the art in synthetic image detection. I am also adapting my POLAR paradigm to AI-generated image detection, with very promising results.
  • Datasets: I published several forensics datasets. Most notably:
    • The Synthbuster dataset contains 1000 AI-generated images from each of 9 different recent models.
    • The Trace database (paper and code) contains images with asemantic forgery traces, to evaluate forensic tools in an explainable way, and understand their strengths, use cases, limits, and complementarities.
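
As a toy illustration of the demosaicing traces described in the mosaic analysis item above, the sketch below simulates a single green channel sampled on a checkerboard and filled in by bilinear interpolation, then recovers the sampling offset by testing which pixels are exactly predicted by their neighbours. This is not POLAR, 4Point or Mimic: the bilinear model, the noise-only scene, the absence of compression and every function name are simplifying assumptions made for the example.

```python
import random


def neighbour_mean(img, y, x):
    """Mean of the four direct neighbours of pixel (y, x)."""
    return (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]) / 4


def demosaic_green(scene, offset):
    """Simulate a camera: green is sampled where (y + x) % 2 == offset and
    bilinearly interpolated elsewhere (image borders are left untouched)."""
    h, w = len(scene), len(scene[0])
    out = [row[:] for row in scene]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if (y + x) % 2 != offset:
                out[y][x] = neighbour_mean(scene, y, x)
    return out


def interpolation_residual(green, offset):
    """Mean gap between the pixels that would be interpolated under this
    candidate offset and the value bilinear interpolation predicts for them."""
    h, w = len(green), len(green[0])
    gaps = [
        abs(green[y][x] - neighbour_mean(green, y, x))
        for y in range(1, h - 1)
        for x in range(1, w - 1)
        if (y + x) % 2 != offset
    ]
    return sum(gaps) / len(gaps)


def estimate_offset(green):
    """Pick the sampling offset whose interpolation hypothesis fits best."""
    return min((0, 1), key=lambda o: interpolation_residual(green, o))


random.seed(0)
scene = [[random.random() for _ in range(16)] for _ in range(16)]
image = demosaic_green(scene, offset=0)
# Dropping the first column shifts the content by one pixel, as a misaligned
# copy-paste would, and flips the apparent sampling offset.
shifted = [row[1:] for row in image]

print(estimate_offset(image))    # 0
print(estimate_offset(shifted))  # 1
```

A region whose estimated offset disagrees with the rest of the image is the kind of local inconsistency that mosaic analysis looks for. On real photographs the traces are far weaker than in this idealised setting, which is what makes the problem hard.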

© 2023, Quentin Bammey. All rights reserved.