Positional Learning

I invented POsitional LeARning (POLAR), a novel training scheme for CNNs that implicitly reveals underlying spatial or spectral information and can highlight its inconsistencies. Positional learning has already been applied successfully to image forensics, both for AI-generated image detection and for mosaic inconsistency detection.

What happens when you train a fully-convolutional neural network (CNN) to detect positional information in images or signals? Positional Learning (POLAR) consists in doing exactly that.

Imagine you train a CNN on a large number of signals to output, for each sample of a given signal, whether that sample is at an even or odd position. This is a form of self-supervised training: since we know the position of each sample, the labels come for free. In practice, we train the CNN to output an array of alternating zeroes and ones.
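
As a concrete illustration, here is a minimal sketch of such a setup in PyTorch, assuming 1D signals. The architecture, its size, and the names (ParityNet, parity_target) are illustrative choices, not a prescribed design.

```python
import torch
import torch.nn as nn

# A small fully-convolutional 1D network: one logit per sample, interpreted
# as the probability that the sample sits at an odd position.
class ParityNet(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):              # x: (batch, 1, length)
        return self.net(x)             # logits: (batch, 1, length)

def parity_target(batch_size, length):
    """Self-supervised target: 0 at even positions, 1 at odd positions."""
    parity = (torch.arange(length) % 2).float()      # 0, 1, 0, 1, ...
    return parity.expand(batch_size, 1, length)

# One illustrative training step on stand-in random signals.
model = ParityNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

signals = torch.randn(8, 1, 256)
loss = loss_fn(model(signals), parity_target(8, 256))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```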

While the target output is obvious to us, the CNN is translation-invariant and thus has no inherent knowledge of whether a given sample lies at an odd or even position. On a random signal, there is no information the network can grasp to infer the parity of a sample's position. The network therefore cannot learn anything useful, and ends up producing random-like outputs rather than the expected alternating pattern of zeroes and ones.

On the other hand, suppose this time that all your signals have been upsampled by a factor of 2: between every two original samples, a new sample is introduced by interpolating its neighbours. Now, samples at even positions are original, whereas samples at odd positions have been interpolated. Thus, if the CNN can learn to distinguish interpolated samples from original ones, it can infer the target output from that distinction.
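
For instance, ×2 linear upsampling can be sketched as follows; the helper name upsample_by_two is illustrative, and real pipelines may use other interpolation kernels.

```python
import numpy as np

def upsample_by_two(signal):
    """Linear x2 upsampling: original samples land at even positions,
    odd positions are the average of their two original neighbours."""
    signal = np.asarray(signal, dtype=float)
    up = np.zeros(2 * len(signal) - 1)
    up[0::2] = signal                               # even positions: original samples
    up[1::2] = 0.5 * (signal[:-1] + signal[1:])     # odd positions: interpolated samples
    return up

x_up = upsample_by_two(np.random.randn(128))        # parity now encodes original vs interpolated
```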

After training the CNN on such signals, applying it to a similarly-upsampled signal lets it infer the parity of each sample's position and produce the alternating pattern of zeroes and ones. However, if you apply it to a signal that has not been resampled, the network's knowledge is of no use, and the output becomes random-like again.
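
In code, the contrast looks roughly like this, reusing the ParityNet and upsample_by_two sketches above; the behaviour described in the comments is what a properly trained network is expected to produce, not something these few lines guarantee.

```python
import numpy as np
import torch

model.eval()
with torch.no_grad():
    resampled = torch.from_numpy(upsample_by_two(np.random.randn(128))).float()
    pristine = torch.randn(resampled.numel())

    pred_resampled = torch.sigmoid(model(resampled.view(1, 1, -1)))[0, 0]
    pred_pristine = torch.sigmoid(model(pristine.view(1, 1, -1)))[0, 0]

# After training, pred_resampled should approach 0, 1, 0, 1, ...,
# while pred_pristine should stay uninformative (around 0.5, random-like).
```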

In the end, under the guise of training a network to detect positional information at each sample, we have in fact trained it to reveal whether each sample is original or interpolated. By further analysing the network output, we can then infer whether the signal itself was resampled.
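
One simple way to turn the output into a resampling decision is to measure how well the predicted parities agree with the true alternating pattern; the function name and the 0.9 threshold below are illustrative choices, not calibrated values.

```python
import torch

def parity_agreement(logits):
    """Fraction of samples whose predicted parity matches the alternating
    0, 1, 0, 1, ... pattern: near 1.0 on resampled signals, near 0.5 otherwise."""
    predictions = (torch.sigmoid(logits) > 0.5).float().flatten()
    target = (torch.arange(predictions.numel()) % 2).float()
    return (predictions == target).float().mean().item()

# Illustrative decision rule on a single signal of shape (1, 1, length):
# resampled = parity_agreement(model(signal)) > 0.9
```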

Overall, positional learning excels at revealing underlying components of a signal that vary with position, such as periodic components. This is why it is seeing many uses in forensics: the underlying noise patterns of images possess many important frequency components, whose analysis can highlight inconsistencies that prove forgeries.


© 2023 Quentin Bammey. All rights reserved.