Computational Photography: Image Signal Processor I

Quentin Bammey

The Image Processing Pipeline

Image Signal Processor (ISP)

What is an ISP?

  • The Image Signal Processor (ISP), or image processing pipeline, is the set of methods used to turn the output of a camera sensor into a full-fledged image.
  • There is no “standard” ISP: pipelines differ depending on the device, the processing software, …
    • Different steps and possible orders between the steps
    • Different methods for each step

What is in an ISP?

Some steps are common to most ISPs.

  • Analog to Digital conversion
  • Camera Compensation (OECF, pedestal)
  • Demosaicing
  • Denoising
  • Optical correction
  • Colour correction
  • Local and global tone mapping
  • Colour encoding
  • Image enhancement and restoration (denoising, deblurring, super-resolution, sharpening, dehazing, …)
  • Compression
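The list above can be sketched as an ordered composition of stages. Every stage below is an illustrative placeholder (an identity function), not a real ISP: the point is only the pipeline structure, and real pipelines differ in both the steps and their order.

```python
# Placeholder stage: a real ISP stage would transform the image here.
def identity(img):
    return img

# Illustrative ordering of common ISP stages (names are ours, not a standard).
PIPELINE = [
    ("camera_compensation", identity),  # inverse OECF, pedestal removal
    ("demosaicing", identity),
    ("denoising", identity),
    ("colour_correction", identity),
    ("tone_mapping", identity),
    ("colour_encoding", identity),      # followed by compression
]

def run_isp(raw):
    img = raw
    for _name, stage in PIPELINE:
        img = stage(img)
    return img

print(run_isp(42))  # 42: the placeholder stages pass the input through
```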

What you will see today

The rest: in future sessions

  • Analog to Digital conversion
  • Camera Compensation (OECF, pedestal)
  • Demosaicing
  • Denoising \(\Rightarrow\) only a little today
  • Compression
  • Optical correction \(\Rightarrow\) a little last week, see imaging courses for more details
  • Colour correction \(\Rightarrow\) next week
  • Local and global tone mapping \(\Rightarrow\) next week
  • Colour encoding \(\Rightarrow\) in 2 weeks
  • Image enhancement and restoration (denoising, deblurring, super-resolution, sharpening, dehazing, …) \(\Rightarrow\) last 2 lectures

Linearization

  • Camera sensors are inherently linear w.r.t. illuminance
  • However, camera electronics can introduce non-linearities in the response
  • Cameras are characterized in this regard by their opto-electronic conversion function (OECF): a possibly non-linear “quantization” function (sometimes incorrectly called the radiometric calibration function) that can be adjusted depending on exposure
  • The image is linearized by applying the inverse OECF
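As a minimal sketch, assuming a simple gamma-style OECF (real OECFs are measured per camera and per gain setting, and need not follow a power law), linearization is just the application of the inverse function:

```python
import numpy as np

# Hypothetical OECF: a pure gamma curve (the gamma value is illustrative).
def oecf(linear, gamma=2.2):
    """Map linear sensor values in [0, 1] to encoded values."""
    return np.clip(linear, 0.0, 1.0) ** (1.0 / gamma)

def linearize(encoded, gamma=2.2):
    """Apply the inverse OECF to recover linear values."""
    return np.clip(encoded, 0.0, 1.0) ** gamma

x = np.linspace(0.0, 1.0, 5)
print(np.allclose(linearize(oecf(x)), x))  # True: the round trip is the identity
```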

OECF for different quantization gains

Measuring the OECF

The OECF is obtained by capturing a colour chart and calibrating the captured image against the reference values.

Linear vs non-linear processing

After linearization, each pixel value behaves \(\simeq\) linearly w.r.t. the electron count.

Methods that require models of physical phenomena, or mimic such phenomena (camera compensation), should operate at this stage.

Throughout the steps, the relation is no longer linear: different steps will break the proportional relation between electron count and pixel value, differentiate the three colour channels, or induce spatial correlations.

Methods that modify how the image looks perceptually (such as image editing or compression) should operate at this later stage.

ISO sensitivity and Exposure Index

  • Exposure Index (EI): camera setting that determines the exposure in response to a light level measurement. Increasing the exposure index will:
    • increase the analog gain, so the camera can operate with less light,
    • but also increase noise, degrading image quality,
    • and cause saturation to happen at lower light levels, even if the sensor itself is not saturated.
  • ISO sensitivity: measures how strongly a sensor responds to light. Higher sensitivity = less light required to capture a good image. Depends on the camera, the aperture, the exposure time and the EI.
  • ISO is often (incorrectly) used to refer to the EI

Analog to digital conversion (ADC)

  • CMOS / CCD sensors store the captured light as an electric charge
  • The ADC reads the electric charge in each sensor/pixel and converts it to a digital value, i.e., a number
  • Some loss of precision occurs due to quantization, but the overall precision remains high
  • The pixel value is proportional to the photon count
    • Note, though, that the registered count is not fully accurate due to noise and fixed patterns
  • ADC is not perfect, and can be the source of bias and noise in the image

Pedestal compensation

  • Read noise can cause negative electron counts in the absence of light
  • To avoid negative values in low light, a small offset is added to the pixel values: the pedestal (also called dark offset or bias level)
  • When processing, this pedestal can then be compensated.
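A minimal sketch of this compensation, with an illustrative pedestal value of 64 (in practice the black level is stored in the raw file's metadata). Clipping at zero mirrors the reason the pedestal exists: read noise can push dark pixels below the offset.

```python
import numpy as np

PEDESTAL = 64  # illustrative black level; the real value comes from raw metadata

def remove_pedestal(raw, pedestal=PEDESTAL):
    """Subtract the dark offset, clipping so that read-noise fluctuations
    below the pedestal do not turn into negative values."""
    return np.clip(raw.astype(np.int32) - pedestal, 0, None)

raw = np.array([60, 64, 70, 1000], dtype=np.uint16)
print(remove_pedestal(raw).tolist())  # [0, 0, 6, 936]
```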

Noise and patterns in an image

  • Because cameras are not “perfect”, the pixel values are not perfectly representative of the scene they capture
  • Noise is a random fluctuation of the registered value around the actual one
  • Fixed Patterns / Non-Uniformities are fixed offsets of the registered value compared to the actual one, specific to each camera. Sometimes called fixed-pattern noise, though they are not noise.
  • Fixed pattern vs noise: if you capture the same scene several times, the noise approximately balances itself out, while the pattern remains.
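This distinction can be checked numerically: averaging many simulated captures of the same scene cancels the random noise but leaves the fixed pattern intact (all values below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
scene = np.full((32, 32), 100.0)
pattern = rng.normal(0.0, 5.0, scene.shape)  # fixed offsets: same in every shot

# Capture the same scene 500 times: fresh noise each time, same fixed pattern.
frames = [scene + pattern + rng.normal(0.0, 5.0, scene.shape) for _ in range(500)]
mean_frame = np.mean(frames, axis=0)

# The temporal average converges to scene + pattern: noise cancels, pattern stays.
residual = mean_frame - scene
print(np.corrcoef(residual.ravel(), pattern.ravel())[0, 1])  # close to 1
```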

Photo Response Non-Uniformity (PRNU)

  • Individual imperfections in each sensor
  • Even under uniform illumination, each cell responds with a slightly different voltage level
  • Values are independent for each pixel/sensor, but remain the same in different images taken by the same camera
  • Can be used in forensics to identify the source camera of an image
  • Multiplicative with the signal

Dark Signal Non-Uniformity (DSNU)

  • Sensors have column-based or row-based operations that can introduce bias
  • Depends not only on the camera, but also on thermal conditions and thus on exposure time: gets stronger with long exposures
  • Additive with the signal: can mostly be seen on low-light images
  • Excellent article on PRNU and DSNU

DSNU example

Shot noise

  • Shot noise is by far the most important noise source in image acquisition
  • Also called photon noise, counting noise, Poisson noise
  • Due to the discrete nature of light: photons arrive as discrete counts
  • Can be modeled by a Poisson distribution, with an excellent Gaussian approximation for each intensity level: \[y \sim \mathcal N(x, x)\]

  • Variance of noise is the intensity level itself (\(\sigma^2=x,\quad \sigma=\sqrt x\))
  • NOT a Gaussian distribution overall: only Gaussian when we consider a single intensity level
    • Many denoising methods actually get this wrong and consider a single Gaussian distribution for the global image
  • Under normal light conditions, it is usually enough to model this noise, as the other noise sources are negligible w.r.t. shot noise
  • Under low light, shot noise becomes very small in absolute terms, and other noise sources become visible
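A quick simulation confirms the variance-equals-mean property of the Poisson model at several intensity levels:

```python
import numpy as np

rng = np.random.default_rng(0)

# The registered count at a pixel of true intensity x follows Poisson(x),
# so the noise variance equals the intensity level itself.
for x in (10.0, 100.0, 1000.0):
    samples = rng.poisson(x, size=200_000)
    print(x, samples.mean(), samples.var())  # variance tracks the mean
```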

Read noise

  • Noise due to the components of the sensor’s readout process
  • CMOS: independent for each pixel; CCD: mostly identical for all pixels that pass through the same readout architecture (row or column)
  • Depends on several components, and thus difficult to model accurately
  • Can be skewed (asymmetric)

Dark current

  • Slight electric current flowing through photosensitive devices even in the absence of any light
  • Random generation of electrons and holes
  • Its definition may include the DSNU fixed pattern, which describes the variations in dark current across pixels (hot and cold pixels)
  • Follows a Poisson distribution for the number of thermal-induced electrons produced over a given time interval: \[ D \sim \mathrm{Poisson}(r\cdot t)\]
    • \(r\) the dark current rate (e-/px/sec)
    • \(t\) the exposure time
  • Increases exponentially with temperature
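A small simulation of this model, with an illustrative dark current rate of 0.1 e-/px/s: negligible for a short exposure, clearly present after a long one.

```python
import numpy as np

rng = np.random.default_rng(0)

def dark_electrons(rate, exposure_s, shape, rng):
    """Thermal electrons accumulated per pixel: Poisson with mean rate * time."""
    return rng.poisson(rate * exposure_s, size=shape)

# Illustrative rate of 0.1 e-/px/s: compare a 1/100 s and a 60 s exposure.
short = dark_electrons(0.1, 0.01, (1000,), rng)
long_exp = dark_electrons(0.1, 60.0, (1000,), rng)
print(short.mean(), long_exp.mean())  # roughly 0.001 vs 6 electrons per pixel
```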

Fixed patterns and noise summary

  • Fixed patterns can be characterized for a given camera (+exposure time for the DSNU) and compensated
  • In most “normal” situations, shot noise is by far predominant
  • In low-light scenarios, read noise and dark current become more important

Noise estimation

It is possible to estimate the noise curves from a single image: these curves give insight into the image’s noise and processing history.

Denoising

  • Denoising: trying to remove noise from an image
  • Impossible to do perfectly: noise corrupts the information, which is then lost
  • Most denoising methods rely on:
    • Blurring the image where it is smooth enough (in a constant-like region, averaging the values cancels the noise out), while keeping edges sharp, and/or
    • Aggregating similar patches in an image
  • Learned methods can use patches from a full dataset of images – but risk hallucinating non-existing details
  • Denoising is much easier when an accurate model of the noise is known, early in the ISP: do not denoise the processed PNG image (or worse, a JPEG-compressed one) when you have the raw image.
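A minimal sketch of the first idea, smoothing a constant region with a 3x3 box filter (no edge handling here; real denoisers avoid averaging across edges):

```python
import numpy as np

rng = np.random.default_rng(0)

# A noisy constant region: in such a region, averaging cancels the noise out.
clean = np.full((64, 64), 50.0)
noisy = clean + rng.normal(0.0, 4.0, clean.shape)

# 3x3 box filter on the 62x62 interior, built from shifted sums.
acc = np.zeros((62, 62))
for dy in (-1, 0, 1):
    for dx in (-1, 0, 1):
        acc += noisy[1 + dy:63 + dy, 1 + dx:63 + dx]
denoised = acc / 9.0

# Averaging 9 samples divides the noise std by 3 (sqrt of the sample count).
print(noisy.std(), denoised.std())
```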

Demosaicing

Only one sensor per pixel = One sampled colour

As each pixel only has one sensor, only one colour is sampled per pixel

Demosaicing

Demosaicing consists in interpolating the missing colours

Colour Filter Array (CFA)

The CFA is the pattern of filters specifying the sensitivities of each sensor, thus the “colour” sampled by each pixel.

The Bayer CFA is by far the most common

Many other CFAs exist

The difficulty of demosaicing

Demosaicing can introduce artefacts such as aliasing or chromatic artefacts.

Aliasing

  • Nyquist rate: twice the frequency of the highest-frequency component of the signal (conversely, the Nyquist frequency is half the sampling rate)
  • When a signal is sampled at a rate below the Nyquist rate, the image features aliasing
  • Aliasing, in French repliement de spectre (spectrum folding): the spectrum of the image is folded on itself, and the frequencies higher than the Nyquist frequency are folded into lower frequency areas.
  • Particularly visible in textured regions (which have a high frequency) and on badly-printed letters: Moiré patterns
  • Frequently created when downsampling an image improperly (to avoid this, apply a Gaussian blur to remove frequencies above the Nyquist frequency of the downsampled image before sampling), when taking photos of screens, etc.
  • Anti-aliasing filter: filter to limit the frequency before resampling (and thus avoid aliasing), or reduce the effects of already-present aliasing.
  • Aliasing can be created during demosaicing
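The folding effect can be reproduced in a few lines: keeping every second pixel of a period-2 stripe pattern collapses it to a false constant whose value depends on the sampling phase, while pre-filtering (here a crude 2-pixel average standing in for a proper anti-aliasing filter) keeps the correct local mean.

```python
import numpy as np

# Stripe pattern with period 2: the highest frequency the original grid can hold.
img = np.tile([0.0, 1.0], (8, 32))  # shape (8, 64), columns alternate 0 and 1

# Naive 2x downsampling keeps only even columns: the stripes fold to a false
# constant (all 0 here; keeping odd columns would give all 1 instead).
naive = img[:, ::2]

# Low-pass filtering first removes the frequency the coarser grid cannot hold.
filtered = (img[:, ::2] + img[:, 1::2]) / 2.0

print(naive[0, :4])     # [0. 0. 0. 0.]
print(filtered[0, :4])  # [0.5 0.5 0.5 0.5]
```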

Moiré pattern created by overlapping two sets of concentric circles, https://commons.wikimedia.org/w/index.php?curid=71694286

Bilinear demosaicing

  • The simplest existing demosaicing method (but the one that creates the most artefacts)
  • Each missing colour is interpolated as the average of its immediate neighbours sampled in this colour
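A minimal bilinear demosaicer for an RGGB Bayer mosaic, written as mask-based convolutions; the layout, kernel construction, and helper names are illustrative, not a reference implementation.

```python
import numpy as np

def conv2_same(img, k):
    """'Same'-size 2D correlation with zero padding (kernels here are symmetric)."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    p = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape)
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def bilinear_demosaic(raw):
    """Bilinear demosaicing of an RGGB Bayer mosaic."""
    h, w = raw.shape
    yy, xx = np.mgrid[0:h, 0:w]
    masks = {
        "R": (yy % 2 == 0) & (xx % 2 == 0),
        "G": (yy + xx) % 2 == 1,
        "B": (yy % 2 == 1) & (xx % 2 == 1),
    }
    # Each missing colour = average of the immediate neighbours sampled in it.
    k_rb = np.array([[0.25, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25]])
    k_g = np.array([[0.0, 0.25, 0.0], [0.25, 1.0, 0.25], [0.0, 0.25, 0.0]])
    kernels = {"R": k_rb, "G": k_g, "B": k_rb}
    out = np.zeros((h, w, 3))
    for c, name in enumerate("RGB"):
        num = conv2_same(raw * masks[name], kernels[name])
        den = conv2_same(masks[name].astype(float), kernels[name])
        out[..., c] = num / den  # normalising also handles image borders
    return out

# A flat grey mosaic is reconstructed exactly.
print(np.allclose(bilinear_demosaic(np.full((8, 8), 0.5)), 0.5))  # True
```

Normalising each channel by the convolved sampling mask keeps sampled values untouched and makes the borders behave, at the cost of two convolutions per channel.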

Bayer CFA

Improving demosaicing

  • Artefacts are particularly strong when averaging across sharp edges
  • \(\Rightarrow\) Using the image gradient, interpolate in the smoothest direction
  • More precision in green, as we have more samples, and green is a good indicator of the overall luminance changes
  • \(\Rightarrow\) Demosaic in green first, then demosaic the other channels, using the demosaiced green as a guide
  • Overall: exploit the shared information between colour channels to overcome the Nyquist frequency limits (as we effectively double the sampling rate and thus limit aliasing)
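A simplified, gradient-only sketch of this idea for the green channel (the actual Hamilton-Adams method also uses second derivatives of the red/blue channel as a correction term):

```python
import numpy as np

def green_directional(raw):
    """Interpolate green at non-green sites of a Bayer mosaic, averaging
    along the direction with the smallest green gradient."""
    h, w = raw.shape
    yy, xx = np.mgrid[0:h, 0:w]
    out = np.where((yy + xx) % 2 == 1, raw, 0.0)  # keep the sampled green values
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if (y + x) % 2 == 1:
                continue  # green was sampled here
            dh = abs(raw[y, x - 1] - raw[y, x + 1])  # horizontal green gradient
            dv = abs(raw[y - 1, x] - raw[y + 1, x])  # vertical green gradient
            if dh <= dv:  # interpolate in the smoothest direction
                out[y, x] = (raw[y, x - 1] + raw[y, x + 1]) / 2.0
            else:
                out[y, x] = (raw[y - 1, x] + raw[y + 1, x]) / 2.0
    return out

# A sharp vertical edge in the green channel survives: the gradient test picks
# the vertical (smooth) direction, so values are never averaged across the edge.
yy, xx = np.mgrid[0:8, 0:8]
scene = np.where(xx < 4, 0.2, 0.8)              # green channel with an edge
raw = np.where((yy + xx) % 2 == 1, scene, 0.0)  # green samples only
print(np.allclose(green_directional(raw)[1:-1, 1:-1], scene[1:-1, 1:-1]))  # True
```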

Hamilton-Adams demosaicing first interpolates the green channel, averaging over the smoothest direction. Buades et al., Self-similarity driven demosaicking, Image Processing On Line, 2011

Iterative demosaicing: Adaptive Residual Interpolation

  • Start from a base demosaicing (such as Hamilton-Adams demosaicing seen before)
  • Iteratively refine the interpolation, using the results of other channels as guides
  • Complex and slow method, but gives the best (non-learning) results

Then came deep learning

  • Handcrafted demosaicing methods use a combination of conditional convolutions
  • CNNs do just that, automatically: Find optimal sets of conditional convolutions to demosaic the image
  • Small CNNs work well, with limited hallucination risk due to their size: seals the fate of handcrafted demosaicing methods?
  • Can be done jointly with denoising

Does demosaicing matter?

  • Goal of demosaicing: preserve as much of the image resolution as possible (instead of just merging pixels and downsampling by 2), while avoiding artefact creation
  • Aliasing occurs when the sampling rate drops below the Nyquist rate
  • In photography, the highest frequency present is determined by the circle of least confusion (see previous lecture on Optics): lenses do not produce perfect points but small spots
  • If the sampling rate is more than twice this highest frequency, even the simplest demosaicing will not create aliasing artefacts
  • In mobile phone cameras:
    • very small lenses \(\Rightarrow\) relatively large circles of least confusion
    • trends towards higher and higher resolutions \(\Rightarrow\) increased sampling frequency
    • \(\Rightarrow\) we often do not need complex demosaicing, the sampling frequency is more than large enough
    • We actually cannot make real use of the high resolution, and it is usually impossible to produce sharp full-resolution details
  • Demosaicing remains relevant where the lenses are larger, higher-quality (and thus have smaller circles of least confusion), and/or when the resolution is too small.
  • The lens blurriness acts as a natural anti-aliasing filter (but also limits the resolution)