"Auto" colour count — how SSIM picks the smallest palette that still looks right
When you pick "auto" colour count for a PNG or GIF, the encoder needs to choose how many palette entries to keep. Use too few and visible banding appears in gradients; use too many and you waste bytes on colours nobody can perceive. The right answer depends on the image. There's no single number that works for every file.
Our pipeline solves it by binary search: try a colour count, measure how close the quantized version is to the original using a perceptual metric, and narrow until we find the smallest count that's still indistinguishable from the original. The metric we use is multi-scale SSIM, with a threshold of 0.9985.
SSIM — measuring "do these images look the same"
The Structural Similarity Index Measure is a perceptual quality metric. Unlike pixel-difference metrics (mean squared error, PSNR), it doesn't ask "are these pixels numerically identical" — it asks "does a human perceive these images as the same". A few reference points:
- SSIM = 1.0 — bit-identical images. Perfectly preserved.
- SSIM ≥ 0.998 — differences are below the threshold of normal human perception. Image looks identical at any reasonable viewing distance.
- SSIM ~0.97 — differences are subtle but visible if you look carefully.
- SSIM < 0.95 — differences are obviously visible.
Our threshold of 0.9985 is in the "below perception" zone with a small margin for safety. Compute SSIM between the original and a palette-quantized version of an image, and if the score is above 0.9985, you can use that palette without anyone noticing.
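The per-block score underlying all of this is the standard SSIM formula: compare the means, variances, and covariance of two blocks, stabilised by two small constants. A minimal NumPy sketch (not our exact implementation — `block_ssim` is an illustrative name):

```python
import numpy as np

def block_ssim(x, y, data_range=255.0):
    """SSIM of two same-shaped greyscale blocks, using the
    standard stabilising constants c1 = (0.01 L)^2, c2 = (0.03 L)^2."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()  # covariance of the two blocks
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical blocks score exactly 1.0; any shift in brightness, contrast, or structure pulls the score below it.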
Multi-scale SSIM
Single-scale SSIM looks at the image at one resolution. Multi-scale (MS-SSIM) looks at multiple resolutions — full resolution, half, quarter, eighth, sixteenth — and combines the scores with weights that match how human visual perception works. We use the standard MS-SSIM weight vector [0.0448, 0.2856, 0.3001, 0.2363, 0.1333] across up to 5 scales, downsampling 2× at each step. Larger weights on the middle scales reflect that humans are most sensitive to mid-frequency details.
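In the standard MS-SSIM formulation the per-scale scores are combined as a weighted product, with the weights as exponents. A sketch with the per-scale scorer left pluggable (`scale_score` is an assumed callable, not a fixed API):

```python
import numpy as np

# Standard MS-SSIM weights, largest on the middle (mid-frequency) scales
WEIGHTS = [0.0448, 0.2856, 0.3001, 0.2363, 0.1333]

def downsample2x(img):
    """Simple dyadic downsampling: 2x2 box average, then decimate."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def ms_ssim(a, b, scale_score):
    """Weighted geometric combination of per-scale similarity scores.
    scale_score is any single-scale metric (e.g. worst-block SSIM)."""
    total = 1.0
    for w in WEIGHTS:
        total *= scale_score(a, b) ** w
        a, b = downsample2x(a), downsample2x(b)  # halve resolution per scale
    return total
```

Because the weights sum to (almost exactly) 1, a constant per-scale score passes through unchanged, and a single bad scale drags the combined score down multiplicatively.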
For each scale we use 8 × 8 blocks with stride 2, and take the worst-block score across the image rather than the average. The reasoning: if a quantization mistake causes one localised region to look wrong (a banded sky in the corner), the average score may stay above threshold while the bad region is clearly visible. Worst-block ensures the worst part of the image is what we're measuring.
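Worst-block pooling is a simple min over sliding windows. A sketch, with the block metric passed in as a callable:

```python
import numpy as np

def worst_block(a, b, score, block=8, stride=2):
    """Minimum per-block score over all block x block windows
    taken at the given stride -- the image is only as good as
    its worst region."""
    h, w = a.shape
    worst = 1.0
    for i in range(0, h - block + 1, stride):
        for j in range(0, w - block + 1, stride):
            s = score(a[i:i + block, j:j + block],
                      b[i:i + block, j:j + block])
            worst = min(worst, s)
    return worst
```

With average pooling, one banded corner can hide behind a sea of perfect blocks; the min cannot be averaged away.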
The binary search
The search runs:
low = 32                      # don't go below 32 colours
high = min(256, image's actual colour count)
optimal = high

# First check: does even the maximum palette pass the threshold?
maxQuantized = quantize(image, high)
if SSIM(image, maxQuantized) < 0.9985:
    return image's actual colour count   # quantization not safe at all

# Binary search
while high − low > 4:
    mid = ⌊(low + high) / 2⌋
    quantized = quantize(image, mid)
    if SSIM(image, quantized) ≥ 0.9985:
        optimal = mid
        high = mid
    else:
        low = mid

# Refine: try a few more steps below the optimum
for c in [optimal, optimal − 2, optimal − 4, ...]:
    if SSIM(image, quantize(image, c)) ≥ 0.9985:
        optimal = c
    else:
        break

return optimal
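The same search as runnable Python, with the quantize-and-measure step abstracted into one callable, `score_at(k)` (a hypothetical name standing in for "quantize to k colours, then compute MS-SSIM against the original"):

```python
def optimal_palette_size(n_colors, score_at, threshold=0.9985, floor=32):
    """Smallest palette size whose quantized result still scores
    above threshold. n_colors is the image's actual distinct-colour
    count; score_at(k) returns the MS-SSIM at a k-colour palette."""
    low = floor
    high = min(256, n_colors)

    # If even the maximum palette fails, quantization isn't safe at all
    if score_at(high) < threshold:
        return n_colors

    optimal = high
    while high - low > 4:
        mid = (low + high) // 2
        if score_at(mid) >= threshold:
            optimal = mid     # mid passes: the answer is at or below it
            high = mid
        else:
            low = mid         # mid fails: the answer is above it

    # Refine: step down by 2 from the binary-search optimum
    c = optimal
    while c >= floor and score_at(c) >= threshold:
        optimal = c
        c -= 2
    return optimal
```

Because `score_at` is monotone in practice (more colours never hurts the score), each halving discards the half of the range that cannot contain the answer.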
Six or seven iterations narrow the 32–256 range to within 4 colours. Each iteration runs the palette quantizer and an MS-SSIM computation; on a 1920 × 1080 image both are sub-second on a modern machine, so the entire search completes in 3–5 seconds.
Why the floor is 32 colours
Below 32 distinct colours, dithering becomes the dominant feature of the output rather than the colours themselves. Even if SSIM scores stay high (the dithering does diffuse error), the result looks "computery" — a discernible halftone-like pattern visible without zooming. We hard-floor the search at 32 to prevent the auto mode from ever producing that look.
Users who want extreme quantization (a 16-colour or 8-colour palette for retro-style export) can override the floor manually; that's not what auto mode is for.
When the maximum already fails
If the image's content is high-frequency enough that even a 256-colour palette doesn't pass the 0.9985 threshold (a photographic PNG, for example), the search exits immediately and reports the image's actual unique colour count. The encoder then either keeps the file as truecolor PNG or — better — suggests format conversion to JPEG or WebP. Auto colour-count optimisation is for content that fits in a palette; photo-shaped content lives in a different format entirely.
Why this beats fixed-N defaults
Fixed defaults waste bytes on simple images and damage complex ones. A simple icon with 14 actual distinct colours doesn't need a 256-entry palette; quantizing it to the actual 14 colours produces a smaller file with no quality loss. A complex screenshot with subtle anti-aliased text has hundreds of effectively-distinct colours; quantizing it to a fixed 64 visibly damages the text.
Per-image SSIM-bounded search adapts to the content. The output palette size we report — sometimes 48, sometimes 200 — is the smallest count that keeps the image perceptually whole.