How Browsers Actually Handle Your Images

Most image optimization advice treats delivery as a download problem: smaller file, faster page. Performance audits reinforce this because they measure bytes on the wire. But once the bytes arrive, the browser still has to decode the file into a raw bitmap, allocate memory for it, upload it to the GPU as a texture, and composite it to the screen. Those stages have their own costs, and those costs don't track file size.

The Memory Math

When a browser finishes downloading a JPEG, it can't display compressed data. It has to decode it into a raw RGBA bitmap using four bytes per pixel for red, green, blue, and alpha. You can think of a compressed image like a piece of flat-packed furniture; it's small and easy to transport, but it still takes up its full physical footprint once you build it in your living room. A 2000×1500 image becomes a specific memory load regardless of how aggressively you compressed it:

2000 × 1500 × 4 bytes = 12,000,000 bytes = 12 MB

That's 12 MB of memory for an image that might be only 200 KB on the wire. A 4000×3000 photo from a decent phone camera is even more demanding, requiring 48 MB of decoded bitmap in RAM. The 2 MB file on disk is misleading, because the browser needs the full 48 MB to actually render that image.
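The arithmetic is simple enough to sketch. This is a toy helper for checking the math, not a model of how browsers actually track memory:

```javascript
// Decoded bitmap size: width × height × 4 bytes per pixel (RGBA).
function decodedBytes(width, height) {
  return width * height * 4;
}

console.log(decodedBytes(2000, 1500)); // 12,000,000 bytes: 12 MB for a ~200 KB JPEG
console.log(decodedBytes(4000, 3000)); // 48,000,000 bytes: 48 MB for a ~2 MB photo
```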

This is why a page with a handful of poorly sized hero images can push a mobile device's memory to the edge even if DevTools says the total transfer size was reasonable. Transfer size stops mattering once the bytes arrive. Pixel count is what truly matters.

This hero image at 4000×3000 pixels decodes to ~48 MB in memory, even if the compressed JPEG is only 2 MB on the wire

A product page with six lifestyle photos might total about 3 MB in downloads if they're 4000-pixel-wide JPEGs scaled down in CSS. However, the decoded bitmaps come to roughly 288 MB. A tab like that can easily crash a mid-range Android phone. It only tests fine on desktop because bigger machines have more memory to absorb the waste.

Content Negotiation

Before any pixels move, there's an important step at the HTTP level. The browser sends a request with an Accept header that most developers never examine.
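For an image request, that header looks something like this. The exact value varies by browser and version; this is roughly what a recent Chrome sends:

```http
GET /images/hero.jpg HTTP/1.1
Host: example.com
Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
```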

That's content negotiation. The browser tells the server it would prefer AVIF, then WebP, then whatever else is available. This process is often handled by a Content Delivery Network (CDN), a distributed group of servers that stores and delivers content from locations closer to the user to improve speed. If the CDN is configured correctly, it reads that header and serves the best format the browser supports. If the server isn't configured to negotiate, which is true of most default setups, it ignores the header entirely and sends the same JPEG to everyone.

This is why the <picture> element exists. It moves the format decision to the HTML and works around servers that don't negotiate. But there's a tradeoff: the browser's preload scanner can't always predict which <source> it'll need, which introduces timing delays before the download even starts. You get format flexibility at the cost of a slightly less predictable preload path.
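A minimal example, with placeholder file names:

```html
<picture>
  <!-- The browser uses the first source whose type it supports -->
  <source srcset="hero.avif" type="image/avif">
  <source srcset="hero.webp" type="image/webp">
  <!-- Fallback for browsers that support neither -->
  <img src="hero.jpg" width="1200" height="800" alt="Product hero">
</picture>
```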

The Format Tradeoff

The format you choose also affects what happens after the download. As a rough rule: JPEG decodes fastest and is supported everywhere; WebP produces smaller files at a modest extra decode cost; AVIF compresses best of the three but is the most expensive to decode on the CPU. A smaller download can therefore mean more decode work, which matters most on low-end mobile hardware.

The Download Queue

Once the server responds, image bytes arrive according to a priority system, not in simple sequential order. Images in the viewport get higher priority, while those below the fold are deprioritized. loading="lazy" prevents a request entirely until the user scrolls near the image, and fetchpriority="high" lets a hero image compete for bandwidth with scripts, stylesheets, and fonts instead of queueing behind them.
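In markup, those two hints look roughly like this (URLs are placeholders):

```html
<!-- Above the fold: load eagerly and compete for bandwidth early -->
<img src="hero.jpg" fetchpriority="high" alt="Hero">

<!-- Far below the fold: no request until the user scrolls near it -->
<img src="footer-gallery.jpg" loading="lazy" alt="Gallery photo">
```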

Per-request overhead compounds too. Five larger images totalling the same byte count will often load faster than fifty small ones because there are fewer connections to schedule and less per-image decode setup.

The GPU Handoff

This stage of the pipeline is largely invisible, and it's easy to overlook that the initial decode happens entirely on the CPU. The CPU has to transform compressed bytes into a bitmap because GPUs aren't designed to decompress JPEG, WebP, or AVIF. Some mobile chips include dedicated hardware decoders to assist, but they're the exception. Once the bitmap exists, the browser uploads it to GPU memory as a texture. From that point, the GPU handles the compositing, transforms, and painting.

This process sounds efficient, but GPU memory is a finite resource. On mobile devices, the browser must constantly decide which images deserve to keep their GPU textures and which must be evicted to free up space. Large off-screen images are often the first to lose their textures. When you scroll back to those images, the browser is forced to re-upload the data. In some cases, it must even re-decode the image before the re-upload can happen. The momentary flicker you see when scrolling quickly on a heavy page is the result of that texture eviction and re-uploading happening in real time.

This inefficiency connects back to the memory math. If you use a 4000×3000 image but display it at 400×300 via CSS, the browser still decodes the file at full resolution first. That creates a 48 MB GPU texture for an element that could have functioned with only 480 KB at the correct dimensions. The browser will eventually downsample the image, but the performance damage is already done through the initial memory spike and decode latency.
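The usual fix is to let the browser choose an appropriately sized file with srcset and sizes. A sketch, with hypothetical file names:

```html
<!-- The browser picks the smallest candidate that covers the rendered size -->
<img src="photo-800.jpg"
     srcset="photo-400.jpg 400w, photo-800.jpg 800w, photo-1600.jpg 1600w"
     sizes="(max-width: 600px) 100vw, 400px"
     width="400" height="300" alt="Lifestyle photo">
```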

Lazy Loading Is Not Free

While loading="lazy" defers an image until the user scrolls near it, this is not a cost-free performance gain. You are not simply delaying a download. You are pushing the entire pipeline (request, download, decode, GPU upload, and compositing) into the scroll event. This is the worst possible moment for heavy computational work because the user is actively watching the interface, so any processing lag becomes immediately visible.

Page speed report with rendering filmstrip

Chrome attempts to mitigate this through decode on demand, an optimization where the browser delays the CPU-intensive work of decoding an image into a bitmap until the last possible moment. In theory, this uses idle time to prepare images just before they enter the viewport. In practice, true idle time is non-existent on a page with active scrolling, JavaScript execution, and CSS animations. When the idle time never comes, the result is visible pop-in as the browser races to decode the asset before the user reaches it.

Lazy loading every image on a page is a common mistake. Images above the fold must load eagerly so their decode work is distributed across the initial page load rather than being crammed into a single scroll event. Lazy loading is a tool intended for content the user might never see. When applied to content the user is guaranteed to see, it just degrades the experience.

Compositing and Layer Costs

Once an image is decoded and uploaded, the browser places it into a layer tree. If an image uses CSS transforms, opacity animations, or sits inside a scrolling container, the browser often promotes it to its own compositor layer. This is generally good for animation because the GPU can move these layers around without repainting the entire screen. However, every layer carries its own memory cost.

A page with dozens of images sitting on their own compositor layers can consume a massive amount of GPU memory just for management. In a standard image gallery, the Layers panel often shows a huge stack of individual layers. Each one holds an image texture and each one drains resources.

Lighthouse audits rarely warn you about this overhead. You usually only find the problem by opening the Chrome Layers panel (DevTools -> More Tools -> Layers). Looking there, you might realize a simple-looking gallery is really using 400 MB of GPU memory, a footprint that far exceeds the file sizes of the images themselves.

Fifty Images vs. Five

Each image request carries fixed costs regardless of size. These include DNS resolution, connection overhead, request headers, server processing, and response parsing. While HTTP/2 helps by sharing a single connection, the overhead remains present. For tiny images like icons, UI elements, or small thumbnails, these fixed costs can exceed the time spent actually transferring the image data.

There is also a fixed decode overhead for every image. Each file must go through the full pipeline: the browser parses the header, sets up the decoder, allocates a bitmap buffer, performs the decode, and creates a GPU texture. For a 2 KB icon, the setup and teardown represent a significant percentage of the total work. For a 200 KB photo, that same setup time is negligible.

This does not mean you should bundle everything into one giant sprite sheet. But it does mean the thinking behind image sprites is not just a relic of the HTTP/1.1 era. Inlining very small images as data URIs or using SVG icon systems can provide meaningful performance wins. These choices are functional optimizations rather than just organizational preferences.
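For example, a tiny icon can be inlined so it skips the request queue and the per-file decoder setup entirely. A sketch; a real icon set warrants a proper SVG icon system:

```html
<!-- A 16x16 dot inlined as an SVG data URI: no extra request, no JPEG/WebP decode -->
<img width="16" height="16" alt=""
     src="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' width='16' height='16'><circle cx='8' cy='8' r='7' fill='%23333'/></svg>">
```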

Optimization Beyond File Size

Most optimization advice focuses entirely on making the file smaller. While reducing file size is often the single biggest win, it only addresses one stage of a pipeline containing half a dozen expensive steps. True optimization requires looking at the entire chain.

The full optimization chain includes file size, decode time, memory allocation, GPU upload, and compositing. Optimizing the download provides little benefit if you are still forcing 4000px images into 400px containers, lazy-loading content above the fold, or stacking dozens of expensive compositor layers.

If your images are small but the page performance is still poor, the issue is likely happening after the download. Use the Chrome DevTools Performance panel to audit your decode and composite times. Solving these hidden costs is often the difference between a page that looks fast and a page that actually feels fast.