WebGL lets you run shaders in the browser — and that's genuinely magical. But using it for arbitrary GPU computation quickly reveals a thicket of constraints that OpenCL and WebGPU were designed to escape.
WebGL was designed to bring the OpenGL ES 2.0 API to the browser — and it largely succeeded. Millions of browser-based games, data visualizations, and machine-learning demos run on top of it. Yet the moment you try to use WebGL for general-purpose GPU computation (GPGPU), you run into a wall of constraints that feel almost deliberately hostile. This post catalogues those constraints, shows real workarounds, and contrasts WebGL with the APIs that have fewer of them.
First: a taste of what the GPU can do
Before talking about limitations, let's appreciate the power. The entire Mandelbrot set — computed for every pixel in parallel — is trivially expressible in a fragment shader. Every pixel is an independent evaluation of the recurrence zₙ₊₁ = zₙ² + c, making it a perfect embarrassingly-parallel workload. The visualizer below is a pure WebGL fragment shader running in your browser; nothing touches the CPU after the initial draw call.
[Interactive widget: Mandelbrot / Multibrot Explorer — zₙ₊₁ = cmul(z, z) + c, with animated presets]
Change the exponent and you explore the Multibrot family. The colour is computed using the smooth iteration-count formula (Linas' method), which eliminates the banding you get from integer escape counts: μ = n + 1 − log₂(log |zₙ|).
The fragment shader is short — about 80 lines of GLSL. Hit View fragment shader above to read it. That conciseness is WebGL at its best: a single shader replaces thousands of JavaScript iterations.
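As a CPU-side reference for what the shader computes per pixel, the escape-time loop with smooth colouring can be sketched in a few lines of TypeScript (the function name is illustrative, not taken from the shader):

```typescript
// Escape-time iteration for one pixel of the Mandelbrot set.
// Returns the smooth (fractional) iteration count, or maxIter
// if the point never escapes — i.e. it belongs to the set.
function mandelbrotEscape(cr: number, ci: number, maxIter = 100): number {
  let zr = 0;
  let zi = 0;
  for (let n = 0; n < maxIter; n++) {
    const nextZr = zr * zr - zi * zi + cr; // Re(z² + c)
    const nextZi = 2 * zr * zi + ci;      // Im(z² + c)
    zr = nextZr;
    zi = nextZi;
    const mag2 = zr * zr + zi * zi;
    if (mag2 > 4) {
      // Smooth iteration count: removes banding from integer escape counts
      return n + 1 - Math.log2(Math.log(Math.sqrt(mag2)));
    }
  }
  return maxIter;
}
```

The fragment shader runs exactly this loop once per pixel, in parallel, with `c` derived from `gl_FragCoord` — which is why the whole image costs one draw call.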
The shortcomings
1. Floating-point textures are not universally supported
Reading back floating-point results — the bread and butter of any GPGPU pipeline — requires floating-point textures. In WebGL 1, OES_texture_float is an optional extension. On many mobile GPUs and low-end integrated chips it is simply absent. Even where it exists, rendering to a float texture (the step you need to chain render passes) requires a further extension: WEBGL_color_buffer_float in WebGL 1, or EXT_color_buffer_float in WebGL 2.
The polyfill strategy is to pack four 8-bit RGBA channels into one 32-bit float via the IEEE 754 bit layout:
// Encode a float in [0, 1) into an RGBA8 texel
vec4 packFloat(float v) {
vec4 enc = vec4(1.0, 255.0, 65025.0, 16581375.0) * v;
enc = fract(enc);
enc -= enc.yzww * vec4(1.0/255.0, 1.0/255.0, 1.0/255.0, 0.0);
return enc;
}
// Decode an RGBA8 texel back to a float
float unpackFloat(vec4 rgba) {
return dot(rgba, vec4(1.0, 1.0/255.0, 1.0/65025.0, 1.0/16581375.0));
}

function createFloatTarget(
gl: WebGLRenderingContext,
width: number,
height: number,
): { fbo: WebGLFramebuffer; tex: WebGLTexture; isNative: boolean } {
const hasFloat = gl.getExtension('OES_texture_float')
const hasFloatFBO =
hasFloat && gl.getExtension('WEBGL_color_buffer_float')
const tex = gl.createTexture()!
gl.bindTexture(gl.TEXTURE_2D, tex)
if (hasFloatFBO) {
// Native path: single RGBA32F texel per pixel
gl.texImage2D(
gl.TEXTURE_2D, 0, gl.RGBA,
width, height, 0,
gl.RGBA, gl.FLOAT, null,
)
} else {
// Fallback: RGBA8 — caller must use packFloat / unpackFloat in shaders
gl.texImage2D(
gl.TEXTURE_2D, 0, gl.RGBA,
width, height, 0,
gl.RGBA, gl.UNSIGNED_BYTE, null,
)
}
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST)
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST)
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE)
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE)
const fbo = gl.createFramebuffer()!
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo)
gl.framebufferTexture2D(
gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0,
)
const status = gl.checkFramebufferStatus(gl.FRAMEBUFFER)
if (status !== gl.FRAMEBUFFER_COMPLETE) {
throw new Error(`Framebuffer incomplete: 0x${status.toString(16)}`)
}
return { fbo, tex, isNative: !!hasFloatFBO }
}

The four-channel pack only gives you ~24 bits of mantissa (three 8-bit channels after the bias byte), which is not full float32 precision. For results requiring more than ~6 significant decimal digits you either need OES_texture_float or must scale and offset values into the representable range.
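To sanity-check the precision claim, the pack/unpack pair is easy to port to TypeScript and round-trip on the CPU. This is a sketch; `packFloat8`/`unpackFloat8` are hypothetical names mirroring the GLSL above, with quantisation added to model the write into an RGBA8 render target:

```typescript
const fract = (x: number) => x - Math.floor(x);

// CPU port of the GLSL packFloat: encode v ∈ [0, 1) into four 8-bit channels.
function packFloat8(v: number): number[] {
  const scales = [1, 255, 65025, 16581375];
  let enc = scales.map((s) => fract(s * v));
  // Subtract from each channel the part the next channel already carries
  enc = [
    enc[0] - enc[1] / 255,
    enc[1] - enc[2] / 255,
    enc[2] - enc[3] / 255,
    enc[3],
  ];
  // Quantise to 8 bits per channel, as writing to RGBA8 does on the GPU
  return enc.map((e) => Math.round(e * 255) / 255);
}

// CPU port of the GLSL unpackFloat: weighted sum of the four channels.
function unpackFloat8(rgba: number[]): number {
  const basis = [1, 1 / 255, 1 / 65025, 1 / 16581375];
  return rgba.reduce((acc, c, i) => acc + c * basis[i], 0);
}
```

Round-tripping values through this pair shows errors on the order of 10⁻⁶ — plenty for display work, short of true float32.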
WebGL 2 improves the situation significantly — EXT_color_buffer_float is available on virtually all WebGL 2-capable devices — but WebGL 1 remains widely deployed and the fallback path is non-trivial to test.
2. The GPU context can be lost without warning
Browsers impose a watchdog timer on GPU work. If a single draw call takes too long — usually somewhere between 1 and 10 seconds depending on the browser and OS — the driver kills the context to keep the OS responsive. You receive a webglcontextlost event and all GPU state is destroyed.
This is not just a nuisance: it fundamentally limits the amount of computation you can pack into a single draw call. The usual mitigation is to split work across multiple frames, surrendering control back to the browser event loop between slices:
class ChunkedGPUJob {
private rowOffset = 0
constructor(
private gl: WebGLRenderingContext,
private totalRows: number,
private rowsPerFrame: number,
private onComplete: (result: Float32Array) => void,
) {
this.gl.canvas.addEventListener('webglcontextlost', this.onLost)
this.gl.canvas.addEventListener('webglcontextrestored', this.onRestored)
}
start() { requestAnimationFrame(() => this.tick()) }
private tick = () => {
const gl = this.gl
// Dispatch only `rowsPerFrame` rows this frame
this.dispatchRows(this.rowOffset, this.rowsPerFrame)
this.rowOffset += this.rowsPerFrame
if (this.rowOffset < this.totalRows) {
requestAnimationFrame(this.tick)
} else {
this.onComplete(this.readback())
}
}
private onLost = (e: Event) => {
e.preventDefault() // allow restore
console.warn('WebGL context lost — will retry after restore')
this.rowOffset = 0 // restart from beginning
}
private onRestored = () => {
this.rebuildShaders()
this.start()
}
// ... dispatchRows, readback, rebuildShaders omitted for brevity
}

The rule of thumb: keep each dispatch under ~16 ms (one frame budget). For large workloads this means maintaining a job queue and accumulating partial results across frames — adding significant complexity to what would be a single kernel launch in CUDA or OpenCL.
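One supporting piece — a hypothetical helper, not part of the class above — is deriving `rowsPerFrame` from a measured per-row GPU cost so that each slice stays inside the frame budget:

```typescript
// Given a measured per-row GPU cost in milliseconds, return the largest
// batch of rows that fits the frame budget (~16 ms at 60 fps by default).
function rowsForBudget(msPerRow: number, budgetMs = 16, minRows = 1): number {
  if (!(msPerRow > 0)) throw new Error('msPerRow must be positive');
  return Math.max(minRows, Math.floor(budgetMs / msPerRow));
}
```

In practice you would time one warm-up dispatch, feed the measured cost into this, and re-measure periodically, since GPU load from other tabs shifts the budget under you.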
3. Buffer management requires solving the graph-colouring problem
Every GPGPU pipeline involves ping-pong buffering: you read from texture A and write to texture B, then swap. As pipelines grow — shadow maps, ambient occlusion, lighting accumulation, post-processing — the number of intermediate textures multiplies. Naïvely allocating a new WebGLTexture per logical buffer wastes GPU memory and triggers expensive re-allocations.
The insight from compiler register-allocation theory is that two virtual buffers can share the same physical texture if their lifetimes do not overlap. Determining the minimum number of physical textures is exactly the graph-colouring problem:
- Nodes = virtual buffers (render targets)
- Edges = "these two buffers are live at the same time"
- Colours = physical GPU texture slots
Finding the chromatic number is NP-hard in general, but the Welsh–Powell greedy algorithm gives an excellent upper bound in near-linear time (the degree sort dominates):
- Sort nodes by degree (number of conflicts) in descending order.
- Greedily assign the lowest colour not already used by any neighbour.
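The two steps above fit in a dozen lines of TypeScript. This is a sketch — the `Graph` type and the function names are mine, not taken from the playground:

```typescript
type Graph = Map<string, Set<string>>;

// Build an undirected conflict graph from "live at the same time" edges.
function buildGraph(edges: [string, string][]): Graph {
  const g: Graph = new Map();
  for (const [a, b] of edges) {
    if (!g.has(a)) g.set(a, new Set());
    if (!g.has(b)) g.set(b, new Set());
    g.get(a)!.add(b);
    g.get(b)!.add(a);
  }
  return g;
}

// Welsh–Powell greedy colouring: returns a colour (physical slot) per buffer.
function welshPowell(g: Graph): Map<string, number> {
  // Step 1: sort vertices by degree, descending
  const order = [...g.keys()].sort(
    (a, b) => g.get(b)!.size - g.get(a)!.size,
  );
  const colour = new Map<string, number>();
  for (const v of order) {
    const used = new Set(
      [...g.get(v)!].filter((n) => colour.has(n)).map((n) => colour.get(n)!),
    );
    // Step 2: lowest colour not already used by any neighbour
    let c = 0;
    while (used.has(c)) c++;
    colour.set(v, c);
  }
  return colour;
}
```

Three mutually-live buffers (a triangle) need three slots; a chain of passes that only conflict with their immediate neighbours needs two, no matter how long it is.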
The playground below models a real multi-pass deferred-rendering pipeline. Modify the conflict graph and watch the slot count change in real time:
Buffer Allocation — Welsh–Powell Graph Colouring
Each node is a virtual GPU buffer. An edge means both buffers are simultaneously alive — they cannot share a physical slot. Colours = physical memory slots.
Badge = degree (number of live-overlapping buffers). Same colour = can share physical memory.
Processing order (by degree ↓)
Physical buffer slots
Insight: The chromatic number (3) is the minimum number of physical GPU textures/buffers this pipeline needs, regardless of how many virtual buffers it declares. Welsh–Powell gives a fast greedy upper bound — optimal colouring is NP-hard in general but excellent heuristics exist for the sparse conflict graphs typical of real render pipelines.
For the typical sparse conflict graphs of real render pipelines (most passes only overlap with their immediate neighbours), Welsh–Powell often achieves the true minimum. The naive allocation would use one texture per pass; the coloured allocation for the default pipeline above uses only three physical slots.
4. Random access is fundamentally limited
In the fragment shader model, the GPU writes to one output location determined by which fragment it is processing — you write to pixel (x, y) precisely when you are invoked for pixel (x, y). WebGL therefore supports gather (reading from any texel) but only fixed scatter (writing to the current fragment's own position).
True scatter — writing to an arbitrary texel from within a shader — is not possible in WebGL. Algorithms like:
- Histogram computation (scatter counts into bins)
- Radix sort (scatter elements to sorted positions)
- Sparse matrix-vector products (scatter partial products by row index)
…all require the kind of arbitrary write access that only becomes available with Shader Storage Buffer Objects (SSBOs) or Image Store operations, neither of which exists in WebGL. The standard workaround is to restructure the algorithm into a sequence of gather-only passes (e.g., prefix-sum networks for histogram normalisation), which is often possible but dramatically increases the number of render passes and development complexity.
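To make "gather-only passes" concrete, here is a CPU sketch of a multi-pass sum reduction: each call to `reducePass` models one draw call in which every output texel reads from two fixed input texels, and no invocation ever writes anywhere but its own position (the function names are illustrative):

```typescript
// One "render pass": output i gathers inputs 2i and 2i+1. Pure gather —
// each output position is written only by its own invocation.
function reducePass(input: number[]): number[] {
  const out: number[] = [];
  for (let i = 0; i < Math.ceil(input.length / 2); i++) {
    const a = input[2 * i];
    const b = 2 * i + 1 < input.length ? input[2 * i + 1] : 0; // pad with 0
    out.push(a + b);
  }
  return out;
}

// Chain passes (ping-pong between two textures on the GPU) until one
// value remains: ⌈log₂ n⌉ draw calls instead of one scatter kernel.
function gpuStyleSum(data: number[]): number {
  let buf = data;
  while (buf.length > 1) buf = reducePass(buf);
  return buf[0];
}
```

A single CUDA or OpenCL kernel with atomics does this in one launch; the gather-only version needs a logarithmic number of full render passes, each with its own framebuffer bind and draw-call overhead.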
Read access is also constrained: sampling a texture with texture2D at data-dependent coordinates is non-uniform random access. Texture caches are optimised for spatial locality, so when neighbouring invocations in a warp/wave fetch from very different addresses, cache misses pile up and execution can stall significantly. Algorithms with irregular access patterns (graph traversal, sparse lookups) pay a steep penalty.
5. No geometry shaders
OpenGL 3.2 brought geometry shaders into core — a programmable stage between the vertex shader and the rasteriser that can emit new primitives, amplify geometry, or output to multiple render targets simultaneously. They enable:
- Rendering to cubemap faces in a single draw call
- Particle systems that expand a point into a billboard quad
- Shadow map generation for all six faces of a point light in one pass
- Layered rendering for VR (one draw call per eye without geometry duplication)
WebGL (based on OpenGL ES 2.0 / 3.0) simply does not have this stage. Every technique that relies on geometry shaders must be reimplemented via workarounds — typically instanced rendering with per-instance data in textures, which works but adds overhead and complexity.
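As a sketch of the instancing workaround (the names and attribute layout here are illustrative, not a real engine API): the geometry-shader job "expand each point into a billboard quad" becomes one shared unit quad plus a per-instance attribute buffer, drawn with a single instanced draw call:

```typescript
// One unit quad (two triangles, 6 vertices), shared by every particle.
const QUAD = new Float32Array([
  -1, -1,  1, -1,  -1, 1, // triangle 1
   1, -1,  1,  1,  -1, 1, // triangle 2
]);

interface Particle { x: number; y: number; size: number }

// Pack per-particle centre + size into a per-instance attribute buffer.
// The vertex shader then computes: position = centre + corner * size,
// doing the "amplification" a geometry shader would have done.
function buildInstanceData(particles: Particle[]): Float32Array {
  const out = new Float32Array(particles.length * 3);
  particles.forEach((p, i) => {
    out[3 * i] = p.x;
    out[3 * i + 1] = p.y;
    out[3 * i + 2] = p.size;
  });
  return out;
}

// Draw call (WebGL 2 core, or ANGLE_instanced_arrays in WebGL 1):
//   gl.drawArraysInstanced(gl.TRIANGLES, 0, 6, particles.length)
```

The cost is the extra instance buffer upload and a divisor-1 attribute setup — workable, but far more ceremony than a geometry shader's `EmitVertex`.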
WebGL 2 offers the OVR_multiview2 extension for the VR use-case (rendering the same geometry to multiple view layers in one draw call), but general geometry-shader functionality remains absent.
Contrast with OpenCL and WebGPU
| Feature | WebGL 1/2 | OpenCL 1.2+ | WebGPU |
|---|---|---|---|
| Compute shaders | ✗ (fragment only) | ✓ kernels | ✓ compute pipelines |
| Arbitrary write (scatter) | ✗ | ✓ | ✓ (storage buffers) |
| Float32 render targets | Extension, patchy | ✓ native | ✓ native |
| Context timeout protection | Watchdog, no control | Configurable | Configurable |
| Geometry shaders | ✗ | N/A (compute model) | ✗ (by design) |
| Atomic operations | ✗ | ✓ | ✓ |
| Shared memory (workgroup) | ✗ | ✓ local memory | ✓ workgroup storage |
OpenCL sidesteps all of these constraints by abandoning the graphics pipeline model entirely. There are no draw calls, no rasterisation, no fragment stages. A kernel is a plain C-like function that reads and writes arbitrary memory. The problem is that OpenCL is not available in browsers (and was deprecated on macOS), limiting it to native applications.
WebGPU is the browser API designed to fix WebGL's GPGPU shortcomings while remaining web-safe. It introduces:
- Compute pipelines with proper @compute shader stages in WGSL
- Storage buffers (var<storage, read_write>) allowing arbitrary scatter from any shader stage
- Workgroup shared memory for fast inter-thread communication within a tile
- Atomic operations for lock-free histograms and reductions
- Explicit resource lifetimes and a queue model that gives you predictable timing without watchdog surprises
The trade-off is that WebGPU's API surface is significantly larger and more verbose. A WebGL "hello triangle" is ~50 lines; the WebGPU equivalent is closer to 200. But for serious GPGPU work — ML inference, physics simulation, large-scale data processing — that verbosity buys you the control you need.
// WebGPU compute: scatter a histogram into a storage buffer
// — impossible to express cleanly in WebGL
const shader = device.createShaderModule({ code: `
@group(0) @binding(0) var<storage, read> data : array<u32>;
@group(0) @binding(1) var<storage, read_write> hist : array<atomic<u32>>;
@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) id : vec3<u32>) {
let value = data[id.x];
atomicAdd(&hist[value % 256u], 1u); // arbitrary scatter ✓
}
` })

The one area where WebGL still wins is compatibility: WebGL 1 runs on essentially every GPU made since 2010, including devices that will never support WebGPU. For rendering, WebGL 2 covers the vast majority of current hardware. For GPGPU in 2026, WebGPU is the right tool — but the ecosystem, tooling, and learning resources are still maturing compared to WebGL's decade-long head start.
Summary
WebGL is a remarkable achievement for what it is: OpenGL ES in a sandboxed, cross-origin-safe browser context. Its rendering capabilities are well-served by the fragment shader model. But the moment you reach for it as a general compute substrate you encounter floating-point texture portability gaps, watchdog timeouts that limit computation granularity, a buffer allocation problem that reduces to graph colouring, no support for scatter writes or geometry amplification, and the absence of compute primitives like atomics and shared memory.
The industry has been steadily moving toward WebGPU precisely because these constraints are not fixable within the WebGL model — they are architectural. WebGL will remain the right choice for compatibility-sensitive rendering; WebGPU is where serious GPU computing in the browser is headed.