WebGL lets you run shaders in the browser — and that's genuinely magical. But using it for arbitrary GPU computation quickly reveals a thicket of constraints that OpenCL and WebGPU were designed to escape.
WebGL was designed to bring the OpenGL ES 2.0 API to the browser — and it largely succeeded. Millions of browser-based games, data visualizations, and machine-learning demos run on top of it. Yet the moment you try to use WebGL for general-purpose GPU computation (GPGPU), you run into a wall of constraints that feel almost deliberately hostile. This post catalogues those constraints, shows real workarounds, and contrasts WebGL with the APIs that have fewer of them.
First: a taste of what the GPU can do
Before talking about limitations, let's appreciate the power. The entire Mandelbrot set — computed for every pixel in parallel — is trivially expressible in a fragment shader. Every pixel is an independent evaluation of the recurrence zₙ₊₁ = zₙ² + c, making it a perfect embarrassingly-parallel workload. The visualizer below is a pure WebGL fragment shader running in your browser; nothing touches the CPU after the initial draw call.
[Interactive widget: Mandelbrot / Multibrot Explorer — zₙ₊₁ = cmul(z, z) + c, with animated presets]
Change the exponent and you explore the Multibrot family. The colour is computed using the smooth iteration-count formula (Linas' method), which eliminates the banding you get from integer escape counts: μ = n + 1 − log₂(log |zₙ|).
The fragment shader is short — about 80 lines of GLSL. Hit View fragment shader above to read it. That conciseness is WebGL at its best: a single shader replaces thousands of JavaScript iterations.
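As a CPU-side reference for what the shader computes per pixel, the escape-time loop with smooth colouring can be sketched in a few lines of TypeScript (the function name is illustrative, not taken from the shader):

```typescript
// Escape-time iteration for one pixel of the Mandelbrot set.
// Returns the smooth (fractional) iteration count, or maxIter
// if the point never escapes — i.e. it belongs to the set.
function mandelbrotEscape(cr: number, ci: number, maxIter = 100): number {
  let zr = 0;
  let zi = 0;
  for (let n = 0; n < maxIter; n++) {
    const nextZr = zr * zr - zi * zi + cr; // Re(z² + c)
    const nextZi = 2 * zr * zi + ci;      // Im(z² + c)
    zr = nextZr;
    zi = nextZi;
    const mag2 = zr * zr + zi * zi;
    if (mag2 > 4) {
      // Smooth iteration count: removes banding from integer escape counts
      return n + 1 - Math.log2(Math.log(Math.sqrt(mag2)));
    }
  }
  return maxIter;
}
```

The fragment shader runs exactly this loop once per pixel, in parallel, with `c` derived from `gl_FragCoord` — which is why the whole image costs one draw call.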
The shortcomings
1. Floating-point textures are not universally supported
Reading back floating-point results — the bread and butter of any GPGPU pipeline — requires floating-point textures. In WebGL 1, OES_texture_float is an optional extension. On many mobile GPUs and low-end integrated chips it is simply absent. Even where it exists, rendering to a float texture (the step you need to chain render passes) requires a further extension: WEBGL_color_buffer_float in WebGL 1, or EXT_color_buffer_float in WebGL 2.
The polyfill strategy is to pack four 8-bit RGBA channels into one 32-bit float via the IEEE 754 bit layout:
// Encode a float in [0, 1) into an RGBA8 texel
vec4 packFloat(float v) {
vec4 enc = vec4(1.0, 255.0, 65025.0, 16581375.0) * v;
enc = fract(enc);
enc -= enc.yzww * vec4(1.0/255.0, 1.0/255.0, 1.0/255.0, 0.0);
return enc;
}
// Decode an RGBA8 texel back to a float
float unpackFloat(vec4 rgba) {
return dot(rgba, vec4(1.0, 1.0/255.0, 1.0/65025.0, 1.0/16581375.0));
}

function createFloatTarget(
gl: WebGLRenderingContext,
width: number,
height: number,
): { fbo: WebGLFramebuffer; tex: WebGLTexture; isNative: boolean } {
const hasFloat = gl.getExtension('OES_texture_float')
const hasFloatFBO =
hasFloat && gl.getExtension('WEBGL_color_buffer_float')
const tex = gl.createTexture()!
gl.bindTexture(gl.TEXTURE_2D, tex)
if (hasFloatFBO) {
// Native path: single RGBA32F texel per pixel
gl.texImage2D(
gl.TEXTURE_2D, 0, gl.RGBA,
width, height, 0,
gl.RGBA, gl.FLOAT, null,
)
} else {
// Fallback: RGBA8 — caller must use packFloat / unpackFloat in shaders
gl.texImage2D(
gl.TEXTURE_2D, 0, gl.RGBA,
width, height, 0,
gl.RGBA, gl.UNSIGNED_BYTE, null,
)
}
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST)
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST)
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE)
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE)
const fbo = gl.createFramebuffer()!
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo)
gl.framebufferTexture2D(
gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0,
)
const status = gl.checkFramebufferStatus(gl.FRAMEBUFFER)
if (status !== gl.FRAMEBUFFER_COMPLETE) {
throw new Error(`Framebuffer incomplete: 0x${status.toString(16)}`)
}
return { fbo, tex, isNative: !!hasFloatFBO }
}

The four-channel pack only gives you ~24 bits of mantissa (three 8-bit channels after the bias byte), which is not full float32 precision. For results requiring more than ~6 significant decimal digits you either need OES_texture_float or must scale and offset values into the representable range.
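To sanity-check the precision claim, the pack/unpack pair is easy to port to TypeScript and round-trip on the CPU. This is a sketch; `packFloat8`/`unpackFloat8` are hypothetical names mirroring the GLSL above, with quantisation added to model the write into an RGBA8 render target:

```typescript
const fract = (x: number) => x - Math.floor(x);

// CPU port of the GLSL packFloat: encode v ∈ [0, 1) into four 8-bit channels.
function packFloat8(v: number): number[] {
  const scales = [1, 255, 65025, 16581375];
  let enc = scales.map((s) => fract(s * v));
  // Subtract from each channel the part the next channel already carries
  enc = [
    enc[0] - enc[1] / 255,
    enc[1] - enc[2] / 255,
    enc[2] - enc[3] / 255,
    enc[3],
  ];
  // Quantise to 8 bits per channel, as writing to RGBA8 does on the GPU
  return enc.map((e) => Math.round(e * 255) / 255);
}

// CPU port of the GLSL unpackFloat: weighted sum of the four channels.
function unpackFloat8(rgba: number[]): number {
  const basis = [1, 1 / 255, 1 / 65025, 1 / 16581375];
  return rgba.reduce((acc, c, i) => acc + c * basis[i], 0);
}
```

Round-tripping values through this pair shows errors on the order of 10⁻⁶ — plenty for display work, short of true float32.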
WebGL 2 improves the situation significantly — EXT_color_buffer_float is available on virtually all WebGL 2-capable devices — but WebGL 1 remains widely deployed and the fallback path is non-trivial to test.
2. The GPU context can be lost without warning
Browsers impose a watchdog timer on GPU work. If a single draw call takes too long — usually somewhere between 1 and 10 seconds depending on the browser and OS — the driver kills the context to keep the OS responsive. You receive a webglcontextlost event and all GPU state is destroyed.
This is not just a nuisance: it fundamentally limits the amount of computation you can pack into a single draw call. The usual mitigation is to split work across multiple frames, surrendering control back to the browser event loop between slices:
class ChunkedGPUJob {
private rowOffset = 0
constructor(
private gl: WebGLRenderingContext,
private totalRows: number,
private rowsPerFrame: number,
private onComplete: (result: Float32Array) => void,
) {
this.gl.canvas.addEventListener('webglcontextlost', this.onLost)
this.gl.canvas.addEventListener('webglcontextrestored', this.onRestored)
}
start() { requestAnimationFrame(() => this.tick()) }
private tick = () => {
const gl = this.gl
// Dispatch only `rowsPerFrame` rows this frame
this.dispatchRows(this.rowOffset, this.rowsPerFrame)
this.rowOffset += this.rowsPerFrame
if (this.rowOffset < this.totalRows) {
requestAnimationFrame(this.tick)
} else {
this.onComplete(this.readback())
}
}
private onLost = (e: Event) => {
e.preventDefault() // allow restore
console.warn('WebGL context lost — will retry after restore')
this.rowOffset = 0 // restart from beginning
}
private onRestored = () => {
this.rebuildShaders()
this.start()
}
// ... dispatchRows, readback, rebuildShaders omitted for brevity
}

The rule of thumb: keep each dispatch under ~16 ms (one frame budget). For large workloads this means maintaining a job queue and accumulating partial results across frames — adding significant complexity to what would be a single kernel launch in CUDA or OpenCL.
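One supporting piece — a hypothetical helper, not part of the class above — is deriving `rowsPerFrame` from a measured per-row GPU cost so that each slice stays inside the frame budget:

```typescript
// Given a measured per-row GPU cost in milliseconds, return the largest
// batch of rows that fits the frame budget (~16 ms at 60 fps by default).
function rowsForBudget(msPerRow: number, budgetMs = 16, minRows = 1): number {
  if (!(msPerRow > 0)) throw new Error('msPerRow must be positive');
  return Math.max(minRows, Math.floor(budgetMs / msPerRow));
}
```

In practice you would time one warm-up dispatch, feed the measured cost into this, and re-measure periodically, since GPU load from other tabs shifts the budget under you.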
3. Buffer management requires solving the graph-colouring problem
Every GPGPU pipeline involves ping-pong buffering: you read from texture A and write to texture B, then swap. As pipelines grow — shadow maps, ambient occlusion, lighting accumulation, post-processing — the number of intermediate textures multiplies. Naïvely allocating a new WebGLTexture per logical buffer wastes GPU memory and triggers expensive re-allocations.
The insight from compiler register-allocation theory is that two virtual buffers can share the same physical texture if their lifetimes do not overlap. Determining the minimum number of physical textures is exactly the graph-colouring problem:
- Nodes = virtual buffers (render targets)
- Edges = "these two buffers are live at the same time"
- Colours = physical GPU texture slots
Finding the chromatic number is NP-hard in general, but the Welsh–Powell greedy algorithm gives an excellent upper bound in near-linear time (the degree sort dominates):
- Sort nodes by degree (number of conflicts) in descending order.
- Greedily assign the lowest colour not already used by any neighbour.
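The two steps above fit in a dozen lines of TypeScript. This is a sketch — the `Graph` type and the function names are mine, not taken from the playground:

```typescript
type Graph = Map<string, Set<string>>;

// Build an undirected conflict graph from "live at the same time" edges.
function buildGraph(edges: [string, string][]): Graph {
  const g: Graph = new Map();
  for (const [a, b] of edges) {
    if (!g.has(a)) g.set(a, new Set());
    if (!g.has(b)) g.set(b, new Set());
    g.get(a)!.add(b);
    g.get(b)!.add(a);
  }
  return g;
}

// Welsh–Powell greedy colouring: returns a colour (physical slot) per buffer.
function welshPowell(g: Graph): Map<string, number> {
  // Step 1: sort vertices by degree, descending
  const order = [...g.keys()].sort(
    (a, b) => g.get(b)!.size - g.get(a)!.size,
  );
  const colour = new Map<string, number>();
  for (const v of order) {
    const used = new Set(
      [...g.get(v)!].filter((n) => colour.has(n)).map((n) => colour.get(n)!),
    );
    // Step 2: lowest colour not already used by any neighbour
    let c = 0;
    while (used.has(c)) c++;
    colour.set(v, c);
  }
  return colour;
}
```

Three mutually-live buffers (a triangle) need three slots; a chain of passes that only conflict with their immediate neighbours needs two, no matter how long it is.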
The playground below models a real multi-pass deferred-rendering pipeline. Modify the conflict graph and watch the slot count change in real time:
Buffer Allocation — Welsh–Powell Graph Colouring
Each node is a virtual GPU buffer. An edge means both buffers are simultaneously alive — they cannot share a physical slot. Colours = physical memory slots.
Badge = degree (number of live-overlapping buffers). Same colour = can share physical memory.
Processing order (by degree ↓)
Physical buffer slots
Insight: The chromatic number (3) is the minimum number of physical GPU textures/buffers this pipeline needs, regardless of how many virtual buffers it declares. Welsh–Powell gives a fast greedy upper bound — optimal colouring is NP-hard in general but excellent heuristics exist for the sparse conflict graphs typical of real render pipelines.
For the typical sparse conflict graphs of real render pipelines (most passes only overlap with their immediate neighbours), Welsh–Powell often achieves the true minimum. The naive allocation would use one texture per pass; the coloured allocation for the default pipeline above uses only three physical slots.
4. Random access is fundamentally limited
In the fragment shader model, the GPU writes to one output location determined by which fragment it is processing — you write to pixel (x, y) precisely when you are invoked for pixel (x, y). WebGL therefore supports gather (reading from any texel) but only fixed scatter (writing to the current fragment's own position).
True scatter — writing to an arbitrary texel from within a shader — is not possible in WebGL. Algorithms like:
- Histogram computation (scatter counts into bins)
- Radix sort (scatter elements to sorted positions)
- Sparse matrix-vector products (scatter partial products by row index)
…all require the kind of arbitrary write access that only becomes available with Shader Storage Buffer Objects (SSBOs) or Image Store operations, neither of which exists in WebGL. The standard workaround is to restructure the algorithm into a sequence of gather-only passes (e.g., prefix-sum networks for histogram normalisation), which is often possible but dramatically increases the number of render passes and development complexity.
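To make "gather-only passes" concrete, here is a CPU sketch of a multi-pass sum reduction: each call to `reducePass` models one draw call in which every output texel reads from two fixed input texels, and no invocation ever writes anywhere but its own position (the function names are illustrative):

```typescript
// One "render pass": output i gathers inputs 2i and 2i+1. Pure gather —
// each output position is written only by its own invocation.
function reducePass(input: number[]): number[] {
  const out: number[] = [];
  for (let i = 0; i < Math.ceil(input.length / 2); i++) {
    const a = input[2 * i];
    const b = 2 * i + 1 < input.length ? input[2 * i + 1] : 0; // pad with 0
    out.push(a + b);
  }
  return out;
}

// Chain passes (ping-pong between two textures on the GPU) until one
// value remains: ⌈log₂ n⌉ draw calls instead of one scatter kernel.
function gpuStyleSum(data: number[]): number {
  let buf = data;
  while (buf.length > 1) buf = reducePass(buf);
  return buf[0];
}
```

A single CUDA or OpenCL kernel with atomics does this in one launch; the gather-only version needs a logarithmic number of full render passes, each with its own framebuffer bind and draw-call overhead.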
Read access is also constrained: sampling a texture with texture2D at data-dependent coordinates is non-uniform random access. Texture caches are optimised for spatial locality, so when neighbouring invocations in a warp/wave fetch from very different addresses, cache misses pile up and execution can stall significantly. Algorithms with irregular access patterns (graph traversal, sparse lookups) pay a steep penalty.
5. No geometry shaders
OpenGL 3.2 brought geometry shaders into core — a programmable stage between the vertex shader and the rasteriser that can emit new primitives, amplify geometry, or output to multiple render targets simultaneously. They enable:
- Rendering to cubemap faces in a single draw call
- Particle systems that expand a point into a billboard quad
- Shadow map generation for all six faces of a point light in one pass
- Layered rendering for VR (one draw call per eye without geometry duplication)
WebGL (based on OpenGL ES 2.0 / 3.0) simply does not have this stage. Every technique that relies on geometry shaders must be reimplemented via workarounds — typically instanced rendering with per-instance data in textures, which works but adds overhead and complexity.
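As a sketch of the instancing workaround (the names and attribute layout here are illustrative, not a real engine API): the geometry-shader job "expand each point into a billboard quad" becomes one shared unit quad plus a per-instance attribute buffer, drawn with a single instanced draw call:

```typescript
// One unit quad (two triangles, 6 vertices), shared by every particle.
const QUAD = new Float32Array([
  -1, -1,  1, -1,  -1, 1, // triangle 1
   1, -1,  1,  1,  -1, 1, // triangle 2
]);

interface Particle { x: number; y: number; size: number }

// Pack per-particle centre + size into a per-instance attribute buffer.
// The vertex shader then computes: position = centre + corner * size,
// doing the "amplification" a geometry shader would have done.
function buildInstanceData(particles: Particle[]): Float32Array {
  const out = new Float32Array(particles.length * 3);
  particles.forEach((p, i) => {
    out[3 * i] = p.x;
    out[3 * i + 1] = p.y;
    out[3 * i + 2] = p.size;
  });
  return out;
}

// Draw call (WebGL 2 core, or ANGLE_instanced_arrays in WebGL 1):
//   gl.drawArraysInstanced(gl.TRIANGLES, 0, 6, particles.length)
```

The cost is the extra instance buffer upload and a divisor-1 attribute setup — workable, but far more ceremony than a geometry shader's `EmitVertex`.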
WebGL 2 offers the OVR_multiview2 extension for the VR use-case (rendering the same geometry to multiple view layers in one draw call), but general geometry-shader functionality remains absent.
Contrast with OpenCL and WebGPU
| Feature | WebGL 1/2 | OpenCL 1.2+ | WebGPU |
|---|---|---|---|
| Compute shaders | ✗ (fragment only) | ✓ kernels | ✓ compute pipelines |
| Arbitrary write (scatter) | ✗ | ✓ | ✓ (storage buffers) |
| Float32 render targets | Extension, patchy | ✓ native | ✓ native |
| Context timeout protection | Watchdog, no control | Configurable | Configurable |
| Geometry shaders | ✗ | N/A (compute model) | ✗ (by design) |
| Atomic operations | ✗ | ✓ | ✓ |
| Shared memory (workgroup) | ✗ | ✓ local memory | ✓ workgroup storage |
OpenCL sidesteps all of these constraints by abandoning the graphics pipeline model entirely. There are no draw calls, no rasterisation, no fragment stages. A kernel is a plain C-like function that reads and writes arbitrary memory. The problem is that OpenCL is not available in browsers (and was deprecated on macOS), limiting it to native applications.
WebGPU is the browser API designed to fix WebGL's GPGPU shortcomings while remaining web-safe. It introduces:
- Compute pipelines with proper @compute shader stages in WGSL
- Storage buffers (var<storage, read_write>) allowing arbitrary scatter from any shader stage
- Workgroup shared memory for fast inter-thread communication within a tile
- Atomic operations for lock-free histograms and reductions
- Explicit resource lifetimes and a queue model that gives you predictable timing without watchdog surprises
The trade-off is that WebGPU's API surface is significantly larger and more verbose. A WebGL "hello triangle" is ~50 lines; the WebGPU equivalent is closer to 200. But for serious GPGPU work — ML inference, physics simulation, large-scale data processing — that verbosity buys you the control you need.
// WebGPU compute: scatter a histogram into a storage buffer
// — impossible to express cleanly in WebGL
const shader = device.createShaderModule({ code: `
@group(0) @binding(0) var<storage, read> data : array<u32>;
@group(0) @binding(1) var<storage, read_write> hist : array<atomic<u32>>;
@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) id : vec3<u32>) {
let value = data[id.x];
atomicAdd(&hist[value % 256u], 1u); // arbitrary scatter ✓
}
` })

The one area where WebGL still wins is compatibility: WebGL 1 runs on essentially every GPU made since 2010, including devices that will never support WebGPU. For rendering, WebGL 2 covers the vast majority of current hardware. For GPGPU in 2026, WebGPU is the right tool — but the ecosystem, tooling, and learning resources are still maturing compared to WebGL's decade-long head start.
Summary
WebGL is a remarkable achievement for what it is: OpenGL ES in a sandboxed, cross-origin-safe browser context. Its rendering capabilities are well-served by the fragment shader model. But the moment you reach for it as a general compute substrate you encounter floating-point texture portability gaps, watchdog timeouts that limit computation granularity, a buffer allocation problem that reduces to graph colouring, no support for scatter writes or geometry amplification, and the absence of compute primitives like atomics and shared memory.
The industry has been steadily moving toward WebGPU precisely because these constraints are not fixable within the WebGL model — they are architectural. WebGL will remain the right choice for compatibility-sensitive rendering; WebGPU is where serious GPU computing in the browser is headed.