fix(code): recover from GPU crash loops via software-rendering fallback #2642

Draft
posthog[bot] wants to merge 1 commit into main from posthog-code/gpu-crash-loop-recovery

Conversation

@posthog

@posthog posthog Bot commented Jun 12, 2026

Problem

A handful of Windows desktop users hit repeated GPU process crashes that flooded Error Tracking and degraded their app experience. The new Child process gone (GPU): crashed issue logged 1,145 captured exceptions over 30 days, concentrated in just 4 users — a crash-loop pattern on specific machines with bad GPU drivers.

The child-process-gone handler in apps/code/src/main/index.ts only reported GPU/Utility deaths and never attempted recovery (unlike the render-process-gone handler), and the app never fell back to software rendering — so afflicted machines kept crash-looping.

Changes

  • Software-rendering fallback: on repeated GPU child-process crashes within a 60s window, persist a flag so the next launch boots with hardware acceleration disabled (app.disableHardwareAcceleration() + disable-gpu-compositing switch in bootstrap.ts). Electron can only toggle this before app-ready, so the fallback takes effect on restart — which is fine since Chromium auto-restarts a dead GPU process in-session.
  • Rate-limiting: cap child-process-gone captureException calls (5 per 60s window) so a single crash-looping host can't dominate error volume. Suppressed counts ride along on the next reported exception so volume stays visible.
  • New gpu-recovery util (electron-store backed) with unit tests.
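
The first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `FlagStore`, `AppLike`, `applyGpuFallback`, and `createGpuCrashTracker` are hypothetical names, and the crash-count threshold of 3 is an assumption because the description only gives the 60s window. The only real Electron calls referenced are `app.disableHardwareAcceleration()` and `app.commandLine.appendSwitch()`, modelled here through a small interface so the sketch runs without Electron.

```typescript
const CRASH_WINDOW_MS = 60_000; // 60s window, from the PR description
const CRASH_LOOP_THRESHOLD = 3; // assumption: the PR does not state a count

// Hypothetical stand-in for the electron-store backed gpu-recovery store.
interface FlagStore {
  get(): boolean;
  set(value: boolean): void;
}

// Minimal slice of Electron's `app` that the bootstrap step needs.
interface AppLike {
  disableHardwareAcceleration(): void;
  commandLine: { appendSwitch(name: string): void };
}

// Meant to run before app-ready; returns whether the fallback was applied.
function applyGpuFallback(app: AppLike, store: FlagStore): boolean {
  if (!store.get()) return false;
  app.disableHardwareAcceleration();
  app.commandLine.appendSwitch("disable-gpu-compositing");
  return true;
}

function createGpuCrashTracker(store: FlagStore) {
  let recent: number[] = [];
  return {
    // Records one GPU crash; returns true only when this crash arms the fallback.
    recordCrash(now: number = Date.now()): boolean {
      if (store.get()) return false; // already armed, nothing to track
      recent = recent.filter((t) => now - t < CRASH_WINDOW_MS);
      recent.push(now);
      if (recent.length < CRASH_LOOP_THRESHOLD) return false;
      store.set(true); // takes effect on the next launch
      return true;
    },
  };
}
```

In this sketch the tracker would be called from the child-process-gone handler for GPU crashes, and `applyGpuFallback` from the bootstrap path on the following launch.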

How did you test this?

  • pnpm --filter code typecheck — clean
  • biome check / lint — clean on all changed files
  • vitest run src/main/utils/gpu-recovery.test.ts — 2 passing tests covering the default and persisted-fallback paths
  • node scripts/check-host-boundaries.mjs — no new violations

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Created with PostHog Code from an inbox report.

The child-process-gone handler only reported GPU/Utility crashes and never
recovered, so a few Windows machines with bad GPU drivers crash-looped and
flooded Error Tracking (1,145 GPU "crashed" captures from 4 users).

- Detect repeated GPU child-process crashes within a 60s window and persist a
  flag that makes the next launch boot with hardware acceleration disabled
  (app.disableHardwareAcceleration() + disable-gpu-compositing in bootstrap.ts),
  breaking the loop on afflicted machines.
- Rate-limit child-process-gone captureException calls so a single crash-looping
  host can't dominate error volume; suppressed counts ride along on the next
  reported exception.
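
As an illustration of the rate-limiting bullet, here is a minimal sketch under stated assumptions. The 5-per-60s cap comes from the PR description; `createReportLimiter` and `ReportDecision` are hypothetical names, and how the suppressed count is attached to the reported exception is left to the caller.

```typescript
const MAX_REPORTS = 5; // from the PR description
const REPORT_WINDOW_MS = 60_000; // from the PR description

interface ReportDecision {
  report: boolean;
  // Exceptions dropped since the last one that was actually reported.
  suppressedSinceLastReport: number;
}

function createReportLimiter(max = MAX_REPORTS, windowMs = REPORT_WINDOW_MS) {
  let reported: number[] = [];
  let suppressed = 0;
  return function decide(now: number): ReportDecision {
    // Keep only reports that are still inside the sliding window.
    reported = reported.filter((t) => now - t < windowMs);
    if (reported.length >= max) {
      suppressed += 1;
      return { report: false, suppressedSinceLastReport: suppressed };
    }
    reported.push(now);
    const carried = suppressed;
    suppressed = 0; // the count rides along on this report, then resets
    return { report: true, suppressedSinceLastReport: carried };
  };
}
```

A handler built on this would call the error reporter only when `report` is true and include `suppressedSinceLastReport` in whatever extra properties it sends, so dropped volume stays visible.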

Generated-By: PostHog Code
Task-Id: b768289b-c21c-429c-8dd7-7af17f4f9f1f
@github-actions

React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit 9e655a5.

@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/code/src/main/index.ts:247
The condition evaluates `isGpuCrashLoop()` before checking `!isHardwareAccelerationDisabled()`. Because `isGpuCrashLoop()` has a side-effect (it pushes the current timestamp into `recentGpuCrashTimestamps` every time it runs), once the fallback has been persisted — either in a prior session or earlier in the same session — every subsequent GPU crash still silently records a timestamp even though the flag can never be re-armed. Checking the cheap, side-effect-free guard first avoids the unnecessary state mutation and I/O.

```suggestion
  if (isGpuCrash && !isHardwareAccelerationDisabled() && isGpuCrashLoop()) {
```

### Issue 2 of 2
apps/code/src/main/utils/gpu-recovery.ts:30-32
**No path back to hardware rendering**

Once `disableHardwareAcceleration` is persisted as `true` it stays that way forever — there is no complementary reset function and nothing in bootstrap that clears the flag after a crash-free session. A user whose bad GPU driver is later updated (or who just had a one-off driver hiccup on an otherwise healthy machine) will be permanently stuck on software rendering with no in-app way to recover, and would need to manually hunt down and edit the `gpu-recovery` store file. Consider resetting the flag (or decrementing a counter) after the app completes a grace period without a GPU crash.

Reviews (1): Last reviewed commit: "fix(code): recover from GPU crash loops ..."

// next launch boots with hardware acceleration disabled (see bootstrap.ts).
// Chromium usually auto-restarts a dead GPU process, so we let it recover
// in-session and only break the loop across restarts.
if (isGpuCrash && isGpuCrashLoop() && !isHardwareAccelerationDisabled()) {

P2 The condition evaluates isGpuCrashLoop() before checking !isHardwareAccelerationDisabled(). Because isGpuCrashLoop() has a side-effect (it pushes the current timestamp into recentGpuCrashTimestamps every time it runs), once the fallback has been persisted — either in a prior session or earlier in the same session — every subsequent GPU crash still silently records a timestamp even though the flag can never be re-armed. Checking the cheap, side-effect-free guard first avoids the unnecessary state mutation and I/O.

Suggested change
- if (isGpuCrash && isGpuCrashLoop() && !isHardwareAccelerationDisabled()) {
+ if (isGpuCrash && !isHardwareAccelerationDisabled() && isGpuCrashLoop()) {
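
To make the ordering point concrete, the following sketch contrasts the two operand orders under `&&` short-circuiting. The helpers are hypothetical stand-ins that only mimic the behaviour described in the comment (a timestamp push inside `isGpuCrashLoop`); the threshold of 3 is an assumption and none of this is the PR's real code.

```typescript
function makeHandler(order: "loop-first" | "flag-first", fallbackPersisted: boolean) {
  const timestamps: number[] = [];
  const isHardwareAccelerationDisabled = (): boolean => fallbackPersisted;
  const isGpuCrashLoop = (now: number): boolean => {
    timestamps.push(now); // the side effect the review comment points at
    return timestamps.length >= 3; // assumed threshold
  };
  // Returns true when the handler would persist the fallback flag.
  const onGpuCrash = (now: number): boolean =>
    order === "loop-first"
      ? isGpuCrashLoop(now) && !isHardwareAccelerationDisabled()
      : !isHardwareAccelerationDisabled() && isGpuCrashLoop(now);
  return { onGpuCrash, timestamps };
}
```

With the flag already persisted, the "loop-first" handler still grows `timestamps` on every crash, while the "flag-first" handler short-circuits before the mutation. Neither order changes the boolean result.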

Comment on lines +30 to +32
export function persistDisableHardwareAcceleration(): void {
  gpuRecoveryStore().set("disableHardwareAcceleration", true);
}

P2 No path back to hardware rendering

Once disableHardwareAcceleration is persisted as true it stays that way forever — there is no complementary reset function and nothing in bootstrap that clears the flag after a crash-free session. A user whose bad GPU driver is later updated (or who just had a one-off driver hiccup on an otherwise healthy machine) will be permanently stuck on software rendering with no in-app way to recover, and would need to manually hunt down and edit the gpu-recovery store file. Consider resetting the flag (or decrementing a counter) after the app completes a grace period without a GPU crash.
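
One possible shape for the suggested grace-period reset is sketched below. Everything here is an assumption for illustration: `createFallbackReset`, the `FlagStore` interface, and the 10-minute grace period are not taken from the PR.

```typescript
const GRACE_PERIOD_MS = 10 * 60_000; // assumption: no value is given in the review

// Hypothetical stand-in for the electron-store backed gpu-recovery store.
interface FlagStore {
  get(): boolean;
  set(value: boolean): void;
}

function createFallbackReset(
  store: FlagStore,
  sessionStart: number,
  graceMs: number = GRACE_PERIOD_MS,
) {
  let crashedThisSession = false;
  return {
    // Call from the GPU crash handler.
    onGpuCrash(): void {
      crashedThisSession = true;
    },
    // Call from a timer; clears the flag after a crash-free grace period.
    maybeReset(now: number): boolean {
      if (crashedThisSession) return false;
      if (!store.get()) return false; // nothing to clear
      if (now - sessionStart < graceMs) return false;
      store.set(false); // next launch retries hardware rendering
      return true;
    },
  };
}
```

A caveat with this design: a crash-free session while running in software mode probably says little about the driver itself, so the reset effectively retries hardware rendering on a later launch and relies on the crash-loop detection to re-arm the fallback if the problem persists.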

