fix(code): recover from GPU crash loops via software-rendering fallback#2642
posthog[bot] wants to merge 1 commit into
The `child-process-gone` handler only reported GPU/Utility crashes and never recovered, so a few Windows machines with bad GPU drivers crash-looped and flooded Error Tracking (1,145 GPU "crashed" captures from 4 users).

- Detect repeated GPU child-process crashes within a 60s window and persist a flag that makes the next launch boot with hardware acceleration disabled (`app.disableHardwareAcceleration()` + `disable-gpu-compositing` in `bootstrap.ts`), breaking the loop on afflicted machines.
- Rate-limit `child-process-gone` `captureException` calls so a single crash-looping host can't dominate error volume; suppressed counts ride along on the next reported exception.

Generated-By: PostHog Code
Task-Id: b768289b-c21c-429c-8dd7-7af17f4f9f1f
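The two mechanisms described above, sliding-window crash-loop detection and a capped `captureException` budget, can be sketched roughly as follows. This is a hypothetical illustration, not the PR's code: `isGpuCrashLoop` borrows the name used in the review, but its body, the threshold of 3 crashes per window, the `CaptureRateLimiter` class, and its fixed-window counting are assumptions. Only the 60s window and the 5-per-window cap come from the PR.

```typescript
// Hypothetical sketch: apart from the 60s window and the 5-per-window cap,
// names, thresholds, and structure are assumptions, not the PR's implementation.

const GPU_CRASH_WINDOW_MS = 60_000; // 60s window from the PR description
const GPU_CRASH_THRESHOLD = 3; // assumption: crashes per window that count as a loop

const recentGpuCrashTimestamps: number[] = [];

// Records a GPU crash at `now` and reports whether the crashes still inside
// the sliding window reach the threshold. Note the side effect: every call
// records a timestamp.
function isGpuCrashLoop(now: number = Date.now()): boolean {
  recentGpuCrashTimestamps.push(now);
  // Drop crashes that have fallen out of the window.
  while (
    recentGpuCrashTimestamps.length > 0 &&
    now - recentGpuCrashTimestamps[0] > GPU_CRASH_WINDOW_MS
  ) {
    recentGpuCrashTimestamps.shift();
  }
  return recentGpuCrashTimestamps.length >= GPU_CRASH_THRESHOLD;
}

// Fixed-window limiter for captureException calls (assumed design).
class CaptureRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStart = Number.NEGATIVE_INFINITY;
  private countInWindow = 0;
  private suppressed = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns null when the capture should be dropped. Otherwise returns how
  // many captures were suppressed since the last one that went through, so
  // the caller can attach that count to the reported exception.
  tryCapture(now: number = Date.now()): number | null {
    if (now - this.windowStart >= this.windowMs) {
      // Start a new window.
      this.windowStart = now;
      this.countInWindow = 0;
    }
    if (this.countInWindow >= this.limit) {
      this.suppressed += 1;
      return null;
    }
    this.countInWindow += 1;
    const carried = this.suppressed;
    this.suppressed = 0;
    return carried;
  }
}
```

A fixed window is the simplest way to express "5 per 60s"; a sliding window would smooth bursts at window boundaries at the cost of storing timestamps, as the crash detector does.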
React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit
Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
apps/code/src/main/index.ts:247
The condition evaluates `isGpuCrashLoop()` before checking `!isHardwareAccelerationDisabled()`. Because `isGpuCrashLoop()` has a side-effect (it pushes the current timestamp into `recentGpuCrashTimestamps` every time it runs), once the fallback has been persisted — either in a prior session or earlier in the same session — every subsequent GPU crash still silently records a timestamp even though the flag can never be re-armed. Checking the cheap, side-effect-free guard first avoids the unnecessary state mutation and I/O.
```suggestion
if (isGpuCrash && !isHardwareAccelerationDisabled() && isGpuCrashLoop()) {
```
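A small, self-contained illustration of why the operand order matters: `&&` evaluates left to right and short-circuits, so with the side-effect-free guard first, the side-effecting check never runs once the fallback is already persisted. The function bodies, the `crashLoopChecks` counter, and the `fallbackPersisted` variable are hypothetical stand-ins; only the two condition shapes come from the review.

```typescript
// Hypothetical stand-ins: only the two boolean conditions mirror the review.
let crashLoopChecks = 0; // how many times the side-effecting check ran
const recentGpuCrashTimestamps: number[] = [];
let fallbackPersisted = true; // e.g. persisted in a previous session

function isGpuCrashLoop(now: number): boolean {
  crashLoopChecks += 1;
  recentGpuCrashTimestamps.push(now); // side effect on every call
  return recentGpuCrashTimestamps.length >= 3;
}

function isHardwareAccelerationDisabled(): boolean {
  return fallbackPersisted; // cheap, no side effects
}

// Original order: records a timestamp even though the flag is already set.
function shouldPersistOriginal(isGpuCrash: boolean, now: number): boolean {
  return isGpuCrash && isGpuCrashLoop(now) && !isHardwareAccelerationDisabled();
}

// Suggested order: short-circuits before touching crash-loop state.
function shouldPersistSuggested(isGpuCrash: boolean, now: number): boolean {
  return isGpuCrash && !isHardwareAccelerationDisabled() && isGpuCrashLoop(now);
}
```

Both versions return the same boolean; the difference is only whether `recentGpuCrashTimestamps` keeps growing after the fallback is active.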
### Issue 2 of 2
apps/code/src/main/utils/gpu-recovery.ts:30-32
**No path back to hardware rendering**
Once `disableHardwareAcceleration` is persisted as `true` it stays that way forever — there is no complementary reset function and nothing in bootstrap that clears the flag after a crash-free session. A user whose bad GPU driver is later updated (or who just had a one-off driver hiccup on an otherwise healthy machine) will be permanently stuck on software rendering with no in-app way to recover, and would need to manually hunt down and edit the `gpu-recovery` store file. Consider resetting the flag (or decrementing a counter) after the app completes a grace period without a GPU crash.
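One way to act on this suggestion, sketched under assumptions: count consecutive crash-free sessions while the fallback is active, and clear the flag after a few of them so the next launch retries hardware acceleration. The `KeyValueStore` interface (a minimal stand-in for the electron-store instance), the `cleanSessionsSinceFallback` key, the threshold of 3, and both function names are hypothetical; only the `disableHardwareAcceleration` key comes from the PR.

```typescript
// Hypothetical sketch: only the "disableHardwareAcceleration" key comes from
// the PR; the interface, the other key, and the threshold are assumptions.

// Minimal subset of a key-value store such as an electron-store instance.
interface KeyValueStore {
  get(key: string): unknown;
  set(key: string, value: unknown): void;
  delete(key: string): void;
}

const FLAG_KEY = "disableHardwareAcceleration";
const CLEAN_SESSIONS_KEY = "cleanSessionsSinceFallback"; // assumption
const CLEAN_SESSIONS_BEFORE_RETRY = 3; // assumption

// Call when a session ends without a GPU crash (e.g. on a normal quit).
function recordCleanSession(store: KeyValueStore): void {
  if (store.get(FLAG_KEY) !== true) return; // fallback not active
  const clean = Number(store.get(CLEAN_SESSIONS_KEY) ?? 0) + 1;
  if (clean >= CLEAN_SESSIONS_BEFORE_RETRY) {
    // Retry hardware acceleration on the next launch. If the driver is still
    // bad, crash-loop detection will persist the flag again.
    store.delete(FLAG_KEY);
    store.delete(CLEAN_SESSIONS_KEY);
  } else {
    store.set(CLEAN_SESSIONS_KEY, clean);
  }
}

// Call on any GPU crash so the grace period requires consecutive clean sessions.
function resetCleanSessions(store: KeyValueStore): void {
  store.delete(CLEAN_SESSIONS_KEY);
}
```

The trade-off is that a machine whose driver is still broken pays for one short crash loop per retry before falling back again; a time-based expiry (storing when the flag was set) would be an equally plausible variant.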
Reviews (1): Last reviewed commit: "fix(code): recover from GPU crash loops ..."
apps/code/src/main/index.ts:247
```ts
// next launch boots with hardware acceleration disabled (see bootstrap.ts).
// Chromium usually auto-restarts a dead GPU process, so we let it recover
// in-session and only break the loop across restarts.
if (isGpuCrash && isGpuCrashLoop() && !isHardwareAccelerationDisabled()) {
```
The condition evaluates `isGpuCrashLoop()` before checking `!isHardwareAccelerationDisabled()`. Because `isGpuCrashLoop()` has a side-effect (it pushes the current timestamp into `recentGpuCrashTimestamps` every time it runs), once the fallback has been persisted (either in a prior session or earlier in the same session), every subsequent GPU crash still silently records a timestamp even though the flag can never be re-armed. Checking the cheap, side-effect-free guard first avoids the unnecessary state mutation and I/O.
```diff
- if (isGpuCrash && isGpuCrashLoop() && !isHardwareAccelerationDisabled()) {
+ if (isGpuCrash && !isHardwareAccelerationDisabled() && isGpuCrashLoop()) {
```
apps/code/src/main/utils/gpu-recovery.ts:30-32
```ts
export function persistDisableHardwareAcceleration(): void {
  gpuRecoveryStore().set("disableHardwareAcceleration", true);
}
```
**No path back to hardware rendering**

Once `disableHardwareAcceleration` is persisted as `true` it stays that way forever: there is no complementary reset function and nothing in bootstrap that clears the flag after a crash-free session. A user whose bad GPU driver is later updated (or who just had a one-off driver hiccup on an otherwise healthy machine) will be permanently stuck on software rendering with no in-app way to recover, and would need to manually hunt down and edit the `gpu-recovery` store file. Consider resetting the flag (or decrementing a counter) after the app completes a grace period without a GPU crash.
Problem
A handful of Windows desktop users hit repeated GPU process crashes that flooded Error Tracking and degraded their app experience. The new `Child process gone (GPU): crashed` issue logged 1,145 captured exceptions over 30 days, concentrated in just 4 users: a crash-loop pattern on specific machines with bad GPU drivers.

The `child-process-gone` handler in `apps/code/src/main/index.ts` only reported GPU/Utility deaths and never attempted recovery (unlike the `render-process-gone` handler), and the app never fell back to software rendering, so afflicted machines kept crash-looping.

Changes
- Detect repeated GPU child-process crashes within a 60s window and persist a flag that makes the next launch boot with hardware acceleration disabled (`app.disableHardwareAcceleration()` + `disable-gpu-compositing` switch in `bootstrap.ts`). Electron can only toggle this before app-ready, so the fallback takes effect on restart, which is fine since Chromium auto-restarts a dead GPU process in-session.
- Rate-limit `child-process-gone` `captureException` calls (5 per 60s window) so a single crash-looping host can't dominate error volume. Suppressed counts ride along on the next reported exception so volume stays visible.
- `gpu-recovery` util (electron-store backed) with unit tests.

How did you test this?
- `pnpm --filter code typecheck`: clean
- `biome check` / lint: clean on all changed files
- `vitest run src/main/utils/gpu-recovery.test.ts`: 2 passing tests covering the default and persisted-fallback paths
- `node scripts/check-host-boundaries.mjs`: no new violations

Automatic notifications
Created with PostHog Code from an inbox report.