Skip to content

[Feat] [SDK-399] Okhttp interceptor#367

Open
buongarzoni wants to merge 4 commits intomasterfrom
feat/SDK-399/network-telemetry-interceptor
Open

[Feat] [SDK-399] Okhttp interceptor#367
buongarzoni wants to merge 4 commits intomasterfrom
feat/SDK-399/network-telemetry-interceptor

Conversation

@buongarzoni
Copy link
Copy Markdown
Collaborator

@buongarzoni buongarzoni commented Mar 24, 2026

Description of the change

  • Add rollbar-okhttp module with an OkHttp interceptor that records network telemetry (HTTP errors) and error events (connection failures/timeouts) via Rollbar
  • Includes NetworkTelemetryRecorder interface for bridging with any Rollbar instance

Examples

400/500
Captura de pantalla 2026-03-24 a la(s) 4 34 01 p  m

connection error logs the IOException
Captura de pantalla 2026-03-24 a la(s) 4 36 05 p  m

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Maintenance
  • New release

Checklists

Development

  • Lint rules pass locally
  • The code changed/added as part of this pull request has been covered with tests
  • All tests related to the changed code pass in development

Code review

  • This pull request has a descriptive title and information useful to a reviewer. There may be a screenshot or screencast attached
  • "Ready for review" label attached to the PR and reviewers assigned
  • Issue from task tracker has a link to this pull request
  • Changes have been reviewed by at least one other engineer

@buongarzoni buongarzoni added this to the 2.3.0 milestone Mar 24, 2026
@buongarzoni buongarzoni self-assigned this Mar 24, 2026
@linear
Copy link
Copy Markdown

linear bot commented Mar 24, 2026

@brianr
Copy link
Copy Markdown
Member

brianr commented Apr 6, 2026

@claude review

Comment on lines +23 to +41
try {
Response response = chain.proceed(request);

if (response.code() >= 400 && recorder != null) {
recorder.recordNetworkEvent(
Level.CRITICAL,
request.method(),
request.url().toString(),
String.valueOf(response.code()));
}

return response;

} catch (IOException e) {
if (recorder != null) {
recorder.recordErrorEvent(e);
}

throw e;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 If the user-supplied NetworkTelemetryRecorder throws an unchecked RuntimeException, two bugs occur in RollbarOkHttpInterceptor.intercept(): (1) in the success path, the Response obtained from chain.proceed() is never returned or closed, leaking OkHttp connections and eventually exhausting the connection pool; (2) in the error path, the original IOException is silently swallowed and the RuntimeException propagates instead, breaking the throws IOException contract and bypassing callers' retry/fallback logic. Both paths should wrap recorder calls in try-catch(Exception) to isolate recorder failures.

Extended reasoning...

Bug 1 — Response body leak (success path, lines 24–34)

In intercept(), chain.proceed(request) acquires a live OkHttp network connection and wraps it in a Response object. When response.code() >= 400 && recorder \!= null, the code calls recorder.recordNetworkEvent(). If that call throws an unchecked RuntimeException (e.g., a NullPointerException inside the user's Rollbar integration, or a serialization failure), the exception is not caught by the catch (IOException e) block. As a result, return response at line 34 is never reached: the caller never receives the Response and can never call response.close(). Per OkHttp's documentation, response bodies must be explicitly closed to return the underlying connection to the connection pool.

Step-by-step proof (Bug 1): Suppose the server returns a 500 and recorder.recordNetworkEvent() throws new RuntimeException("Rollbar unavailable"). (1) chain.proceed(request) returns a Response holding an open connection — (2) response.code() >= 400 is true, so recorder.recordNetworkEvent() is called — (3) the recorder throws RuntimeException — (4) the exception propagates past return response (never executed) and past the catch (IOException e) block (only catches checked exceptions) — (5) the caller receives a RuntimeException with no Response to close — (6) the connection is held indefinitely. Under sustained 4xx/5xx traffic with a broken Rollbar integration, every error response leaks one connection. With OkHttp's default pool of 5 connections, as few as 5 failures can freeze all subsequent HTTP calls.

Bug 2 — IOException masked (error path, lines 36–41)

In the catch (IOException e) block, recorder.recordErrorEvent(e) is called before throw e. If recordErrorEvent() throws a RuntimeException R, the throw e statement is never executed. R propagates to the caller instead of the original IOException. This breaks the method's throws IOException declaration: callers that wrap the call in catch (IOException) for retry or fallback logic will not catch R, the original network failure (timeout, connection refused, etc.) is lost, and async frameworks like Kotlin coroutines or RxJava that classify errors by type will misclassify the failure.

Step-by-step proof (Bug 2): Suppose a DNS failure causes chain.proceed() to throw new IOException("Unable to resolve host"). (1) The catch (IOException e) block is entered — (2) recorder.recordErrorEvent(e) is called, and the recorder internally calls rollbar.log(exception) which throws new NullPointerException() due to a missing payload field — (3) the NPE propagates out of the catch block; throw e is never reached — (4) the caller's catch (IOException) block is bypassed entirely — (5) the original "Unable to resolve host" error is invisible; the caller sees only an NPE with no network context.

Why existing code doesn't prevent it: The catch (IOException e) block only catches checked IOExceptions. Both RuntimeException thrown from recordNetworkEvent() in the try block and RuntimeException thrown from recordErrorEvent() in the catch block are unchecked and propagate freely through these constructs. There is no finally block and no suppression of recorder exceptions.

Fix: Wrap each recorder call in try-catch(Exception) (or at minimum try-catch(RuntimeException)) and log/suppress the recorder's exception. For Bug 1, also add a try-finally (or try-with-resources) around the response to ensure response.close() is called if a recorder exception escapes before return response. The recorder's failure should never affect the interceptor's core behavior.

}

group = "com.rollbar.okhttp"
version = "2.2.0"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 rollbar-okhttp/build.gradle.kts hardcodes version = "2.2.0" instead of relying on the root project's version management, so future release bumps to VERSION_NAME will not update this module's published version. Remove the local version declaration to let the root allprojects { version = versionName } take effect, keeping rollbar-okhttp in sync with the rest of the SDK.

Extended reasoning...

What the bug is and how it manifests

The new rollbar-okhttp module sets version = "2.2.0" directly in its build.gradle.kts (line 6). In Gradle, each subproject's build script is evaluated after the root project's allprojects block, which means a local version = ... assignment in a subproject overrides whatever the root block set. As a result, rollbar-okhttp will always advertise itself as version 2.2.0 in its published Maven artifact regardless of what VERSION_NAME is set to.

The specific code path that triggers it

The root build.gradle.kts reads val versionName = rootProject.extra["VERSION_NAME"] as String from gradle.properties and then applies it inside allprojects { version = versionName }. rollbar-okhttp's local version = "2.2.0" in its own build script then replaces that value during subproject configuration. All other submodules (rollbar-java, rollbar-api, rollbar-web, rollbar-reactive-streams, etc.) omit a local version declaration and therefore correctly inherit the root-level value.

Why existing code doesn't prevent it

There is no enforcement mechanism (e.g., a build check or version-consistency task) that asserts all published modules share the same version. The hardcoded string happens to equal the current VERSION_NAME (also "2.2.0"), so the problem is invisible today.

What the impact would be

When the project next bumps VERSION_NAME (say to 2.3.0), every other module will publish as 2.3.0, but rollbar-okhttp will still publish as 2.2.0. Consumers who declare a BOM or uniform version constraint for all rollbar artifacts will get a resolution conflict or silently use the wrong (stale) artifact.

How to fix it

Remove line 6 (version = "2.2.0") from rollbar-okhttp/build.gradle.kts. The root allprojects block will then set the version correctly for this module just as it does for every other module.

Step-by-step proof

  1. gradle.properties sets VERSION_NAME=2.2.0.
  2. Root build.gradle.kts evaluates allprojects { version = "2.2.0" } → rollbar-okhttp.version = "2.2.0" ✓
  3. Gradle then evaluates rollbar-okhttp/build.gradle.kts; the line version = "2.2.0" reassigns the same value (no visible effect right now).
  4. Developer bumps VERSION_NAME=2.3.0 in gradle.properties.
  5. Root allprojects sets rollbar-okhttp.version = "2.3.0" ✓ — but is immediately overwritten in step 6.
  6. rollbar-okhttp/build.gradle.kts still has version = "2.2.0"; Gradle overwrites to "2.2.0".
  7. ./gradlew :rollbar-okhttp:publish publishes com.rollbar:rollbar-okhttp:2.2.0 while all peer artifacts publish as 2.3.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants