fix(hyperliquid): enforce 55s timeout for entire metric collection #48
Conversation
- Wrap HTTP and WebSocket metric collection in asyncio.wait_for
- Remove redundant inner timeout configurations in aiohttp
- Reduce MAX_RETRIES from 3 to 2 and the Retry-After default from 15s to 3s
- Ensure total execution cannot exceed 55s, including retries and sleeps

This prevents Hyperliquid metrics from exceeding the 59s Vercel limit by enforcing the timeout at the outer scope rather than per-request.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
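As context for the review below, here is a minimal sketch of the outer-timeout pattern this commit describes; `collect_with_budget` and the `fetch_data` parameter are illustrative names, not the repo's actual API.

```python
import asyncio

TIMEOUT_SECONDS = 55  # hard budget, safely under Vercel's 59s function limit

async def collect_with_budget(fetch_data) -> object:
    """Bound the entire collection (connects, retries, sleeps) with one deadline."""
    try:
        # A single outer wait_for replaces per-request aiohttp timeouts:
        # retry loops and Retry-After sleeps all consume the same budget.
        return await asyncio.wait_for(fetch_data(), timeout=TIMEOUT_SECONDS)
    except asyncio.TimeoutError:
        raise TimeoutError(f"Metric collection exceeded {TIMEOUT_SECONDS}s timeout")
```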
Walkthrough

MAX_RETRIES reduced from 3 to 2. HTTP and WebSocket metric collection switched to explicit asyncio.wait_for-based timeouts, the session-level aiohttp timeout was removed, HTTP retry/wait behavior was adjusted (429 defaults to a 3s wait), timing instrumentation was added, and cleanup and error handling were improved.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant C as Collector
    participant H as HTTP Endpoint
    Note over C: HTTP metric collection (updated flow)
    C->>C: asyncio.wait_for(fetch_data, timeout)
    alt Success within timeout
        C->>H: POST request (no per-request timeout)
        H-->>C: Response (status, body, trace timings)
        C->>C: Record DNS/connect/total, compute RPC = total - connect
        C->>C: Update metric, mark success
    else HTTP 429
        C->>C: Read Retry-After or default 3s
        C->>C: Sleep and retry (up to MAX_RETRIES = 2)
    else TimeoutError
        C->>C: Mark failure with timeout message
    else Non-200 / Missing response
        C->>C: Raise/propagate error, mark failure
    end
```
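A minimal sketch of the HTTP 429 branch shown above, assuming aiohttp; `post_with_retry` and the payload shape are illustrative. The naive Retry-After parse mirrors the current behavior that a review comment below flags for HTTP-date values.

```python
import asyncio
import aiohttp

MAX_RETRIES = 2

async def post_with_retry(session: aiohttp.ClientSession, url: str, payload: dict) -> dict:
    """POST with 429 handling; an outer asyncio.wait_for bounds total time."""
    for attempt in range(MAX_RETRIES):
        async with session.post(url, json=payload) as response:
            if response.status == 429 and attempt < MAX_RETRIES - 1:
                # Naive delta-seconds parse; breaks on HTTP-date (see review below)
                wait_time = float(response.headers.get("Retry-After", 3))
                await asyncio.sleep(wait_time)  # sleep counts against the outer budget
                continue
            response.raise_for_status()
            return await response.json()
    raise RuntimeError("rate-limited on every attempt")
```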
```mermaid
sequenceDiagram
    autonumber
    participant C as Collector
    participant W as WebSocket Server
    Note over C: WebSocket metric collection (updated flow)
    C->>W: Connect / Subscribe
    C->>C: asyncio.wait_for(_collect_ws_data, timeout)
    alt Data received in time
        W-->>C: Message
        C->>C: Process latency, update metric, mark success
    else TimeoutError
        C->>C: Mark failure with timeout message
    else Other exception
        C->>C: Log/propagate error, mark failure
    end
    C->>W: Unsubscribe
    C->>W: Close connection (errors logged if close fails)
```
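And a matching sketch of the WebSocket flow, assuming the websockets library; the subscription payload and function names are placeholders, not the repo's exact code.

```python
import asyncio
import time
import websockets

async def collect_ws_metric(ws_endpoint: str, timeout: float) -> float:
    """One outer deadline bounds connect + subscribe + first message."""
    websocket = None

    async def _collect_ws_data():
        nonlocal websocket
        websocket = await websockets.connect(ws_endpoint)
        await websocket.send('{"method": "subscribe"}')  # placeholder payload
        return await websocket.recv()

    start = time.monotonic()
    try:
        await asyncio.wait_for(_collect_ws_data(), timeout=timeout)
        return time.monotonic() - start
    finally:
        if websocket is not None:
            await websocket.close()  # cleanup also runs on timeout
```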
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
common/metric_types.py (1)
230-244: Robust Retry-After parsing; current int() cast can crash on HTTP-date values.

Retry-After can be delta-seconds or an HTTP-date. int() will raise ValueError for dates, breaking the loop and the metric. Also keep the default 3s for safety.
Apply:
```diff
-            if response.status == 429 and retry_count < MAX_RETRIES - 1:
-                wait_time = int(response.headers.get("Retry-After", 3))
+            if response.status == 429 and retry_count < MAX_RETRIES - 1:
+                retry_after = response.headers.get("Retry-After")
+                wait_time: float = 3.0
+                if retry_after:
+                    try:
+                        wait_time = float(retry_after)
+                    except ValueError:
+                        try:
+                            from email.utils import parsedate_to_datetime
+                            dt = parsedate_to_datetime(retry_after)
+                            if dt is not None:
+                                wait_time = max(dt.timestamp() - time.time(), 0.0)
+                        except Exception:
+                            wait_time = 3.0
                 await response.release()
                 await asyncio.sleep(wait_time)
                 continue
```

Additionally, add the import once at the top of the file if you choose not to import inline:
```python
from email.utils import parsedate_to_datetime
```
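For reference, a self-contained variant of that suggestion as a helper; `parse_retry_after` is an illustrative name, and the naive-datetime fallback assumes a well-formed GMT date.

```python
import time
from email.utils import parsedate_to_datetime

def parse_retry_after(value: str | None, default: float = 3.0) -> float:
    """Parse Retry-After: delta-seconds ("3") or an HTTP-date (RFC 9110)."""
    if not value:
        return default
    try:
        return max(float(value), 0.0)  # delta-seconds form
    except ValueError:
        pass
    try:
        dt = parsedate_to_datetime(value)  # e.g. "Wed, 21 Oct 2025 07:28:00 GMT"
        return max(dt.timestamp() - time.time(), 0.0)
    except (TypeError, ValueError):
        return default
```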
🧹 Nitpick comments (8)
common/metric_types.py (8)
48-56: Avoid hard-coded WebSocket timeouts; tie them to config or a named constant.

Hard-coding 10s here partially reintroduces per-operation limits. Prefer a named constant or derive from config (capped) for consistency and future tuning.
Apply either approach:
```diff
-        websocket: websockets.WebSocketClientProtocol = await websockets.connect(
-            self.ws_endpoint,  # type: ignore
-            ping_timeout=10,  # self.config.timeout,
-            open_timeout=10,  # self.config.timeout,
-            close_timeout=10,  # self.config.timeout,
-        )
+        websocket: websockets.WebSocketClientProtocol = await websockets.connect(
+            self.ws_endpoint,  # type: ignore
+            ping_timeout=10.0,
+            open_timeout=min(self.config.timeout, 10.0),
+            close_timeout=10.0,
+        )
```
62-71: Annotate _collect_ws_data return type and remove trailing whitespace.

Adds clarity and satisfies static analysis (ANN202, W293).
Apply:
```diff
-        async def _collect_ws_data():
+        async def _collect_ws_data() -> Any:
             nonlocal websocket
             websocket = await self.connect()
             await self.subscribe(websocket)
             data = await self.listen_for_data(websocket)
-            
+
             if data is not None:
                 return data
             raise ValueError("No data in response")
```
81-84: Preserve original TimeoutError and fix long line (E501).

Keep the original exception/stack and provide a concise message.
Apply:
```diff
-        except asyncio.TimeoutError:
-            self.mark_failure()
-            self.handle_error(TimeoutError(f"WebSocket metric collection exceeded {self.config.timeout}s timeout"))
+        except asyncio.TimeoutError as e:
+            self.mark_failure()
+            msg = f"WebSocket metric collection exceeded {self.config.timeout}s timeout"
+            e.args = (msg,)
+            self.handle_error(e)
```
121-124: Mirror TimeoutError handling for HTTP and fix long line (E501).

Same rationale as the WS path: preserve the stack, shorten the line.
Apply:
```diff
-        except asyncio.TimeoutError:
-            self.mark_failure()
-            self.handle_error(TimeoutError(f"Metric collection exceeded {self.config.timeout}s timeout"))
+        except asyncio.TimeoutError as e:
+            self.mark_failure()
+            msg = f"Metric collection exceeded {self.config.timeout}s timeout"
+            e.args = (msg,)
+            self.handle_error(e)
```
193-201: Use get_endpoint() to ensure str typing and a single source of truth.

You already defined get_endpoint() that normalizes to str. Using it here avoids None/typing pitfalls and duplication.
Apply:
```diff
-        endpoint: str | None = self.config.endpoints.get_endpoint()
+        endpoint: str = self.get_endpoint()
```
261-263: Use logging instead of print for timing breakdown.

Printing in libraries is noisy in serverless logs and not level-controlled. Prefer logging.debug/info.
Apply:
```diff
-        print(
-            f"[{provider}] {method} timing: DNS={dns_time:.0f}ms, Connect={conn_time:.0f}ms, Total={total_time:.0f}ms, Endpoint={endpoint}"
-        )
+        logging.info(
+            "[%s] %s timing: DNS=%dms, Connect=%dms, Total=%dms, Endpoint=%s",
+            provider, method, int(dns_time), int(conn_time), int(total_time), endpoint
+        )
```
283-283: Guard against negative RPC time (rare, but possible).

Trace hooks and coarse timing can occasionally yield a slightly larger conn_time than the measured response_time due to scheduling. Clamp at zero to avoid confusing negatives.
Apply:
```diff
-        return response_time - (conn_time / 1000)
+        return max(response_time - (conn_time / 1000), 0.0)
```
245-263: Consider using timing["end"] - timing["start"] for total vs response_time.

If you want the DNS/connect breakdown to sum to "total," base it on the trace callbacks rather than the coarse wall-clock around _send_request. Not required, just more precise.
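A sketch of what basing the breakdown on trace callbacks could look like; the timing dict keys are assumptions, not the repo's actual layout.

```python
import time
import aiohttp

def make_trace_config(timing: dict) -> aiohttp.TraceConfig:
    """Record DNS/connect/total on the same monotonic clock via aiohttp trace hooks."""
    trace = aiohttp.TraceConfig()

    async def on_request_start(session, ctx, params):
        timing["start"] = time.monotonic()

    async def on_dns_start(session, ctx, params):
        timing["dns_start"] = time.monotonic()

    async def on_dns_end(session, ctx, params):
        timing["dns_end"] = time.monotonic()

    async def on_conn_start(session, ctx, params):
        timing["conn_start"] = time.monotonic()

    async def on_conn_end(session, ctx, params):
        timing["conn_end"] = time.monotonic()

    async def on_request_end(session, ctx, params):
        timing["end"] = time.monotonic()

    trace.on_request_start.append(on_request_start)
    trace.on_dns_resolvehost_start.append(on_dns_start)
    trace.on_dns_resolvehost_end.append(on_dns_end)
    trace.on_connection_create_start.append(on_conn_start)
    trace.on_connection_create_end.append(on_conn_end)
    trace.on_request_end.append(on_request_end)
    return trace

# Usage: session = aiohttp.ClientSession(trace_configs=[make_trace_config(timing)])
# Then total = timing["end"] - timing["start"], so DNS/connect parts sum consistently.
```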
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
common/metric_types.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
common/metric_types.py (3)
metrics/ethereum.py (3)
  subscribe (202-251), listen_for_data (274-291), process_data (293-310)
common/base_metric.py (5)
  process_data (53-54), update_metric_value (82-91), mark_success (93-95), mark_failure (97-102), handle_error (104-131)
metrics/solana_landing_rate.py (2)
  process_data (185-186), fetch_data (148-183)
🪛 Ruff (0.12.2)
common/metric_types.py
62-62: Missing return type annotation for private function _collect_ws_data
(ANN202)
67-67: Blank line contains whitespace
Remove whitespace from blank line
(W293)
83-83: Line too long (115 > 88)
(E501)
123-123: Line too long (105 > 88)
(E501)
🔇 Additional comments (4)
common/metric_types.py (4)
16-16: MAX_RETRIES=2 aligns with the outer 55s budget.
Reducing retries helps guarantee total duration under the Vercel limit when combined with the outer wait_for.
72-80: Outer wait_for on full WS collection is correct.
This matches the PR goal by bounding the entire subscription+listen sequence instead of individual calls.
111-119: Outer wait_for around fetch_data is the right abstraction.
This centralizes time control and avoids per-request timeouts in aiohttp.
224-227: Removal of aiohttp session-level timeout matches the new outer control.
Good call: this prevents per-request timeouts from fighting the global budget.
- Remove session-level timeout in blockchain_fetcher.py (line 50)
- Remove leftover print statement in metric_types.py (line 261)
- Add asyncio.shield for WebSocket cleanup to prevent cancellation noise

These changes align with the centralized asyncio.wait_for timeout approach and clean up production logging.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
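A minimal sketch of the shielded-cleanup idea mentioned above; `close_ws_quietly` is an illustrative name. Note that asyncio.shield does not swallow cancellation in the awaiting task; it only lets the wrapped close() run to completion.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def close_ws_quietly(websocket) -> None:
    """Close a WebSocket without letting task cancellation abort the close itself."""
    try:
        # If the surrounding task is cancelled, this await still raises
        # CancelledError, but the shielded close() keeps running to completion.
        await asyncio.shield(websocket.close())
    except asyncio.CancelledError:
        raise  # propagate cancellation once close has been scheduled
    except Exception:
        logger.warning("WebSocket close failed", exc_info=True)  # log, never raise
```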
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
common/state/blockchain_fetcher.py (5)
33-36: Align retries with PR policy and remove unused timeout field
- self._timeout is no longer used; keeping it is confusing.
- PR sets MAX_RETRIES=2 and shorter waits elsewhere; mirror that here to keep global behavior consistent.
Apply:
```diff
-        self._timeout = aiohttp.ClientTimeout(total=15)
-        self._max_retries = 3
-        self._retry_delay = 5
+        # Outer asyncio.wait_for enforces the hard cap; keep internal backoff modest
+        self._max_retries = 2
+        self._retry_delay = 3
```

If this class is intentionally allowed to retry more than metrics collection, add a brief comment explaining why it diverges.
126-131: Fix IndexError risk when transactions list is empty

transactions[0] is accessed before confirming the list is non-empty. This will raise IndexError when the latest block has no transactions.
Use a safe first-element extraction:
```diff
-        transactions = latest_block.get("transactions", [])
-        tx_hash = (
-            transactions[0].get("hash", "")
-            if isinstance(transactions[0], dict)
-            else transactions[0] if transactions else ""
-        )
+        transactions = latest_block.get("transactions") or []
+        if transactions:
+            first_tx = transactions[0]
+            tx_hash = first_tx.get("hash", "") if isinstance(first_tx, dict) else str(first_tx)
+        else:
+            tx_hash = ""
```
162-163: Guard against empty signatures array on Solana blocks

If "signatures" exists but is an empty list, latest_block.get("signatures", [""])[0] throws IndexError.
Apply:
```diff
-        tx_sig = latest_block.get("signatures", [""])[0]
+        signatures = latest_block.get("signatures") or [""]
+        tx_sig = signatures[0] if signatures else ""
```
215-219: TON: handle empty transactions array safely

Same empty-list hazard here. get("transactions", [{}])[0] will fail if transactions=[] is present.
Apply:
```diff
-        tx_id = (
-            block.get("transactions", [{}])[0].get("hash", "")
-            if isinstance(block, dict)
-            else ""
-        )
+        tx_id = ""
+        if isinstance(block, dict):
+            txs = block.get("transactions") or []
+            if txs:
+                first = txs[0]
+                if isinstance(first, dict):
+                    tx_id = first.get("hash", "")
```
50-55: Ensure all fetch_latest_data calls are wrapped with asyncio.wait_for

The removal of the per-session timeout in common/state/blockchain_fetcher.py relies on every caller bounding the RPC with its own asyncio.wait_for. However, in api/support/update_state.py, the lone call to fetch_latest_data is not wrapped, which risks an unbounded hang if the upstream endpoint stalls.

Please add an explicit timeout wrapper around this call. For example:

```diff
--- a/api/support/update_state.py
+++ b/api/support/update_state.py
@@ -102,7 +102,10 @@ def update_state(...):
-    fetcher = BlockchainDataFetcher(endpoint)
-    data: BlockchainData = await fetcher.fetch_latest_data(blockchain)
+    fetcher = BlockchainDataFetcher(endpoint)
+    # Bound the RPC to prevent indefinite hangs
+    data: BlockchainData = await asyncio.wait_for(
+        fetcher.fetch_latest_data(blockchain),
+        timeout=RPC_TIMEOUT_SECONDS
+    )
```

– File: api/support/update_state.py, around line 106
– Define or import a suitable RPC_TIMEOUT_SECONDS constant consistent with project conventions
🧹 Nitpick comments (7)
common/state/blockchain_fetcher.py (1)
37-41: Avoid configuring root logging in a library module

Calling logging.basicConfig() in library code can interfere with application-wide logging. Prefer configuring logging in the top-level entrypoint and use the module logger here.
Apply:
```diff
-        logging.basicConfig(
-            level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
-        )
         self._logger: logging.Logger = logging.getLogger(__name__)
```
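A minimal sketch of the recommended split: the library module only creates a named logger, and the application entrypoint configures handlers once.

```python
import logging

# library module: no basicConfig here, just a module-level named logger
logger = logging.getLogger(__name__)

def fetch_something() -> None:
    logger.info("fetching latest block")  # emitted only if the app configured logging

# application entrypoint, configured exactly once:
if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
    )
    fetch_something()
```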
common/metric_types.py (6)

62-70: Type-annotate _collect_ws_data and remove stray whitespace

Addresses Ruff ANN202 and W293. Keeps style consistent with the rest.

```diff
-        async def _collect_ws_data():
+        async def _collect_ws_data() -> Any:
             nonlocal websocket
             websocket = await self.connect()
             await self.subscribe(websocket)
             data = await self.listen_for_data(websocket)
-            
+
             if data is not None:
                 return data
             raise ValueError("No data in response")

         try:
             data = await asyncio.wait_for(
                 _collect_ws_data(), timeout=self.config.timeout
             )
             latency: int | float = self.process_data(data)
             self.update_metric_value(latency)
             self.mark_success()
```

Also applies to: 73-80
81-84: Wrap long line for TimeoutError message (E501)

The message is fine; just wrap to satisfy linters and keep readability.

```diff
-        except asyncio.TimeoutError:
-            self.mark_failure()
-            self.handle_error(TimeoutError(f"WebSocket metric collection exceeded {self.config.timeout}s timeout"))
+        except asyncio.TimeoutError:
+            self.mark_failure()
+            self.handle_error(
+                TimeoutError(
+                    f"WebSocket metric collection exceeded {self.config.timeout}s timeout"
+                )
+            )
```
112-115: HTTP path: wrap long TimeoutError message (E501) and keep structure consistent

Mirror the WS wrapping to keep both branches uniform and pass lint.

```diff
-            data = await asyncio.wait_for(
-                self.fetch_data(),
-                timeout=self.config.timeout
-            )
+            data = await asyncio.wait_for(self.fetch_data(), timeout=self.config.timeout)
             if data is not None:
                 latency: int | float = self.process_data(data)
                 self.update_metric_value(latency)
                 self.mark_success()
                 return
             raise ValueError("No data in response")
-        except asyncio.TimeoutError:
-            self.mark_failure()
-            self.handle_error(TimeoutError(f"Metric collection exceeded {self.config.timeout}s timeout"))
+        except asyncio.TimeoutError:
+            self.mark_failure()
+            self.handle_error(
+                TimeoutError(
+                    f"Metric collection exceeded {self.config.timeout}s timeout"
+                )
+            )
```

Also applies to: 123-124
238-243: Robust Retry-After parsing to avoid ValueError on HTTP-date format

int(response.headers["Retry-After"]) will raise if the server sends an HTTP-date instead of delta-seconds. That would turn a transient 429 into an exception. Prefer a tolerant parser with a safe default.

```diff
-            if response.status == 429 and retry_count < MAX_RETRIES - 1:
-                wait_time = int(response.headers.get("Retry-After", 3))
+            if response.status == 429 and retry_count < MAX_RETRIES - 1:
+                # Retry-After may be delta-seconds or an HTTP-date; default to 3s on parse failure
+                retry_after_hdr = response.headers.get("Retry-After")
+                wait_time = 3
+                if retry_after_hdr:
+                    try:
+                        wait_time = int(retry_after_hdr)
+                    except ValueError:
+                        # Leave wait_time at default; optionally parse HTTP-date if needed
+                        pass
                 await response.release()
                 await asyncio.sleep(wait_time)
                 continue
```

If you want full HTTP-date support, we can add email.utils.parsedate_to_datetime and compute seconds.
50-55: Consider making WS connect timeouts configurable or derived from config

Hardcoding 10s for ping/open/close is reasonable but opaque. Suggest tying them to config (e.g., min(self.config.timeout, 10)) or exposing constants to keep behavior consistent across metrics.

```diff
-        websocket: websockets.WebSocketClientProtocol = await websockets.connect(
-            self.ws_endpoint,  # type: ignore
-            ping_timeout=10,  # self.config.timeout,
-            open_timeout=10,  # self.config.timeout,
-            close_timeout=10,  # self.config.timeout,
-        )
+        ws_timeout = min(self.config.timeout, 10)
+        websocket: websockets.WebSocketClientProtocol = await websockets.connect(
+            self.ws_endpoint,  # type: ignore
+            ping_timeout=ws_timeout,
+            open_timeout=ws_timeout,
+            close_timeout=ws_timeout,
+        )
```
194-201: It looks like the fetch_data method concludes without returning total_time, meaning it implicitly returns None, contradicting its declared -> float signature. Combined with endpoint possibly being None, this could lead to confusing runtime errors. Also, the timing breakdown variables (dns_time, conn_time, and total_time) are computed but never used or returned.

Please address the following (see the consolidated sketch after this list):

• Guard against a None endpoint before use
  - In common/metric_types.py at async def fetch_data (starting line 194), add a check:

    ```python
    endpoint = self.config.endpoints.get_endpoint()
    if not endpoint:
        raise ValueError("No HTTP endpoint configured for metrics")
    ```

  - Repeat in any other methods where get_endpoint() is assigned to a str | None variable (around lines 225-228 and 279-282).

• Ensure fetch_data returns the computed latency
  - After computing total_time, add return total_time so the method fulfills its -> float contract.

• Remove or utilize unused timing variables
  - If a detailed breakdown log is unnecessary, drop the dns_time and conn_time calculations to eliminate dead code.
  - Otherwise, integrate them into your logging framework (e.g., logger.debug(f"dns={dns_time}ms conn={conn_time}ms total={total_time}ms")).
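A compact sketch combining the three fixes above (guard, return, log); the _send_request helper and timing-dict keys are hypothetical stand-ins for the surrounding class's internals.

```python
async def fetch_data(self) -> float:
    endpoint = self.config.endpoints.get_endpoint()
    if not endpoint:  # guard: fail fast instead of passing None downstream
        raise ValueError("No HTTP endpoint configured for metrics")

    timing: dict[str, float] = {}
    await self._send_request(endpoint, timing)  # hypothetical helper

    dns_time = timing.get("dns_end", 0.0) - timing.get("dns_start", 0.0)
    conn_time = timing.get("conn_end", 0.0) - timing.get("conn_start", 0.0)
    total_time = timing.get("end", 0.0) - timing.get("start", 0.0)
    self._logger.debug(
        "dns=%.0fms conn=%.0fms total=%.0fms",
        dns_time * 1000, conn_time * 1000, total_time * 1000,
    )
    return total_time  # fulfil the declared -> float contract
```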
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
common/metric_types.py (5 hunks)
common/state/blockchain_fetcher.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
common/metric_types.py (3)
metrics/ethereum.py (4)
  subscribe (202-251), listen_for_data (274-291), process_data (293-310), unsubscribe (253-272)
common/base_metric.py (5)
  process_data (53-54), update_metric_value (82-91), mark_success (93-95), mark_failure (97-102), handle_error (104-131)
metrics/solana_landing_rate.py (2)
  process_data (185-186), fetch_data (148-183)
🪛 Ruff (0.12.2)
common/metric_types.py
62-62: Missing return type annotation for private function _collect_ws_data
(ANN202)
67-67: Blank line contains whitespace
Remove whitespace from blank line
(W293)
83-83: Line too long (115 > 88)
(E501)
124-124: Line too long (105 > 88)
(E501)
🔇 Additional comments (1)
common/metric_types.py (1)
16-16: MAX_RETRIES=2 is a good fit for the 55s hard cap

This matches the PR objective to reduce total time under the Vercel limit.
This prevents Hyperliquid metrics from exceeding the 59s Vercel limit by enforcing the timeout at the outer scope rather than per-request.
🤖 Generated with Claude Code