Handling Avstar Session Timeouts in Python

This guide solves one exact operational task: keeping a long-running Python ingestion job alive across Avstar session expiry, so that a night’s worth of commercial traffic is submitted in full — never stranded half-written between your process and the automation platform. It sits inside the Avion & Avstar Ingestion Pipelines and directly extends Avstar API Authentication and Rate Limits: where that page establishes the token and rate-budget control plane, this one shows the Python session wrapper that survives token expiry, idle-connection drops, and 429 throttling mid-batch without losing a spot. Session timeouts matter for audit integrity because a dropped session produces a partially-submitted log — the exact condition that forces manual reconciliation the next morning and breaks billing lineage back to the broadcast spot schema.

Avstar’s timeout behavior is rarely arbitrary. It stems from three operational realities: idle-connection limits on the Avstar application server, rate-limiting triggers during bulk log retrieval, and unbounded memory consumption during synchronous export parsing. Without explicit session lifecycle management, scripts fail silently or die on unhandled ConnectionResetError and ReadTimeout exceptions or unexpected HTTP 401 responses. The wrapper below treats session expiration as a recoverable state rather than a fatal error, and rests on three design principles: on-demand token rotation (the bearer token is cached only until 80% of its lifetime, refreshed before it goes stale, and rotated immediately on a 401), bounded async concurrency (exports are chunked and requests are issued under an asyncio.Semaphore so the client never floods the middleware), and deterministic retry with jitter (429/5xx responses trigger exponential backoff with randomized jitter to avoid a thundering herd on the middleware).

Prerequisites

Python 3.11+ (the code uses datetime.timezone.utc and modern asyncio semantics; 3.13 recommended)
httpx==0.27.* — async HTTP client with fine-grained timeout control
pydantic==2.9.* — strict payload validation at the boundary (see schema validation with Pydantic for traffic data)
Avstar OAuth 2.0 client_id and api_key provisioned with the traffic:read scope granted through role-based access for traffic APIs
Credentials exported as environment variables — never hard-coded — as AVSTAR_BASE_URL, AVSTAR_API_KEY, AVSTAR_CLIENT_ID
Outbound network access to the Avstar /oauth/token and traffic-log endpoints

Step-by-Step Implementation

The wrapper is built in six steps: structured audit logging, a validation model, the session object, token acquisition and rotation, a retry/re-auth request core, and a memory-safe streaming generator. The decision flow that ties them together (success, re-authenticate, back off, or dead-letter) is summarized by the state-machine figure in Step 5.

Step 1 — Configure structured audit logging

Goal: emit one machine-parseable record per lifecycle event carrying correlation_id, session_state, retry_count, and the spot_id under handling, so broadcast reconciliation can trace every submission. The formatter mirrors the traffic-ops log convention (timestamp | level | module | spot_id) in structured JSON.

python

import asyncio
import hashlib
import json
import logging
import random
import time
from datetime import datetime, timezone
from typing import AsyncIterator, Dict, List, Optional

import httpx
from pydantic import BaseModel, Field, ValidationError, field_validator


class AuditFormatter(logging.Formatter):
    """Emit traffic-ops audit records: timestamp | level | module | spot_id + state."""

    def format(self, record: logging.LogRecord) -> str:
        log_obj: Dict[str, object] = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "module": record.name,
            "spot_id": getattr(record, "spot_id", None),         # traceable to the spot schema
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", None),
            "session_state": getattr(record, "session_state", "unknown"),
            "retry_count": getattr(record, "retry_count", 0),
        }
        return json.dumps(log_obj)


audit_handler = logging.StreamHandler()
audit_handler.setFormatter(AuditFormatter())

logger = logging.getLogger("avstar_traffic_ingestion")
logger.setLevel(logging.INFO)
logger.addHandler(audit_handler)

Expected log line (session start):

text

{"timestamp": "2026-07-03T02:14:07.114+00:00", "level": "INFO", "module": "avstar_traffic_ingestion", "spot_id": null, "message": "AvstarTrafficSession initialized", "correlation_id": null, "session_state": "initialized", "retry_count": 0}
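The audit fields reach the formatter through the standard `extra` argument of each logging call. The sketch below shows that mechanism in isolation; it repeats a condensed copy of the formatter so it runs on its own, and the logger name, spot ID, and correlation ID are made-up values for illustration.

```python
import io
import json
import logging
from datetime import datetime, timezone


class AuditFormatter(logging.Formatter):
    """Condensed copy of the Step 1 formatter, repeated so this sketch runs alone."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "module": record.name,
            "spot_id": getattr(record, "spot_id", None),
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", None),
            "session_state": getattr(record, "session_state", "unknown"),
            "retry_count": getattr(record, "retry_count", 0),
        })


# Capture output in memory so the emitted record can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(AuditFormatter())

demo_logger = logging.getLogger("avstar_audit_demo")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo record out of the root logger

# Keys passed via `extra` become attributes on the LogRecord,
# which is where the formatter's getattr() calls find them.
demo_logger.warning(
    "Rate limited. Backing off",
    extra={"spot_id": "SPT-004417", "correlation_id": "demo-0001", "retry_count": 1},
)

record = json.loads(buffer.getvalue())
```

Any field that a call site omits falls back to the formatter's default, which is why `session_state` reads "unknown" here.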

Step 2 — Define the validated traffic-spot model

Goal: reject malformed records at the boundary so a timeout never lets a corrupt spot slip through on retry. The TrafficSpot model enforces the canonical field constraints from the broadcast spot schema — stable spot_id, ISO date/time, and a bounded duration.

python

class TrafficSpot(BaseModel):
    spot_id: str = Field(..., min_length=3, max_length=20, description="Stable traffic-log primary key")
    run_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$", description="YYYY-MM-DD air date")
    run_time: str = Field(..., pattern=r"^\d{2}:\d{2}:\d{2}$", description="HH:MM:SS air time")
    campaign_code: str = Field(..., max_length=50, description="Internal campaign identifier")
    duration_sec: int = Field(..., gt=0, le=3600, description="Spot duration in seconds")

    @field_validator("run_date", "run_time")
    @classmethod
    def validate_iso_format(cls, v: str) -> str:
        # Belt-and-braces: the regex guards shape, this guards real calendar/clock validity.
        # Times contain ":"; dates do not, so the separator picks the format string.
        fmt = "%H:%M:%S" if ":" in v else "%Y-%m-%d"
        try:
            datetime.strptime(v, fmt)
        except ValueError as e:
            raise ValueError(f"Invalid date/time value: {v}") from e
        return v

Step 3 — Build the session object with bounded timeouts

Goal: hold authentication state separately from data transport, cap concurrency, and give every socket phase an explicit timeout so an idle middleware drop surfaces as a catchable ReadTimeout rather than a hang.

python

class AvstarTrafficSession:
    def __init__(
        self,
        base_url: str,
        api_key: str,
        client_id: str,
        max_retries: int = 3,
        batch_size: int = 50,
        timeout: float = 30.0,
    ) -> None:
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key
        self.client_id = client_id
        self.max_retries = max_retries
        self.batch_size = batch_size
        self._token: Optional[str] = None
        self._token_expires: float = 0.0
        self._semaphore = asyncio.Semaphore(4)  # conservative concurrency cap
        self.client = httpx.AsyncClient(
            timeout=httpx.Timeout(connect=5.0, read=timeout, write=10.0, pool=10.0),
            limits=httpx.Limits(max_connections=10, max_keepalive_connections=5),
        )
        logger.info("AvstarTrafficSession initialized", extra={"session_state": "initialized"})

Step 4 — Fetch, validate, and rotate the token

Goal: acquire a bearer token, cache its expiry with a 20% safety margin so refresh happens before hard expiry, and treat any 401 as a signal to rotate rather than to fail.

python

class AvstarTrafficSession:  # ...continued from Step 3
    async def _authenticate(self) -> str:
        """Fetch a session token and cache its expiry with a safety margin."""
        payload = {"grant_type": "client_credentials", "client_id": self.client_id, "api_key": self.api_key}
        resp = await self.client.post(f"{self.base_url}/oauth/token", json=payload)
        resp.raise_for_status()
        data = resp.json()
        self._token = data["access_token"]
        # Refresh at 80% of lifetime so a long batch never crosses hard expiry mid-flight.
        self._token_expires = time.time() + (data.get("expires_in", 3600) * 0.8)
        logger.info("Session token acquired", extra={"session_state": "authenticated"})
        return self._token

    def _is_token_valid(self) -> bool:
        return self._token is not None and time.time() < self._token_expires

Step 5 — Wrap requests in retry, re-auth, and backoff

Goal: make every request self-healing. Refresh before the call if the token is stale, rotate on 401, back off with jitter on 429 and 5xx, and dead-letter once max_retries is exhausted — the exact decision flow visualised below.

Figure — Retry/backoff state machine: a request succeeds on 200, re-authenticates on 401, backs off with jitter on 429/5xx, and dead-letters once retries are exhausted.

python

class AvstarTrafficSession:  # ...continued from Step 4
    async def _request_with_retry(self, method: str, url: str, **kwargs) -> httpx.Response:
        """Exponential backoff with jitter, auto re-auth on 401, retry on 429/5xx."""
        # Mint the correlation id once, before the loop, so every retry of this
        # logical request carries the same value in the audit trail.
        correlation_id = hashlib.sha256(f"{time.time()}-{url}".encode()).hexdigest()[:16]
        caller_headers = dict(kwargs.pop("headers", None) or {})

        for attempt in range(self.max_retries + 1):
            audit = {"retry_count": attempt, "correlation_id": correlation_id}
            if not self._is_token_valid():
                await self._authenticate()

            # Rebuild the headers each attempt so a rotated token is always picked up.
            kwargs["headers"] = {
                **caller_headers,
                "Authorization": f"Bearer {self._token}",
                "X-Request-Correlation-ID": correlation_id,
            }

            try:
                resp = await self.client.request(method, url, **kwargs)

                if resp.status_code == 401:
                    logger.warning("Token expired mid-stream. Rotating.", extra=audit)
                    await self._authenticate()
                    continue
                if resp.status_code == 429:
                    wait = min(60, 2 ** attempt + random.uniform(0, 1))
                    logger.warning(f"Rate limited. Backing off {wait:.2f}s", extra=audit)
                    await asyncio.sleep(wait)
                    continue
                if resp.status_code >= 500:
                    wait = min(30, 2 ** attempt + random.uniform(0, 0.5))
                    logger.error(f"Server error {resp.status_code}. Retry in {wait:.2f}s", extra=audit)
                    await asyncio.sleep(wait)
                    continue

                resp.raise_for_status()
                return resp

            except httpx.RequestError as e:
                # ReadTimeout and connection resets (wrapped by httpx) land here:
                # typically an idle middleware drop.
                logger.error(f"Network failure: {e}", extra=audit)
                if attempt == self.max_retries:
                    raise
                await asyncio.sleep(2 ** attempt + random.uniform(0, 1))

        raise RuntimeError("Max retries exceeded for Avstar session request")
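The decision flow in the figure can also be read as a pure function from a response status and attempt number to the next transition. The following is a simplified sketch for reasoning and unit-testing only; `Action` and `next_action` are hypothetical names that do not appear in the wrapper, and "dead-letter" here stands for the point where the wrapper gives up and raises.

```python
from enum import Enum


class Action(Enum):
    RETURN = "return response"
    REAUTH = "re-authenticate and retry"
    BACKOFF = "back off with jitter and retry"
    DEAD_LETTER = "retries exhausted"
    RAISE = "raise immediately"


def next_action(status: int, attempt: int, max_retries: int = 3) -> Action:
    """Map one HTTP status to the next state-machine transition (illustrative only)."""
    retryable = status in (401, 429) or status >= 500
    if retryable and attempt >= max_retries:
        # No attempts left: the caller should quarantine the request.
        return Action.DEAD_LETTER
    if status == 401:
        return Action.REAUTH
    if status == 429 or status >= 500:
        return Action.BACKOFF
    if 200 <= status < 300:
        return Action.RETURN
    # Anything else (for example 403 or 404) is not retried.
    return Action.RAISE


# Walk the sequence used in the recovery test: 401, then 429, then 200.
trace = [next_action(s, a) for a, s in enumerate((401, 429, 200))]
```

Keeping the policy in a side-effect-free function like this makes it cheap to test the transitions without sleeping or mocking HTTP.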

Step 6 — Stream validated batches and close cleanly

Goal: page through the export as an async generator so memory stays flat regardless of log size, validate each record against TrafficSpot, and always release sockets on exit. This is the streaming counterpart to async batch processing for high-volume logs.

python

class AvstarTrafficSession:  # ...continued from Step 5
    async def fetch_traffic_spots(
        self, endpoint: str, params: Optional[Dict] = None
    ) -> AsyncIterator[List[TrafficSpot]]:
        """Memory-safe async generator yielding validated traffic batches."""
        page = 1
        while True:
            async with self._semaphore:
                query = {"page": page, "limit": self.batch_size, **(params or {})}
                resp = await self._request_with_retry("GET", f"{self.base_url}/{endpoint}", params=query)
                payload = resp.json()

            if not payload.get("data"):
                break

            validated_batch: List[TrafficSpot] = []
            for item in payload["data"]:
                try:
                    validated_batch.append(TrafficSpot(**item))
                except ValidationError as e:
                    logger.error(
                        f"Schema validation failed: {e}",
                        extra={"session_state": "validation_error", "spot_id": item.get("spot_id")},
                    )
                    continue  # rejected spot is logged with its spot_id for reconciliation, not dropped silently

            if validated_batch:
                logger.info(
                    f"Yielding batch of {len(validated_batch)} validated spots",
                    extra={"session_state": "streaming"},
                )
                yield validated_batch

            if not payload.get("has_next", False):
                break
            page += 1

    async def close(self) -> None:
        await self.client.aclose()
        logger.info("Session closed gracefully", extra={"session_state": "terminated"})

Production deployments externalize configuration and always wrap use of the session in try/finally so that await session.close() executes and no sockets leak on process termination. Note that contextlib.aclosing(session) works only if the class also exposes an aclose() method, because aclosing calls aclose() rather than close(). The environment variables below tune session behavior:

Variable	Default	Purpose
`AVSTAR_BASE_URL`	`https://api.avstar.traffic`	Middleware endpoint
`AVSTAR_API_KEY`	(required)	Client credential
`AVSTAR_CLIENT_ID`	(required)	Application identifier
`AVSTAR_TIMEOUT`	`30.0`	Read timeout in seconds
`AVSTAR_MAX_RETRIES`	`3`	Retry ceiling per request
`AVSTAR_BATCH_SIZE`	`50`	Chunk size for async streaming
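One way to read the variables in the table is a small settings loader that applies the documented defaults and fails fast when a credential is missing. This is a minimal sketch under those assumptions; `AvstarSettings` and `load_settings` are hypothetical names, not part of the wrapper above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AvstarSettings:  # hypothetical container for the table's variables
    base_url: str
    api_key: str
    client_id: str
    timeout: float
    max_retries: int
    batch_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> AvstarSettings:
    """Read the documented variables, failing fast on missing credentials."""
    missing = [k for k in ("AVSTAR_API_KEY", "AVSTAR_CLIENT_ID") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return AvstarSettings(
        base_url=env.get("AVSTAR_BASE_URL", "https://api.avstar.traffic"),
        api_key=env["AVSTAR_API_KEY"],
        client_id=env["AVSTAR_CLIENT_ID"],
        timeout=float(env.get("AVSTAR_TIMEOUT", "30.0")),
        max_retries=int(env.get("AVSTAR_MAX_RETRIES", "3")),
        batch_size=int(env.get("AVSTAR_BATCH_SIZE", "50")),
    )


# Intended use with the session from Steps 3 to 6 (shown as comments so this
# sketch stays runnable without httpx installed):
#
#     settings = load_settings()
#     session = AvstarTrafficSession(
#         settings.base_url, settings.api_key, settings.client_id,
#         max_retries=settings.max_retries, batch_size=settings.batch_size,
#         timeout=settings.timeout,
#     )
#     try:
#         async for batch in session.fetch_traffic_spots("traffic/logs"):
#             ...
#     finally:
#         await session.close()

demo = load_settings({"AVSTAR_API_KEY": "key", "AVSTAR_CLIENT_ID": "cid", "AVSTAR_TIMEOUT": "60"})
```

Passing a plain mapping instead of `os.environ` keeps the loader easy to test.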

Verification & Testing

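Confirm timeout recovery deterministically before pointing the client at production. Use a fixture that forces one 401 and one 429 in sequence, then assert the generator still yields every valid spot. The test below assumes pytest-asyncio and respx (which supplies the respx_mock fixture) are installed alongside pytest.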

python

import httpx
import pytest


@pytest.mark.asyncio
async def test_session_survives_401_then_429(respx_mock) -> None:
    base = "https://api.avstar.traffic"
    # Fixture: two token grants (initial + post-401 rotation)
    respx_mock.post(f"{base}/oauth/token").respond(
        json={"access_token": "tok", "expires_in": 3600}
    )
    route = respx_mock.get(f"{base}/traffic/logs")
    route.side_effect = [
        httpx.Response(401),  # token invalidated mid-stream -> triggers rotation
        httpx.Response(429),  # rate limited -> triggers jittered backoff
        httpx.Response(200, json={
            "data": [{
                "spot_id": "SPT-004417", "run_date": "2026-07-03",
                "run_time": "06:30:00", "campaign_code": "AUTO-Q3", "duration_sec": 30,
            }],
            "has_next": False,
        }),
    ]

    session = AvstarTrafficSession(base, "key", "cid", batch_size=1)
    collected = [b async for b in session.fetch_traffic_spots("traffic/logs")]
    await session.close()

    assert len(collected) == 1
    assert collected[0][0].spot_id == "SPT-004417"   # canonical spot survives recovery
    assert collected[0][0].duration_sec == 30

A passing run emits a WARNING record for the 401 rotation ("Token expired mid-stream. Rotating."), a WARNING "Rate limited" record for the 429, and an INFO record with "session_state": "streaming" for the surviving batch. Because the 429 arrives on the second attempt, expect the test to pause for roughly two to three seconds of real backoff. The absence of any "session_state": "validation_error" line confirms no record was corrupted during recovery.

Edge Cases & Failure Handling

1. 401 Unauthorized mid-stream. The token was invalidated by the middleware (forced logout, credential rotation) rather than by natural expiry, so the 20% margin didn’t catch it. _request_with_retry handles this by rotating on the 401 and replaying the same request. If requests made with the fresh token keep returning 401, the loop exhausts max_retries and raises RuntimeError; if the token endpoint itself rejects the credentials, _authenticate raises httpx.HTTPStatusError immediately. Remediation: verify AVSTAR_API_KEY/AVSTAR_CLIENT_ID against the provisioning portal and confirm the scope was granted via role-based access for traffic APIs; replay any dead-lettered batch after the credential is restored.

2. 429 storm during a bulk export. The export outpaces Avstar’s request budget and every page bounces off the ceiling. Backoff alone stalls ingestion. Remediation: lower AVSTAR_BATCH_SIZE to 25 and the semaphore cap to 2, insert await asyncio.sleep(0.5) between yields, and honor any RateLimit-Remaining/Retry-After header the gateway exposes — the same budget discipline detailed in Avstar API Authentication and Rate Limits.

3. Repeated ReadTimeout on a legacy export endpoint. Large synchronous payloads block the middleware past the 30s read window, or idle keep-alive connections are dropped. The httpx.RequestError branch retries with jitter, but retries against a genuinely slow endpoint just waste attempts. Remediation: raise AVSTAR_TIMEOUT to 60.0 for legacy endpoints, align httpx.Limits with the middleware pool size, and enable OS-level TCP keep-alive (net.ipv4.tcp_keepalive_time = 60). Records that still fail after max_retries should be forwarded to a dead-letter queue and replayed — the same quarantine-then-reprocess pattern that feeds make-good routing for preemptions.
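Two of the remediations above lend themselves to small helpers: honoring Retry-After during a 429 storm, and quarantining a request once retries are exhausted. The sketch below assumes the gateway sends a standard Retry-After header (delay in seconds or an HTTP date) and uses a JSON Lines file as a stand-in dead-letter queue; the function names and file location are hypothetical.

```python
import json
import random
import tempfile
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from pathlib import Path
from typing import Any, Dict, List, Mapping, Optional


def backoff_delay(headers: Mapping[str, str], attempt: int, cap: float = 60.0,
                  now: Optional[datetime] = None) -> float:
    """Prefer the server's Retry-After hint; else exponential backoff with jitter."""
    raw = (headers.get("Retry-After") or "").strip()
    if raw.isdigit():
        return min(cap, float(raw))  # delay-seconds form
    if raw:
        try:
            target = parsedate_to_datetime(raw)  # HTTP-date form
        except (TypeError, ValueError):
            target = None
        if target is not None:
            if target.tzinfo is None:
                target = target.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return min(cap, max(0.0, (target - current).total_seconds()))
    return min(cap, 2 ** attempt + random.uniform(0, 1))


def dead_letter(path: Path, request: Dict[str, Any], reason: str) -> None:
    """Append one failed request to a JSON Lines file for later replay."""
    entry = {"failed_at": datetime.now(timezone.utc).isoformat(),
             "reason": reason, "request": request}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def pending_replays(path: Path) -> List[Dict[str, Any]]:
    """Return the quarantined requests in the order they failed."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line)["request"] for line in fh if line.strip()]


# Example: the gateway asks for 7 seconds; page 7 later times out for good.
delay = backoff_delay({"Retry-After": "7"}, attempt=0)
dlq_path = Path(tempfile.mkdtemp()) / "avstar_dead_letter.jsonl"  # hypothetical location
dead_letter(dlq_path, {"endpoint": "traffic/logs", "params": {"page": 7, "limit": 50}}, "ReadTimeout")
replays = pending_replays(dlq_path)
```

With httpx, `resp.headers` can be passed straight to `backoff_delay`, since its header lookup is case-insensitive; a plain dict needs the exact "Retry-After" spelling.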

FAQ

Why refresh the token at 80% of its lifetime instead of waiting for a 401?

Reactive refresh means at least one request fails, gets caught, rotates the token, and replays — extra latency and an error line in the audit trail for every expiry. Refreshing at 80% of expires_in moves the rotation ahead of the failure so steady-state batches never see a 401 at all. The 401 handler in _request_with_retry stays as a safety net for forced invalidations, which the margin cannot predict. This proactive model is defined in Avstar API Authentication and Rate Limits.
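The arithmetic behind the margin is simple enough to check by hand. As a minimal sketch, `refresh_deadline` below is a hypothetical helper that mirrors the calculation in `_authenticate`, using an arbitrary issue time for illustration.

```python
def refresh_deadline(issued_at: float, expires_in: float, margin: float = 0.8) -> float:
    """Return the epoch time at which the token should be proactively refreshed."""
    if not 0 < margin <= 1:
        raise ValueError("margin must be in (0, 1]")
    return issued_at + expires_in * margin


issued = 1_000_000.0  # arbitrary epoch seconds, for illustration only
deadline = refresh_deadline(issued, expires_in=3600)

# Seconds between the proactive refresh and the token's hard expiry.
headroom = (issued + 3600) - deadline
```

For a 3600-second token the refresh lands 2880 seconds after issue, leaving about 720 seconds (12 minutes) of headroom for any batch already in flight.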

A batch yielded fewer spots than the API returned — where did they go?

They failed TrafficSpot validation and were skipped with a "session_state": "validation_error" log line carrying the offending spot_id. This is intentional: a timeout-recovered request must never let a malformed record through. Route those rejects to a reconciliation CSV and check them against the broadcast spot schema; the usual causes are out-of-range durations, non-ISO dates, or a missing spot_id. The broader validation strategy lives in schema validation with Pydantic for traffic data.

Will a retry after a network drop double-submit a spot?

Not for reads. For writes, the risk is real if the server accepted a submission just before the connection reset. Guard against it with an idempotency key derived from the stable spot_id so Avstar deduplicates a replayed submission instead of appending it. Never mint a new identifier on retry — the whole point of a stable spot ID is that the same logical spot is recognisable across every retry and every reconciliation pass.
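A deterministic key can be derived by hashing the stable fields of the spot, so every retry of the same submission produces the same value. The sketch below is an assumption-laden illustration: whether Avstar deduplicates on such a key, and the "Idempotency-Key" header name, are not confirmed by this guide.

```python
import hashlib


def idempotency_key(spot_id: str, run_date: str, run_time: str) -> str:
    """Derive a deterministic key from stable spot fields; retries reuse it unchanged."""
    material = "|".join((spot_id, run_date, run_time))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:32]


first_try = idempotency_key("SPT-004417", "2026-07-03", "06:30:00")
replay = idempotency_key("SPT-004417", "2026-07-03", "06:30:00")
other_spot = idempotency_key("SPT-004418", "2026-07-03", "06:30:00")

# A write request would carry the key in a header. The header name is an
# assumption; use whatever deduplication field the platform documents.
headers = {"Idempotency-Key": first_try}
```

Because the key contains no timestamp or random component, a replay after a connection reset is indistinguishable from the first attempt, which is the property deduplication relies on.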

How do I tune concurrency without triggering rate limits?

Start with the conservative asyncio.Semaphore(4) and batch_size=50, then raise concurrency only while watching RateLimit-Remaining. The moment 429 responses appear, you have found the ceiling — back off one step. High-throughput exports belong in async batch processing for high-volume logs, which pairs bounded concurrency with the same shared rate budget this client paces against.
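The tuning rule above can be written down as a small step function: raise concurrency by one while the budget looks healthy, hold when the remaining budget runs low, and drop by one as soon as a 429 appears. This is a rough sketch; `next_concurrency`, the `low_water` threshold, and the ceiling of 8 are hypothetical choices, not values taken from Avstar.

```python
from typing import Optional


def next_concurrency(
    current: int,
    saw_429: bool,
    remaining: Optional[int],
    floor: int = 1,
    ceiling: int = 8,
    low_water: int = 10,
) -> int:
    """Suggest the concurrency cap for the next export run."""
    if saw_429:
        # The ceiling has been found: back off one step.
        return max(floor, current - 1)
    if remaining is not None and remaining < low_water:
        # Budget is nearly spent: hold steady rather than probe further.
        return current
    return min(ceiling, current + 1)


# An asyncio.Semaphore cannot be resized in place, so apply the suggestion
# when constructing the semaphore for the next run.
schedule = [
    next_concurrency(4, saw_429=False, remaining=50),
    next_concurrency(5, saw_429=False, remaining=6),
    next_concurrency(5, saw_429=True, remaining=0),
]
```

Here the cap climbs from 4 to 5, holds at 5 when the budget runs low, and steps back to 4 after the first 429.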

Related

Avstar API Authentication and Rate Limits: the token and rate-budget control plane this timeout recovery extends.
Async Batch Processing for High-Volume Logs: bounded-concurrency batch mechanics for exports too large to hold in memory.
Schema Validation with Pydantic for Traffic Data: the Pydantic validator gate that keeps a recovered request from submitting a corrupt spot.
