Optimizing Asyncio for Traffic File Uploads

The overnight handoff of commercial and promotional scheduling data from traffic management systems to the ad delivery engine is a deterministic, time-bound process: hundreds of thousands of line items must be validated and transmitted before morning playout. This guide solves one exact operational task — building a memory-aware asyncio uploader that streams a multi-gigabyte Avion export to the Avstar scheduling API under a hard concurrency ceiling, an enforced rate limit, and a cryptographic audit trail. It is the throughput-critical step of Async Batch Processing for High-Volume Logs, itself a phase of the broader Avion & Avstar Ingestion Pipelines. Getting the concurrency model right is not a performance nicety: an unbounded uploader that drops payloads mid-run leaves the as-run log inconsistent with what actually cleared, which is precisely the kind of gap FCC record-keeping and revenue reconciliation cannot tolerate.

Naive implementations spawn a coroutine per record with no backpressure and fail three ways in production: they load the whole export into memory and raise MemoryError past 2–4 GB; they saturate TCP sockets and surface ClientOSError from an unbounded connector; and they burst past Avstar’s request budget, drawing 429 Too Many Requests that corrupt scheduling grids when partially applied. The remedy is disciplined streaming with a bounded worker pool, in-flight schema enforcement, and deterministic retry.

Prerequisites

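Python 3.11+ — the version this guide targets. The code below needs only asyncio.create_task and time.monotonic(), whose monotonic clock keeps the rate limiter unaffected by wall-clock adjustments; asyncio.TaskGroup (new in 3.11) is available if you prefer structured concurrency for the worker pool.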
Pinned dependencies — aiohttp==3.9.5, aiofiles==23.2.1, pydantic==2.7.1. Pin exactly; aiohttp connector defaults and Pydantic validator signatures both shift across minor releases.
Avstar API access — a service bearer token scoped to ingest:write, plus the published requests-per-minute ceiling for your contract tier (see Avstar API Authentication and Rate Limits).
A normalized export — records already resolved to the canonical spot schema with a stabilized billing code normalization pass applied upstream.
Newline-delimited JSON — one record per line (.ndjson), so the file can be streamed without materializing the array.

Step-by-Step Implementation

The uploader follows an explicit lifecycle: await uploader.start() opens the aiohttp session and spawns a bounded worker pool, stream_file() drives records through validation into a bounded queue and waits for that queue to drain, and await uploader.close() cancels the workers and closes the session. Creating the session and worker tasks inside start(), rather than in the constructor, guarantees they attach to the event loop that is actually running.

Figure — Uploader lifecycle from start() spawning the worker pool, through stream_file enqueuing records that rate-limited workers consume, to queue.join draining the backlog and close() tearing down workers and the session.
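
The ordering in that lifecycle can be sketched end to end with a stand-in class. This is a minimal sketch, not the uploader built in the steps below: SketchUploader, its sent list, and the integer records are hypothetical placeholders that only illustrate the start, stream, join, close sequence and the try/finally that guarantees teardown.

```python
import asyncio
from typing import Iterable, List, Optional


class SketchUploader:
    """Hypothetical stand-in that mirrors the start/stream/close lifecycle."""

    def __init__(self, workers: int = 3, queue_size: int = 5) -> None:
        self.workers = workers
        self.queue_size = queue_size
        self.sent: List[int] = []                    # stands in for the remote API
        self._queue: Optional[asyncio.Queue] = None  # created in start()
        self._tasks: List[asyncio.Task] = []

    async def start(self) -> None:
        # Loop-bound objects are created here, inside the running loop.
        self._queue = asyncio.Queue(maxsize=self.queue_size)
        self._tasks = [asyncio.create_task(self._worker()) for _ in range(self.workers)]

    async def _worker(self) -> None:
        assert self._queue is not None
        while True:
            item = await self._queue.get()
            try:
                await asyncio.sleep(0)               # placeholder for the HTTP call
                self.sent.append(item)
            finally:
                self._queue.task_done()

    async def stream(self, items: Iterable[int]) -> None:
        assert self._queue is not None
        for item in items:
            await self._queue.put(item)              # suspends while the queue is full
        await self._queue.join()                     # returns once every item is processed

    async def close(self) -> None:
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)


async def run_once(n: int) -> List[int]:
    uploader = SketchUploader()
    await uploader.start()
    try:
        await uploader.stream(range(n))
    finally:
        await uploader.close()                       # teardown runs even if streaming raises
    return sorted(uploader.sent)


if __name__ == "__main__":
    print(len(asyncio.run(run_once(20))))
```

Putting close() in a finally block is the design point: a failure while streaming still cancels the workers instead of leaving them parked on the queue.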

Step 1 — Structured audit logging and the manifest

Goal: emit machine-parseable audit lines in the traffic-ops timestamp | level | module | spot_id shape, and carry a running manifest that reconciles record counts and a file checksum.

python

import asyncio
import hashlib
import json
import logging
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Optional

import aiofiles
import aiohttp
from aiohttp import TCPConnector
from pydantic import BaseModel, Field, ValidationError

# Audit trail: timestamp | level | module | spot_id-bearing message
logger = logging.getLogger("traffic_uploader")
logger.setLevel(logging.INFO)
_handler = logging.StreamHandler()
_handler.setFormatter(logging.Formatter("%(asctime)s | %(levelname)s | traffic_uploader | %(message)s"))
logger.addHandler(_handler)


@dataclass
class AuditManifest:
    total_records: int = 0        # every line read from the export
    valid_records: int = 0        # rows that passed schema validation
    rejected_records: int = 0     # schema/JSON failures, quarantined
    checksum: str = ""            # SHA-256 over the raw export bytes
    errors: List[Dict] = field(default_factory=list)

Expected log line: 2026-07-03 02:14:07,881 | INFO | traffic_uploader | file stream complete | spot_id=- records=412903

Step 2 — The Pydantic record contract

Goal: reject malformed rows at the boundary with an explicit audit entry rather than letting them poison a batch. The model mirrors the Pydantic traffic-data validators used earlier in the pipeline, aliased to the Avion export’s PascalCase headers.

python

class TrafficRecord(BaseModel):
    spot_id: str = Field(..., alias="SpotID")
    station: str = Field(..., alias="StationCode")
    airtime: str = Field(..., alias="AirDateTime")   # ISO-8601, timezone-aware
    duration_sec: int = Field(..., alias="Duration", gt=0)
    advertiser: str = Field(..., alias="ClientName")
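
The comment on airtime asks for a timezone-aware ISO-8601 value, but a plain str field does not enforce that by itself. The following is a minimal stdlib sketch of the check; is_tz_aware_iso is a hypothetical helper, and wiring it into the model (for example through a Pydantic field validator) is an assumption about how you would integrate it.

```python
from datetime import datetime


def is_tz_aware_iso(value: str) -> bool:
    """Return True only for ISO-8601 timestamps that carry a UTC offset."""
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    # A naive datetime reports no offset, so utcoffset() returns None.
    return parsed.utcoffset() is not None


if __name__ == "__main__":
    for sample in ("2026-07-03T06:00:00-04:00", "2026-07-03T06:00:00", "06:00 tomorrow"):
        print(sample, is_tz_aware_iso(sample))
```

Rejecting naive timestamps at this boundary avoids guessing a station's local offset later in the pipeline.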

Step 3 — A coroutine-safe token bucket

Goal: hold the whole worker pool to Avstar’s sliding-window budget. A single lock-guarded bucket refills continuously and makes a coroutine wait only when the pool has spent its allowance.

python

class TokenBucketLimiter:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                 # tokens (requests) granted per second
        self.capacity = capacity         # burst ceiling
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
            self.last_refill = now
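            if self.tokens < 1.0:
                # Not enough budget: sleep just long enough to earn one token.
                await asyncio.sleep((1.0 - self.tokens) / self.rate)
                # The token earned while sleeping is spent on this request, so restart
                # the refill clock; otherwise the next caller is credited for it again.
                self.last_refill = time.monotonic()
                self.tokens = 0.0
            else:
                self.tokens -= 1.0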
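
The arithmetic behind the bucket is easier to see without real sleeping. The sketch below is a deterministic simulation under a simplifying assumption: every request arrives at the same instant, so the only time that passes is the wait the bucket imposes. simulate_burst is a hypothetical helper that models the intended accounting; it is not part of the uploader.

```python
from typing import List


def simulate_burst(rate: float, capacity: int, requests: int) -> List[float]:
    """Return the simulated release time (seconds) of each request in a burst."""
    tokens = float(capacity)
    now = 0.0
    release_times: List[float] = []
    for _ in range(requests):
        if tokens < 1.0:
            # Wait exactly long enough to earn the missing fraction of a token.
            now += (1.0 - tokens) / rate
            tokens = 0.0          # the earned token is spent immediately
        else:
            tokens -= 1.0
        release_times.append(now)
    return release_times


if __name__ == "__main__":
    times = simulate_burst(rate=10.0, capacity=20, requests=23)
    print([round(t, 3) for t in times[18:]])
```

With rate=10 and capacity=20 (the values Step 4 derives from its default rate_limit), the first 20 requests pass at time zero and each later one is released 1/rate = 0.1 s after the previous one.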

Step 4 — Uploader construction and lifecycle start

Goal: cap concurrency at the connector and the worker-pool level, and defer all loop-bound objects to start().

python

class AvstarUploader:
    def __init__(self, api_url: str, token: str, max_concurrency: int = 20, rate_limit: float = 10.0):
        self.api_url = api_url
        self.headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
        self.max_concurrency = max_concurrency
        self.limiter = TokenBucketLimiter(rate=rate_limit, capacity=int(rate_limit * 2))
        self.manifest = AuditManifest()
        # The session and workers are created in start(). asyncio.Queue attaches to the
        # loop lazily on Python 3.10+, so constructing it here is safe.
        self.session: Optional[aiohttp.ClientSession] = None
        self._batch_queue: asyncio.Queue = asyncio.Queue(maxsize=50)
        self._workers: List[asyncio.Task] = []

    async def start(self) -> None:
        """Open the HTTP session and spawn the bounded pool of upload workers."""
        connector = TCPConnector(limit=self.max_concurrency, limit_per_host=self.max_concurrency)
        self.session = aiohttp.ClientSession(connector=connector, headers=self.headers)
        self._workers = [asyncio.create_task(self._upload_worker()) for _ in range(self.max_concurrency)]
        logger.info("uploader started | spot_id=- workers=%d", self.max_concurrency)

Expected log line: ... | INFO | traffic_uploader | uploader started | spot_id=- workers=20

Step 5 — Stream, validate, and enqueue with backpressure

Goal: read the export line-by-line so memory stays flat, checksum the raw bytes for lineage, and let the bounded queue apply backpressure — put() blocks the reader when workers fall behind, capping live TrafficRecord instances regardless of file size.

python

class AvstarUploader:  # ...continued
    async def _validate_and_queue(self, raw_record: dict) -> None:
        try:
            validated = TrafficRecord(**raw_record)
            await self._batch_queue.put(validated)      # blocks when queue is full -> backpressure
            self.manifest.valid_records += 1
        except ValidationError as e:
            self.manifest.rejected_records += 1
            self.manifest.errors.append({"record": raw_record, "error": str(e)})
            logger.warning("schema rejection | spot_id=%s", raw_record.get("SpotID", "?"))

    async def stream_file(self, file_path: Path) -> None:
        hasher = hashlib.sha256()
        async with aiofiles.open(file_path, "rb") as f:
            async for line in f:
                hasher.update(line)                     # checksum the raw bytes, newlines included
                line_bytes = line.strip()
                if not line_bytes:
                    continue
                self.manifest.total_records += 1
                try:
                    await self._validate_and_queue(json.loads(line_bytes))
                except json.JSONDecodeError as e:
                    self.manifest.rejected_records += 1
                    self.manifest.errors.append({"raw_line": line_bytes.decode(errors="replace"), "error": str(e)})
                    logger.warning("json parse error | spot_id=? line=%d", self.manifest.total_records)
        self.manifest.checksum = hasher.hexdigest()
        logger.info("file stream complete | spot_id=- records=%d", self.manifest.total_records)
        await self._batch_queue.join()                  # wait until every enqueued record is done
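
The backpressure claim can be checked in isolation. The sketch below, which uses hypothetical names and a sleep in place of the HTTP round trip, pushes items through a bounded queue with deliberately slow consumers and reports the deepest backlog the producer ever observed.

```python
import asyncio


async def measure_peak_depth(total: int = 200, maxsize: int = 10, workers: int = 2) -> int:
    """Feed `total` items through a bounded queue and return the deepest backlog seen."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    peak = 0

    async def consumer() -> None:
        while True:
            await queue.get()
            await asyncio.sleep(0.001)       # slow consumer, stands in for an HTTP round trip
            queue.task_done()

    tasks = [asyncio.create_task(consumer()) for _ in range(workers)]
    for i in range(total):
        await queue.put(i)                   # suspends the producer while the queue is full
        peak = max(peak, queue.qsize())
    await queue.join()
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return peak


if __name__ == "__main__":
    print(asyncio.run(measure_peak_depth()))
```

The backlog never exceeds maxsize, so the number of live records is bounded by maxsize plus one in-flight record per worker, independent of how many lines the file holds.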

Step 6 — The worker: rate-limited dispatch with deterministic retry

Goal: each worker acquires a token, posts one record, and classifies the response: retry on 429 after honoring Retry-After, record any other 4xx or 5xx status as a failure needing traffic-desk review, and never let a single record kill the worker.

python

class AvstarUploader:  # ...continued
    async def _upload_worker(self) -> None:
        assert self.session is not None, "start() must run before workers"
        while True:
            record = await self._batch_queue.get()
            try:
                # Retry in place: calling put() from a worker can deadlock once the bounded
                # queue is full and every worker is waiting to requeue.
                for _attempt in range(5):               # assumed cap of 5 attempts per record
                    await self.limiter.acquire()
                    async with self.session.post(f"{self.api_url}/ingest", json=record.model_dump()) as resp:
                        status = resp.status
                        header = resp.headers.get("Retry-After", "")
                    if status != 429:
                        break
                    # Retry-After may also be an HTTP-date; fall back to 5 s unless it is whole seconds.
                    retry_after = int(header) if header.isdigit() else 5
                    logger.warning("rate limited | spot_id=%s backoff=%ds", record.spot_id, retry_after)
                    await asyncio.sleep(retry_after)    # the connection is already released here
                if status >= 400:                       # includes a 429 that outlived its retries
                    self.manifest.errors.append({"spot_id": record.spot_id, "status": status})
                    logger.error("upload rejected | spot_id=%s status=%d", record.spot_id, status)
                else:
                    logger.debug("uploaded | spot_id=%s", record.spot_id)
            except Exception as e:
                logger.error("upload failed | spot_id=%s error=%s", record.spot_id, e)
            finally:
                self._batch_queue.task_done()
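
Retry-After can carry either a number of seconds or an HTTP-date, so honoring it fully takes more than an integer cast. The helper below is a stdlib sketch of one way to parse both forms; retry_after_seconds, its 5-second default, and the injectable now argument are assumptions for illustration, not part of the Avstar API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header: Optional[str], default: float = 5.0,
                        now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header (delay-seconds or HTTP-date) into a non-negative delay."""
    if not header:
        return default
    header = header.strip()
    if header.isdigit():
        return float(header)
    try:
        target = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return default                       # unparseable value: fall back to the default
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (target - current).total_seconds())


if __name__ == "__main__":
    print(retry_after_seconds("7"))
    print(retry_after_seconds(None))
```

Clamping at zero matters for the date form: a date already in the past should mean "retry now", not a negative sleep.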

Step 7 — Drain, close, and emit the manifest

Goal: cancel the idle workers once stream_file() has drained the queue, release the session, and serialize the manifest for archival next to the processed file.

python

class AvstarUploader:  # ...continued
    async def close(self) -> None:
        for worker in self._workers:
            worker.cancel()
        await asyncio.gather(*self._workers, return_exceptions=True)
        if self.session is not None:
            await self.session.close()
        logger.info("uploader session closed | spot_id=- valid=%d rejected=%d",
                    self.manifest.valid_records, self.manifest.rejected_records)

    def generate_manifest(self) -> str:
        return json.dumps(asdict(self.manifest), indent=2)
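
Archiving the manifest next to the processed file can be done with a temp file and an atomic rename so a crash never leaves a half-written manifest. This is a sketch under assumptions: write_manifest and the .manifest.json suffix are hypothetical, and the manifest is passed as a plain dict (for example the result of asdict).

```python
import json
import os
import tempfile
from pathlib import Path


def write_manifest(export_path: Path, manifest: dict) -> Path:
    """Write the manifest beside the export via a temp file and an atomic rename."""
    target = export_path.with_name(export_path.name + ".manifest.json")  # hypothetical naming
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(manifest, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())           # push the bytes to disk before renaming
        os.replace(tmp_name, target)         # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


if __name__ == "__main__":
    out = write_manifest(Path(tempfile.mkdtemp()) / "day.ndjson", {"total_records": 2})
    print(out.name)
```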

Verification & Testing

Confirm three invariants before trusting a run: memory stays flat, counts reconcile, and the checksum is stable. Drive the uploader against an aiohttp test server (or a mock) with a small fixture and assert on the manifest.

python

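import json

import pytest

# Assumes pytest-asyncio is installed, AvstarUploader is imported from your uploader module,
# and mock_avstar_url is a fixture you provide that returns the base URL of a stub server.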

FIXTURE = [
    {"SpotID": "AV-100291", "StationCode": "WXYZ", "AirDateTime": "2026-07-03T06:00:00-04:00",
     "Duration": 30, "ClientName": "ACME Motors"},
    {"SpotID": "AV-100292", "StationCode": "WXYZ", "AirDateTime": "2026-07-03T06:00:30-04:00",
     "Duration": 0,  "ClientName": "Bad Row Co"},   # Duration gt=0 -> rejected
]

@pytest.mark.asyncio
async def test_manifest_reconciles(tmp_path, mock_avstar_url):
    export = tmp_path / "day.ndjson"
    export.write_text("\n".join(json.dumps(r) for r in FIXTURE))

    up = AvstarUploader(api_url=mock_avstar_url, token="test", max_concurrency=4, rate_limit=50.0)
    await up.start()
    await up.stream_file(export)
    await up.close()

    m = up.manifest
    assert m.total_records == 2          # every line counted
    assert m.valid_records == 1          # the zero-duration row was rejected
    assert m.rejected_records == 1
    assert len(m.checksum) == 64         # deterministic SHA-256 hex over raw bytes

Re-running the same fixture must reproduce an identical checksum — that stability is what lets you prove to an auditor that two ingest attempts saw byte-identical input.
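
An independent digest of the file on disk gives you something to compare across runs. The sketch below uses only hashlib; sha256_file is a hypothetical helper, and its result will equal manifest.checksum only if the uploader hashes every byte it reads, newlines included.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so memory use stays flat for large exports."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

In a test, calling it twice on the same fixture and asserting equality checks stability; asserting against the manifest checks that the uploader and the auditor are hashing the same bytes.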

Edge Cases & Failure Handling

Connection pool exhaustion. A connector whose limit is disabled (limit=0) or set far above what the host accepts opens sockets until it hits OS file-descriptor limits, surfacing ClientOSError: [Errno 104] Connection reset by peer mid-window. Bound the TCPConnector with limit and limit_per_host (Step 4), watch netstat -an | grep ESTABLISHED during peak, and if resets persist drop max_concurrency to 15 and enable TCP keepalives on the connector.

Sustained 429 throttling. The token bucket keeps steady-state traffic under budget, but a contract-tier change or a competing job can push Avstar into repeated 429s. Wrap the per-record backoff of Step 6 in a circuit breaker: after N consecutive 429s, pause the pool for 60 seconds and flush the in-flight queue to a dead-letter directory rather than hot-looping retries. Always read Retry-After before sleeping. This is the same session-pressure failure mode covered under Avstar API Authentication and Rate Limits.
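
A circuit breaker of the kind described above can be sketched as a small state holder. Everything here is an assumption for illustration: ThrottleBreaker is a hypothetical class, threshold=5 is a placeholder for N, and the clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Optional


class ThrottleBreaker:
    """Opens after `threshold` consecutive 429s and stays open for `cooldown` seconds."""

    def __init__(self, threshold: int = 5, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._consecutive = 0
        self._opened_at: Optional[float] = None

    def record(self, status: int) -> None:
        """Feed every response status in; only an unbroken run of 429s trips the breaker."""
        if status == 429:
            self._consecutive += 1
            if self._consecutive >= self.threshold and self._opened_at is None:
                self._opened_at = self._clock()
        else:
            self._consecutive = 0

    def remaining_pause(self) -> float:
        """Seconds the pool should still wait; 0.0 when the breaker is closed."""
        if self._opened_at is None:
            return 0.0
        left = self.cooldown - (self._clock() - self._opened_at)
        if left <= 0:
            self._opened_at = None           # cooldown elapsed: close and start counting afresh
            self._consecutive = 0
            return 0.0
        return left
```

A worker would call record() after each response and sleep for remaining_pause() before acquiring its next token; the dead-letter flush is left out of this sketch.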

Crash mid-run without duplicate placements. If the process dies with records in flight, a naive restart re-posts already-cleared spots and double-books breaks. Make every request idempotent: send X-Request-ID = sha256(spot_id + airtime + file_checksum) so Avstar rejects duplicates, and checkpoint the byte offset every 5,000 records to a .checkpoint file. Because Step 5 finalizes the file checksum only at end of file, compute it in a separate pass before streaming if you use it in the key. On restart, read the checkpoint, seek() to the offset, and resume streaming. Route permanently failed rows (4xx, unrecoverable schema violations) to a dead-letter queue that a secondary job reconciles into a CSV for the traffic desk. Trap SIGTERM/SIGINT and await self._batch_queue.join() before closing so the manifest reflects accurate final counts.
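
The idempotency key and the checkpoint file can be sketched with the standard library. The helper names and the JSON checkpoint layout are hypothetical, and the key joins its parts with a separator rather than plain concatenation, a small deviation chosen so that different field combinations cannot collapse into the same string. Whether Avstar deduplicates on X-Request-ID is taken from the text above, not verified here.

```python
import hashlib
import json
import os
from pathlib import Path


def request_id(spot_id: str, airtime: str, file_checksum: str) -> str:
    """Derive a stable idempotency key from the record identity and the file checksum."""
    material = "\x1f".join((spot_id, airtime, file_checksum)).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def save_checkpoint(path: Path, byte_offset: int, records_done: int) -> None:
    """Persist progress atomically so a crash never leaves a truncated checkpoint."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"byte_offset": byte_offset, "records_done": records_done}))
    os.replace(tmp, path)


def load_checkpoint(path: Path) -> int:
    """Return the saved byte offset, or 0 when no usable checkpoint exists."""
    try:
        return int(json.loads(path.read_text())["byte_offset"])
    except (FileNotFoundError, KeyError, ValueError):
        return 0
```

On restart, open the export in binary mode, seek() to load_checkpoint(...), and continue reading. The offset should only advance past records whose uploads have completed, which with concurrent workers means tracking the oldest unfinished record rather than the newest one read.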

FAQ

Why one request per record instead of a single bulk POST?

Per-record dispatch isolates failure: a malformed or throttled spot is retried or dead-lettered without rolling back the entire submission, which keeps the audit counts exact. Where Avstar exposes a true bulk endpoint you can assemble bounded batches upstream (see the batch-envelope contract in Async Batch Processing for High-Volume Logs), but each batch still needs its own idempotency key and a partial-accept reconciliation step.

How do I size max_concurrency and rate_limit for my station group?

Start from the published requests-per-minute ceiling on your contract tier, set rate_limit to roughly 80% of it in requests/second, and set max_concurrency no higher than the point where added workers stop improving throughput (typically 15–20 against a single host). The TCPConnector.limit and the worker count must match, or you will queue sockets you cannot open.
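
The sizing rule is simple arithmetic, sketched below. The 600 requests-per-minute ceiling is a hypothetical figure, not a published Avstar tier; the record count is the one from the sample log line in Step 1.

```python
def size_rate_limit(requests_per_minute: int, headroom: float = 0.8) -> float:
    """Convert a per-minute ceiling into a per-second rate with safety headroom."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return requests_per_minute / 60.0 * headroom


def estimated_hours(records: int, rate_per_second: float) -> float:
    """Lower bound on run time when every record is its own request."""
    return records / rate_per_second / 3600.0


if __name__ == "__main__":
    rate = size_rate_limit(600)              # hypothetical contract ceiling
    print(rate, round(estimated_hours(412_903, rate), 1))
```

Under these assumed numbers the rate works out to 8 requests per second and the run to roughly 14.3 hours, so it is worth running this estimate against your real ceiling before committing to per-record dispatch inside an overnight window.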

The uploader rejects rows that looked fine in the export. Where do I look?

Rejections are schema failures, not upload failures — inspect manifest.errors, where each entry carries the raw record and the Pydantic error path. The usual causes are a non-standard duration or an unnormalized billing code. Fix these upstream with the Pydantic traffic-data validators and a billing code normalization pass so the uploader only ever sees canonical rows.

Is streaming enough to prevent out-of-memory crashes?

Streaming caps the reader, but the bounded asyncio.Queue(maxsize=...) is what actually caps live objects: validated records cannot accumulate faster than workers drain them. In containers, also set PYTHONMALLOC=malloc to reduce small-object fragmentation, and call gc.collect() after every 10,000 records only if heap growth exceeds ~15% above baseline.
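
The conditional collection rule above can be sketched with tracemalloc, which reports memory allocated by Python objects. The function names are hypothetical, the 15% threshold and 10,000-record interval come from the text, and tracing must already be started (it adds overhead, so treat this as a diagnostic aid rather than a permanent fixture).

```python
import gc
import tracemalloc


def heap_grew_beyond(baseline_bytes: int, current_bytes: int, threshold: float = 0.15) -> bool:
    """True when the traced heap exceeds the baseline by more than `threshold`."""
    return baseline_bytes > 0 and current_bytes > baseline_bytes * (1.0 + threshold)


def maybe_collect(records_seen: int, baseline_bytes: int, every: int = 10_000) -> bool:
    """Run gc.collect() at the interval boundary only if heap growth exceeds the threshold."""
    if records_seen == 0 or records_seen % every != 0:
        return False
    if not tracemalloc.is_tracing():
        return False                         # nothing to measure without tracing enabled
    current, _peak = tracemalloc.get_traced_memory()
    if heap_grew_beyond(baseline_bytes, current):
        gc.collect()
        return True
    return False
```

To use it, call tracemalloc.start() once, record the first value of tracemalloc.get_traced_memory() after warm-up as the baseline, and call maybe_collect() from the reader loop.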

Related

Async Batch Processing for High-Volume Logs — the producer-consumer batch model and dead-letter routing this uploader plugs into.
Handling Avstar Session Timeouts in Python — token rotation and retry-with-jitter for the session layer beneath these uploads.
Validating Spot Durations Against Broadcast Standards — the duration checks that keep zero-length and non-standard rows out of the queue.
