Understanding Broadcast Spot Schemas and Metadata

A broadcast spot schema is the atomic data contract that every other system in a traffic operation depends on: get it wrong once and the error propagates into scheduling, playout, and billing before anyone notices. The engineering problem this page solves is turning fragmented commercial inputs (CRM exports, agency manifests, direct-sales orders) into a single normalized, machine-readable record with immutable identifiers, explicit timezone handling, and typed constraints strict enough that downstream automation can trust it without re-validating. Get the field-level definitions right here and everything above the spot is deterministic; get them wrong and pipelines suffer silent data corruption, misrouted creative, and reconciliation failures. This guide sits under Broadcast Traffic Architecture & Taxonomy and specifies that contract end to end, framed in plain language for traffic managers who own the data quality and delivered as deployable, typed code for the Python developers who enforce it.

Concept & Data Model

A spot is the smallest schedulable unit in the taxonomy: spot → avail → order → campaign. The schema is a formalized contract between sales, traffic, ad operations, and engineering, and it must encapsulate three otherwise-independent concerns in one payload — commercial intent (who bought what, and how it bills), technical constraints (duration, creative asset, playout tolerances), and compliance routing (clearance flags, political-file markers, competitive separation). Production-grade implementations enforce strict typing, immutable primary keys, and timezone-aware intervals so that a record means exactly the same thing in every system that reads it.

The canonical fields below are the ones the ingestion boundary guarantees before a record is allowed downstream. Currency is stored as integer cents to avoid float drift, and every datetime is stored timezone-aware in UTC so daylight-saving transitions never corrupt an interval boundary during conflict detection.
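A minimal sketch of the integer-cents rule, assuming rates arrive from the order system as decimal strings. The helper name `to_cents` and the half-up rounding policy are illustrative assumptions, not part of the reference module.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(amount: str) -> int:
    """Parse a decimal currency string such as "1450.00" into integer cents."""
    # Decimal parses the string exactly; rounding policy is an assumption.
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


# Float arithmetic drifts: ten dimes do not sum to exactly one dollar.
float_total = sum([0.10] * 10)

# Integer cents stay exact under addition.
cent_total = sum([to_cents("0.10")] * 10)

print(float_total == 1.0)   # False
print(cent_total == 100)    # True
```

The float total comes out as 0.9999999999999999, while the integer total is exactly 100 cents, which is why `rate_cents` is typed as `int`.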

Field	Type	Constraint	Purpose
`spot_id`	`str` (UUIDv5)	immutable, namespace-bound, unique	Canonical primary key across scheduling, playout, and billing
`client_code`	`str`	controlled vocabulary	Advertiser identity for revenue attribution
`campaign_id`	`str`	FK → campaign	Reporting rollup and flight grouping
`order_id`	`str`	FK → order	Binds the spot to its contractual commitment
`product_code`	`str`	FK → rate card	Maps to rate-card tier and clearance matrix
`length_sec`	`int`	enum 10 / 15 / 30 / 60 / 120	Runtime validated against playout tolerances
`daypart_window`	`tuple[datetime, datetime]`	timezone-aware (UTC)	ISO 8601 start/end with explicit offset; carried as `daypart_start` / `daypart_end` in the models and the dispatch message
`avail_type`	`enum`	preemptible / non_preemptible / bonus / makegood	Clearance tier and displacement eligibility
`clearance_flags`	`list[str]`	controlled vocabulary	Competitive blackout, political-file, sponsorship ID
`creative_ref`	`str | None`	version hash + delivery status	Asset pointer for the playout router; may be absent at ingestion
`rate_cents`	`int`	≥ 0	Integer currency; no float rounding drift
`priority_tier`	`int`	0–9	Conflict-resolution weight for the scheduler

This taxonomy dictates how spots resolve into inventory. The same avail_type and clearance_flags defined here are consumed one level up by Avails Mapping Strategies for Linear TV, which turns validated spots into monetizable placement windows, and the rate_cents / product_code join keys are the foundation for billing code normalization. Because these fields are read by so many subsystems, the schema behaves as a small state machine — a raw payload is validated, assigned a canonical identifier, and only then emitted as a trusted record; it is never mutated in place after emission.

Figure — Canonical spot-record handling: raw fields are validated and normalized, hashed into a deterministic identifier, then emitted as a canonical spot schema.
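The state-machine framing can be sketched as an explicit transition table. The state names and the `advance` helper are hypothetical, chosen only to mirror the validate, identify, emit sequence described above; the reference module does not define them.

```python
from enum import Enum


class SpotState(Enum):
    RAW = "raw"
    VALIDATED = "validated"
    IDENTIFIED = "identified"   # canonical spot_id assigned
    EMITTED = "emitted"         # terminal: never mutated afterwards
    REJECTED = "rejected"       # terminal: dead-lettered


# Each state lists the only states it may move to.
ALLOWED_TRANSITIONS: dict[SpotState, set[SpotState]] = {
    SpotState.RAW: {SpotState.VALIDATED, SpotState.REJECTED},
    SpotState.VALIDATED: {SpotState.IDENTIFIED},
    SpotState.IDENTIFIED: {SpotState.EMITTED},
    SpotState.EMITTED: set(),
    SpotState.REJECTED: set(),
}


def advance(current: SpotState, target: SpotState) -> SpotState:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


state = SpotState.RAW
for step in (SpotState.VALIDATED, SpotState.IDENTIFIED, SpotState.EMITTED):
    state = advance(state, step)
```

Because `EMITTED` has no outgoing transitions, any attempt to move an emitted record back to an earlier state fails loudly instead of mutating it in place.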

Implementation Approach

Two design decisions dominate a spot-schema layer: how identifiers are generated, and where validation is enforced.

Deterministic hashing over surrogate auto-increment keys. A spot arriving from three different upstream systems must resolve to one identifier, and the same order re-exported tomorrow must resolve to the same identifier — otherwise ingestion double-books inventory and fractures billing lineage. A namespace-bound UUIDv5 (or a SHA-256-derived key) computed from the stable business attributes gives idempotency for free: re-ingesting a record is a no-op rather than a duplicate. Auto-increment surrogate keys can’t offer this, because they encode arrival order rather than identity. The full legacy-to-canonical transformation, including collision handling and quarantine routing, is the subject of How to Map Legacy Spot IDs to Modern Schemas.
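The contrast between the two key strategies can be shown with a small sketch. The namespace, the sample records, and both helper names are made up for illustration; only `uuid.uuid5` and `itertools.count` are standard-library calls.

```python
import itertools
import uuid

# Hypothetical namespace minted for this example only.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "spots.example.invalid")

_arrival_counter = itertools.count(1)


def surrogate_key(_record: dict) -> int:
    """Auto-increment style key: encodes arrival order, not identity."""
    return next(_arrival_counter)


def deterministic_key(record: dict) -> str:
    """UUIDv5 over stable business attributes: encodes identity."""
    composite = ":".join(
        record[name]
        for name in ("client_code", "campaign_id", "product_code", "legacy_id")
    )
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, composite))


# Made-up sample export; the second run re-exports it in a different order.
export = [
    {"client_code": "ACME", "campaign_id": "CMP-2231",
     "product_code": "RC-PRIME-30", "legacy_id": "CRM-88213"},
    {"client_code": "ACME", "campaign_id": "CMP-2231",
     "product_code": "RC-PRIME-30", "legacy_id": "CRM-88214"},
]

first_run = {r["legacy_id"]: surrogate_key(r) for r in export}
second_run = {r["legacy_id"]: surrogate_key(r) for r in reversed(export)}

first_ids = {r["legacy_id"]: deterministic_key(r) for r in export}
second_ids = {r["legacy_id"]: deterministic_key(r) for r in reversed(export)}

print(first_run == second_run)   # False: the re-export minted new keys
print(first_ids == second_ids)   # True: identity survives the re-export
```

The surrogate keys change on the second run because the counter keeps advancing, while the UUIDv5 keys depend only on the record's attributes.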

Validation at the boundary, not scattered through the code path. Type coercion, range checks, and cross-field dependency rules belong in one declarative model evaluated in a single transactional pass at ingestion — not in defensive if blocks sprinkled across the scheduler and billing engine. This is the same Pydantic validator discipline the ingestion pipelines use: a payload that cannot populate the model is dead-lettered with a structured reason, never partially accepted. Cross-field rules matter as much as per-field ones — a bonus avail cannot exceed a 15-second runtime, and a clearance_flags entry marking political inventory forces the presence of a resolvable product_code. Keeping the schema model aligned with the layered constraints in Spot Scheduling Validation & Rule Engines means a spot the schema layer marks valid is never rejected for a data reason at schedule time.
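A hedged sketch of the single-pass, fail-closed boundary follows. It deliberately uses plain functions and a dataclass as a standard-library stand-in for the Pydantic model, so it illustrates the dead-letter flow rather than the production validator; `DeadLetter`, `check_payload`, and `ingest` are hypothetical names.

```python
from dataclasses import dataclass

ALLOWED_LENGTHS = frozenset({10, 15, 30, 60, 120})


@dataclass(frozen=True)
class DeadLetter:
    """A rejected payload reference plus every reason it failed."""
    legacy_id: str
    reasons: tuple[str, ...]


def check_payload(raw: dict) -> list[str]:
    """Evaluate all rules in one pass; an empty list means the payload is valid."""
    reasons: list[str] = []
    length = raw.get("length_sec")
    if not isinstance(length, int) or length not in ALLOWED_LENGTHS:
        reasons.append(f"length_sec={length!r} not in {sorted(ALLOWED_LENGTHS)}")
    # Cross-field rule: bonus inventory is capped at 15 seconds.
    if raw.get("avail_type") == "bonus" and isinstance(length, int) and length > 15:
        reasons.append("bonus spots may not exceed 15 seconds")
    # Cross-field rule: political inventory needs a non-blank product_code.
    product_code = str(raw.get("product_code") or "").strip()
    if "political" in raw.get("clearance_flags", []) and not product_code:
        reasons.append("political inventory requires a product_code")
    return reasons


def ingest(raw: dict, accepted: list, dead_letters: list) -> None:
    """Accept the payload whole or dead-letter it whole; never partially."""
    reasons = check_payload(raw)
    if reasons:
        dead_letters.append(DeadLetter(raw.get("legacy_id", "<unknown>"), tuple(reasons)))
    else:
        accepted.append(raw)
```

Collecting every violation before deciding keeps the dead-letter reason complete, so an operator fixes the payload once rather than resubmitting it rule by rule.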

Event-driven emission over batch mutation. Validated records are emitted to downstream consumers as immutable events keyed on spot_id, rather than written back into a shared mutable table that the scheduler polls. Event emission preserves the audit trail and makes replay trivial; batch mutation invites the classic non-idempotency bug where a re-run silently regresses a record’s state.
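One way to picture the replay property is an append-only log folded into per-spot state. This is a sketch under stated assumptions: `SpotEvent`, its `sequence` field, and `project` are hypothetical, and a real deployment would sit on a message broker rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotEvent:
    """An immutable emission keyed on spot_id; sequence orders the log."""
    sequence: int
    spot_id: str
    avail_type: str
    length_sec: int


def project(events: list[SpotEvent]) -> dict[str, SpotEvent]:
    """Fold the log into the latest event per spot_id."""
    state: dict[str, SpotEvent] = {}
    for event in sorted(events, key=lambda e: e.sequence):
        state[event.spot_id] = event
    return state


# Placeholder identifiers stand in for real UUIDv5 spot_id values.
log = [
    SpotEvent(1, "spot-a", "preemptible", 30),
    SpotEvent(2, "spot-b", "bonus", 15),
    SpotEvent(3, "spot-a", "makegood", 30),
]

state = project(log)
replayed = project(log + log)   # replaying the whole log changes nothing

print(state == replayed)        # True
```

Because events are never edited and the projection depends only on the log's content, a re-run cannot regress a record to an older state.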

Production Python Implementation

The reference module below is typed, deployable, and idempotent on spot_id. Pydantic v2 models enforce the schema contract at the boundary; cross-field validators encode broadcast-domain rules; and every transition emits a structured log line in the traffic-ops format timestamp | level | module | message so SOC 2 and FCC audit trails can be reconstructed from logs alone.

python

from __future__ import annotations

import hashlib
import logging
import uuid
from datetime import datetime, timezone
from enum import Enum

from pydantic import BaseModel, Field, model_validator

# Traffic-ops structured logging: timestamp | level | module | message
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)
logger = logging.getLogger("traffic.schema.spot")

# Fixed namespace so UUIDv5 generation is stable across environments.
# This value is the RFC 4122 DNS namespace (identical to uuid.NAMESPACE_DNS).
SPOT_NAMESPACE = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")

# Runtime lengths a playout router can splice frame-accurately.
ALLOWED_LENGTHS: frozenset[int] = frozenset({10, 15, 30, 60, 120})


class AvailType(str, Enum):
    PREEMPTIBLE = "preemptible"
    NON_PREEMPTIBLE = "non_preemptible"
    BONUS = "bonus"
    MAKEGOOD = "makegood"


class SpotIngestPayload(BaseModel):
    """A raw spot as it arrives from a CRM, agency manifest, or sales order.

    Field types are enforced here; the model_validator enforces the cross-field
    broadcast rules that no single field can express on its own."""

    legacy_id: str = Field(..., min_length=1)          # source-system identifier
    client_code: str = Field(..., min_length=2)
    campaign_id: str = Field(..., min_length=1)
    order_id: str = Field(..., min_length=1)
    product_code: str = Field(..., min_length=1)
    length_sec: int = Field(..., gt=0)
    daypart_start: datetime                            # must be timezone-aware
    daypart_end: datetime
    avail_type: AvailType
    clearance_flags: list[str] = Field(default_factory=list)
    creative_ref: str | None = None
    rate_cents: int = Field(..., ge=0)                 # integer currency, no float drift
    priority_tier: int = Field(default=5, ge=0, le=9)

    @model_validator(mode="after")
    def enforce_broadcast_rules(self) -> "SpotIngestPayload":
        # Runtime must be a length the router can actually air.
        if self.length_sec not in ALLOWED_LENGTHS:
            raise ValueError(
                f"length_sec={self.length_sec} not in allowed set {sorted(ALLOWED_LENGTHS)}"
            )
        # Bonus inventory is capped: it may never displace a paid 30/60.
        if self.avail_type is AvailType.BONUS and self.length_sec > 15:
            raise ValueError("bonus spots may not exceed 15 seconds")
        # Every datetime must be timezone-aware so DST can never shift a boundary.
        if self.daypart_start.tzinfo is None or self.daypart_end.tzinfo is None:
            raise ValueError("daypart timestamps must be timezone-aware (UTC)")
        if self.daypart_end <= self.daypart_start:
            raise ValueError("daypart_end must be after daypart_start")
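        # Political inventory is a compliance obligation, not a soft flag.
        # min_length=1 already excludes an empty code, so also reject blank ones;
        # resolving the code against the rate card is a separate lookup.
        if "political" in self.clearance_flags and not self.product_code.strip():
            raise ValueError("political inventory requires a resolvable product_code")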
        return self


class CanonicalSpot(BaseModel):
    """The trusted, immutable record every downstream system consumes."""

    spot_id: str                                       # UUIDv5 idempotency key
    client_code: str
    campaign_id: str
    order_id: str
    product_code: str
    length_sec: int
    daypart_start: str                                 # ISO 8601 with UTC offset
    daypart_end: str
    avail_type: AvailType
    clearance_flags: list[str]
    creative_ref: str | None
    rate_cents: int
    priority_tier: int
    schema_version: int
    normalized_at: str


class SpotNormalizer:
    """Validates raw payloads and emits canonical spots deterministically.

    map() is deterministic in spot_id: the same payload always yields the same
    identifier, so re-ingesting a record is a no-op rather than a duplicate.
    Only normalized_at, which is read from the clock, varies between runs."""

    SCHEMA_VERSION = 3

    def _canonical_id(self, payload: SpotIngestPayload) -> str:
        # Identity is derived from stable business attributes, not arrival order.
        composite = ":".join(
            (payload.client_code, payload.campaign_id, payload.product_code, payload.legacy_id)
        )
        digest = hashlib.sha256(composite.encode("utf-8")).hexdigest()
        return str(uuid.uuid5(SPOT_NAMESPACE, digest))

    def map(self, payload: SpotIngestPayload) -> CanonicalSpot:
        spot_id = self._canonical_id(payload)
        spot = CanonicalSpot(
            spot_id=spot_id,
            client_code=payload.client_code,
            campaign_id=payload.campaign_id,
            order_id=payload.order_id,
            product_code=payload.product_code,
            length_sec=payload.length_sec,
            daypart_start=payload.daypart_start.astimezone(timezone.utc).isoformat(),
            daypart_end=payload.daypart_end.astimezone(timezone.utc).isoformat(),
            avail_type=payload.avail_type,
            clearance_flags=sorted(set(payload.clearance_flags)),
            creative_ref=payload.creative_ref,
            rate_cents=payload.rate_cents,
            priority_tier=payload.priority_tier,
            schema_version=self.SCHEMA_VERSION,
            normalized_at=datetime.now(timezone.utc).isoformat(),
        )
        logger.info(
            "spot_id=%s legacy_id=%s avail=%s len=%ds accepted=True",
            spot.spot_id, payload.legacy_id, spot.avail_type.value, spot.length_sec,
        )
        return spot

A representative accepted-path log line reads 2026-07-03 14:22:07,000 | INFO | traffic.schema.spot | spot_id=6ba7b811-… legacy_id=CRM-88213 avail=non_preemptible len=30s accepted=True. The timestamp uses logging's default asctime layout, which is rendered in the host's local time unless a UTC converter is configured on the formatter. Because map derives spot_id deterministically, replaying the same export during an audit produces byte-identical spot_id values, the property that FCC public-inspection evidence and SOC 2 reproducibility both depend on.

Validation & Edge Cases

Broadcast operations generate boundary conditions a naïve normalizer mishandles. Each must be an explicit, tested case rather than a silent default:

Timezone offsets and split feeds. A daypart window is meaningless without an offset. All daypart_start / daypart_end values resolve to UTC; a payload carrying a naïve datetime is rejected at the boundary, because any offset assumed on its behalf is a guess: a fixed offset that is correct in January is an hour off in July, when daylight saving time is in effect.
Runtime tolerances. length_sec is an enum, not a free integer. A 22-second creative is a hard rejection — the router cannot splice it into a 30-second break without clipping — and the same duration discipline is enforced downstream when validating spot durations against broadcast standards.
Preemption tiers. avail_type is not cosmetic. A non_preemptible spot carries a delivery guarantee, while a preemptible spot may be displaced subject to make-good obligations. Collapsing the two loses the contractual difference the scheduler needs to honor an SLA.
Competitive separation flags. A clearance_flags entry marking a product category is a placement constraint the scheduler must honor before two competing advertisers land in adjacent breaks. The schema carries the flag; enforcement happens at schedule time.
Zero-duration and bonus over-length. A length_sec of 0, or a bonus avail exceeding 15 seconds, is a hard rejection. Silent acceptance produces inventory the automation layer cannot air, which surfaces as a dropped spot on transmission.
Missing political disclosure. A record flagged political with an unresolvable product_code is a compliance breach, not a warning. It raises a validation exception and routes to a dead-letter queue rather than entering the schedule.
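The timezone case above is the easiest of these to get subtly wrong, so a small sketch may help. It uses fixed standard-library offsets for US Eastern time purely as an example; the `to_utc` helper is hypothetical and only mirrors the reject-naïve rule.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))  # US Eastern standard time
EDT = timezone(timedelta(hours=-4))  # US Eastern daylight time


def to_utc(value: datetime) -> datetime:
    """Normalize an aware datetime to UTC; refuse to guess for naive ones."""
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError("naive datetime rejected: an explicit offset is required")
    return value.astimezone(timezone.utc)


# 18:00 local in January with the correct winter offset.
january = to_utc(datetime(2026, 1, 6, 18, 0, tzinfo=EST))

# 18:00 local in July: the correct summer offset versus the stale winter one.
july_correct = to_utc(datetime(2026, 7, 3, 18, 0, tzinfo=EDT))
july_assumed = to_utc(datetime(2026, 7, 3, 18, 0, tzinfo=EST))

drift = july_assumed - july_correct
print(drift)  # 1:00:00
```

The one-hour drift is exactly the boundary shift that conflict detection would otherwise inherit, which is why the offset must travel with the payload instead of being assumed at ingestion.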

Integration Points

The spot schema is the entry contract for the whole pipeline: it consumes heterogeneous upstream payloads and produces one trusted record for scheduling and billing.

Upstream — ingestion. Raw spots arrive as flat CSV from legacy traffic systems, EDI from agency portals, or JSON from cloud order systems. Normalization and strict typing happen before the scheduler ever runs, using the format handling in the Avion & Avstar ingestion pipelines and the Pydantic validator pattern shown above. The SpotIngestPayload model is the contract: if a payload cannot populate it, the record is dead-lettered rather than normalized.

Downstream — scheduling and billing. The normalizer emits CanonicalSpot records over a versioned dispatch contract carrying an explicit schema_version, so a consumer written against version 2 can detect and route a version 3 payload rather than misreading it. An abridged example of the message the scheduler and billing systems consume follows; some CanonicalSpot fields are omitted, and the spot_id value is a placeholder rather than a computed UUIDv5:

json

{
  "spot_id": "6ba7b811-9dad-11d1-80b4-00c04fd430c8",
  "campaign_id": "CMP-2231",
  "order_id": "ORD-88213",
  "product_code": "RC-PRIME-30",
  "length_sec": 30,
  "daypart_start": "2026-07-03T18:00:00+00:00",
  "daypart_end": "2026-07-03T23:00:00+00:00",
  "avail_type": "non_preemptible",
  "clearance_flags": ["cat-auto", "sponsor-0912"],
  "rate_cents": 145000,
  "schema_version": 3
}

Billing joins on product_code and rate_cents to attribute revenue, which is why standardizing those billing codes is a hard upstream precondition rather than a downstream reconciliation step. Access to both the ingestion and dispatch contracts is governed by the model in Security Boundaries for Traffic Database Access.
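On the consumer side, the schema_version stamp on the dispatch message can drive routing before any field is interpreted. The sketch below is an assumption-laden illustration: the supported-version set, the route labels, and the `route` function are hypothetical, not part of the dispatch contract.

```python
import json

# Assumption: this consumer understands versions 2 and 3 only.
SUPPORTED_VERSIONS = frozenset({2, 3})


def route(message: str) -> tuple[str, dict]:
    """Pick a destination for a dispatch message based on schema_version."""
    record = json.loads(message)
    version = record.get("schema_version")
    if version is None:
        # An unstamped payload cannot be interpreted safely.
        return ("quarantine", record)
    if version not in SUPPORTED_VERSIONS:
        # Unknown versions go through a compatibility shim, not the parser.
        return ("compat_shim", record)
    return (f"handler_v{version}", record)


destination, _ = route('{"spot_id": "placeholder", "schema_version": 3}')
print(destination)  # handler_v3
```

Checking the version first means a consumer never parses new fields as null by accident; it either handles the payload knowingly or hands it off.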

Compliance & Audit Considerations

The spot schema is where several regulatory and financial controls are first enforced, and getting them wrong is expensive.

FCC political file. A spot flagged as political inventory inherits lowest-unit-charge and public-inspection-file obligations from the moment it is normalized. The schema refuses to emit a political spot whose product_code cannot be resolved to a candidate, sponsor, and rate, because a missing disclosure is a violation rather than a soft warning.

Immutable, versioned records. Canonical spots are never updated in place. A correction is a new record with a fresh normalized_at, chained to the original by its deterministic spot_id; schema_version changes only when the contract itself changes, not when an individual record is corrected. Schema evolution follows the same discipline: field deprecation, nullable transitions, and versioned payload routing let engineering roll out changes without disrupting active scheduling queues or orphaning records, because a live traffic database cannot tolerate downtime during a metadata migration.

SOC 2 reproducibility. Because map is deterministic and idempotent on spot_id, evidence collection can replay any historical export and obtain the same spot_id values and the same field content; only normalized_at, which records when the run happened, differs between runs. That reproducibility is itself the control: it demonstrates that data was normalized by rule, not by manual override.
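For evidence replay, note that the reference module reads normalized_at from the wall clock, so a comparison between runs has to leave that field out. A minimal sketch of such a comparison, with `fingerprint` and `VOLATILE_FIELDS` as hypothetical names and made-up sample values:

```python
import hashlib
import json

# Fields expected to differ between otherwise identical runs.
VOLATILE_FIELDS = frozenset({"normalized_at"})


def fingerprint(record: dict) -> str:
    """Hash a record's stable content using a canonical JSON encoding."""
    stable = {k: v for k, v in record.items() if k not in VOLATILE_FIELDS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {"spot_id": "placeholder", "rate_cents": 145000,
            "normalized_at": "2026-07-03T14:22:07+00:00"}
replay = {"spot_id": "placeholder", "rate_cents": 145000,
          "normalized_at": "2026-09-01T09:00:00+00:00"}
tampered = {**replay, "rate_cents": 150000}

print(fingerprint(original) == fingerprint(replay))    # True
print(fingerprint(original) == fingerprint(tampered))  # False
```

Sorting keys and fixing separators makes the encoding independent of dictionary order, so equal content always hashes equally.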

Troubleshooting & Common Errors

When a normalization run misbehaves, these are the named patterns operators hit most often, with root cause and remediation.

Error pattern	Diagnostic indicator	Root cause	Remediation
Duplicate placement	Two records share business attributes but differ in `spot_id`	Identifier derived from arrival order, not stable attributes	Regenerate IDs via UUIDv5 over `client_code:campaign_id:product_code:legacy_id`; dedupe on the canonical key
Naïve-datetime drift	Daypart boundary shifts by an hour after a DST change	`daypart_window` stored without a timezone offset	Reject naïve datetimes at the boundary; store timezone-aware UTC and resolve offsets before comparison
Runtime rejection at air	Spot clipped or dropped on transmission	`length_sec` accepted outside the allowed enum	Enforce the `ALLOWED_LENGTHS` set at ingestion; dead-letter non-conforming durations
Silent political non-disclosure	Political spot airs without a public-file entry	`clearance_flags` marked political but `product_code` unresolved	Fail closed in the cross-field validator; route to a dead-letter queue for manual disclosure
Schema-version misread	Consumer parses new fields as null	Payload emitted without an explicit `schema_version`	Stamp every dispatch with `schema_version`; route unsupported versions through a compatibility shim

When ingestion encounters systemic anomalies — corrupted exports, widespread schema drift, or an upstream failure — a circuit breaker in the automation layer halts downstream writes once an error threshold is breached and requires operator acknowledgment before resuming, the same breaker discipline detailed in Spot Scheduling Validation & Rule Engines.
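The breaker behaviour described above can be sketched in a few lines. The class name, the 20% threshold, and the ten-record minimum are illustrative assumptions; real thresholds would come from the operation's own error budget.

```python
class IngestCircuitBreaker:
    """Halts downstream writes once the ingestion error rate crosses a threshold."""

    def __init__(self, max_error_rate: float = 0.2, min_samples: int = 10) -> None:
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples
        self.total = 0
        self.errors = 0
        self.is_open = False

    def record(self, ok: bool) -> None:
        """Count one ingestion outcome and trip the breaker if warranted."""
        self.total += 1
        if not ok:
            self.errors += 1
        enough = self.total >= self.min_samples
        if enough and self.errors / self.total > self.max_error_rate:
            self.is_open = True  # stays open until an operator acknowledges

    def allow_write(self) -> bool:
        return not self.is_open

    def acknowledge(self, operator: str) -> None:
        """Reset the breaker; a named operator is required."""
        if not operator.strip():
            raise ValueError("operator acknowledgment requires a name")
        self.total = 0
        self.errors = 0
        self.is_open = False


breaker = IngestCircuitBreaker()
for outcome in [True] * 7 + [False] * 3:
    breaker.record(outcome)
print(breaker.allow_write())  # False: 3 of 10 failed, above the 20% threshold
```

The breaker never closes on its own: writes resume only after `acknowledge` is called, which matches the requirement that an operator confirm before downstream writes restart.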

Related

How to Map Legacy Spot IDs to Modern Schemas — the deterministic legacy-to-canonical mapping, with quarantine routing and circuit-breaker handling.
Avails Mapping Strategies for Linear TV — how validated spots become monetizable placement windows for the scheduler.
Standardizing Billing Codes Across Traffic Systems — the billing-code normalization the spot schema relies on as its revenue join key.
Schema Validation with Pydantic for Traffic Data — the boundary validator pattern that enforces this schema on incoming payloads.
Broadcast Traffic Architecture & Taxonomy — the parent architecture and entity model this spot contract anchors.
