Concepts

Request Flow

How a request moves through the gateway

Every request enters the system through the gateway binary. The path it takes depends on whether it carries a payment signature and whether it targets the standard chat completions endpoint or the Agent-to-Agent (A2A) interface.

Chat Completions Flow

Endpoint: POST /v1/chat/completions

Parse and resolve the model

The gateway parses the request body and resolves the model field, which accepts short aliases (sonnet, gpt4o), routing profile names (auto, eco, premium, free), or full model IDs (claude-sonnet-4-20250514). Profile names are handed to the smart router for tier classification.
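As a rough illustration of this resolution step, the sketch below separates routing profiles from concrete models. The `ModelTarget` type, the `resolve_model` function, and the alias targets are assumptions made for the example; the gateway's real alias table is not shown in this document.

```rust
/// What the `model` field resolved to (hypothetical type).
#[derive(Debug, PartialEq)]
enum ModelTarget {
    /// A routing profile, to be handed to the smart router.
    Profile(String),
    /// A concrete model ID, ready to proxy.
    Model(String),
}

/// Resolve the raw `model` string from the request body.
fn resolve_model(raw: &str) -> ModelTarget {
    match raw {
        // Routing profile names listed in the docs.
        "auto" | "eco" | "premium" | "free" => ModelTarget::Profile(raw.to_string()),
        // Assumed alias targets, for illustration only.
        "sonnet" => ModelTarget::Model("claude-sonnet-4-20250514".to_string()),
        "gpt4o" => ModelTarget::Model("gpt-4o".to_string()),
        // Anything else is treated as a full model ID.
        other => ModelTarget::Model(other.to_string()),
    }
}
```

Keeping profiles and models as separate variants means the caller cannot forget to invoke the smart router for a profile name.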

Prompt guard checks

The request content is scanned for prompt injection attempts, jailbreak patterns, and PII. Requests that fail these checks are rejected before any payment logic runs.

Check for PAYMENT-SIGNATURE header

If the PAYMENT-SIGNATURE header is absent, the gateway immediately returns HTTP 402 with a full cost breakdown (base cost + 5% platform fee) and a list of accepted payment schemes — both exact USDC-SPL transfer and escrow deposit formats are included.
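The cost breakdown can be sketched with integer arithmetic. The example below assumes amounts are held in micro-USDC (USDC has 6 decimals) and that the 5% fee is rounded up; the rounding rule, the type names, and the scheme labels "exact" and "escrow" as wire values are assumptions, not the gateway's confirmed format.

```rust
/// Cost breakdown in micro-USDC (1 USDC = 1_000_000 micro-USDC).
#[derive(Debug, PartialEq)]
struct CostBreakdown {
    base: u64,
    platform_fee: u64,
    total: u64,
}

/// 5% platform fee, expressed in basis points.
const PLATFORM_FEE_BPS: u64 = 500;

fn quote(base: u64) -> CostBreakdown {
    // Assumed rounding rule: round the fee up to the next micro-USDC.
    let platform_fee = (base * PLATFORM_FEE_BPS + 9_999) / 10_000;
    CostBreakdown { base, platform_fee, total: base + platform_fee }
}

/// Hypothetical body of the HTTP 402 response.
#[derive(Debug, PartialEq)]
struct PaymentRequired {
    status: u16,
    cost: CostBreakdown,
    accepts: Vec<&'static str>,
}

/// Return the signature if present, otherwise a 402 with the quote.
fn require_payment<'a>(header: Option<&'a str>, base: u64) -> Result<&'a str, PaymentRequired> {
    match header {
        Some(signature) => Ok(signature),
        None => Err(PaymentRequired {
            status: 402,
            cost: quote(base),
            accepts: vec!["exact", "escrow"],
        }),
    }
}
```

With a base cost of 2,000 micro-USDC, the fee is 100 and the quoted total is 2,100.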

Decode and validate the payment signature

When the header is present, it is decoded (base64 or raw JSON) and then checked against Redis for replay attacks. Duplicate signatures are rejected with a 402.
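One way to accept both encodings is to treat a value that starts with `{` as raw JSON and everything else as base64. That detection rule is an assumption for this sketch, as are the function names; the base64 decoder is hand-rolled only to keep the example free of external crates.

```rust
/// Minimal standard-alphabet base64 decoder (std only, for illustration).
fn b64_decode(input: &str) -> Option<Vec<u8>> {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = Vec::new();
    let mut buf: u32 = 0;
    let mut bits: u32 = 0;
    for byte in input.bytes() {
        if byte == b'=' {
            break; // padding marks the end of the data
        }
        // Any character outside the alphabet makes the input invalid.
        let val = ALPHABET.iter().position(|&c| c == byte)? as u32;
        buf = (buf << 6) | val;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1u32 << bits) - 1;
        }
    }
    Some(out)
}

/// Decode the header value into the JSON text of the payment payload.
fn decode_payment_header(value: &str) -> Option<String> {
    let trimmed = value.trim();
    if trimmed.starts_with('{') {
        return Some(trimmed.to_string()); // already raw JSON
    }
    let text = String::from_utf8(b64_decode(trimmed)?).ok()?;
    if text.trim_start().starts_with('{') {
        Some(text)
    } else {
        None
    }
}
```

The returned string would then be parsed as JSON and passed to the replay check.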

Verify via Facilitator

The decoded payment is sent to the Facilitator service for on-chain verification. The x402 crate handles this through the chain-agnostic PaymentVerifier trait.
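The sketch below shows what a chain-agnostic verifier trait could look like. Only the trait name `PaymentVerifier` comes from this document; the method signature, the `Payment` fields, the error type, and the `MockFacilitator` test double are hypothetical, and the real trait is likely asynchronous.

```rust
/// Decoded payment payload. Field names are hypothetical.
struct Payment {
    signature: String,
    amount: u64, // micro-USDC
}

#[derive(Debug, PartialEq)]
enum VerifyError {
    AmountMismatch { expected: u64, got: u64 },
    UnknownTransaction,
}

/// Assumed shape of the chain-agnostic trait (synchronous for brevity).
trait PaymentVerifier {
    fn verify(&self, payment: &Payment, expected_amount: u64) -> Result<(), VerifyError>;
}

/// Test double standing in for the Facilitator service.
struct MockFacilitator {
    finalized: Vec<String>,
}

impl PaymentVerifier for MockFacilitator {
    fn verify(&self, payment: &Payment, expected_amount: u64) -> Result<(), VerifyError> {
        if !self.finalized.iter().any(|s| s == &payment.signature) {
            return Err(VerifyError::UnknownTransaction);
        }
        if payment.amount != expected_amount {
            return Err(VerifyError::AmountMismatch {
                expected: expected_amount,
                got: payment.amount,
            });
        }
        Ok(())
    }
}

/// The gateway only depends on the trait, not on a specific chain.
fn authorize(
    verifier: &dyn PaymentVerifier,
    payment: &Payment,
    expected_amount: u64,
) -> Result<(), VerifyError> {
    verifier.verify(payment, expected_amount)
}
```

Because `authorize` takes `&dyn PaymentVerifier`, a verifier for another chain could be swapped in without touching the request path.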

Proxy to the LLM provider

The gateway translates the OpenAI-compatible request into the provider's native format, forwards it, and streams or buffers the response.

Post-response housekeeping

The response is cached in Redis under a content-addressed key. Usage is logged to PostgreSQL, and an escrow claim is fired if the payment used the escrow scheme. These writes run in tasks started with tokio::spawn, so they do not block the response.

Return response to caller

The gateway returns JSON or an SSE stream depending on whether the request set "stream": true.

Note

The post-response housekeeping tasks (cache write, DB log, escrow claim) are fire-and-forget. A slow database or a Redis timeout cannot delay the response the client receives.
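The fire-and-forget pattern can be sketched without an async runtime. The example below uses `std::thread::spawn` as a stand-in for `tokio::spawn` so that it runs with the standard library alone; the function name and the completion channel are assumptions added only so the background work can be observed.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;
use std::time::Duration;

/// Return the response immediately and run housekeeping in the background.
/// The receiver exists only so a caller or test can observe completion.
fn respond_then_housekeep(body: &str) -> (String, Receiver<&'static str>) {
    let (done_tx, done_rx) = channel();
    for task in ["cache_write", "usage_log", "escrow_claim"] {
        let tx = done_tx.clone();
        // Stand-in for tokio::spawn: the caller never waits on this work.
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(20)); // simulate a slow store
            let _ = tx.send(task); // ignore errors if nobody is listening
        });
    }
    (body.to_string(), done_rx)
}
```

The function returns before any of the simulated writes finish, which is the property the note above describes.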

A2A Request Flow

Endpoint: POST /a2a — JSON-RPC method message/send

The Agent-to-Agent interface follows the A2A protocol and is designed for autonomous agents that discover the gateway's capabilities and pay for each task programmatically.

Agent discovery

The calling agent fetches GET /.well-known/agent.json to retrieve the AgentCard — a machine-readable description of supported methods, accepted payment schemes, and cost tiers.

Send message/send request

The agent sends a message/send JSON-RPC request to POST /a2a. The gateway scores the request using the smart router, computes the cost, and returns a Task in input-required state containing x402 metadata (amount, payment address, accepted schemes).

Agent signs and pays

The calling agent signs a Solana USDC-SPL transaction for the quoted amount and resubmits the request with the taskId and payment payload attached.

Gateway verifies and proxies

Payment verification follows the same path as the chat completions flow (replay check → Facilitator verify). On success the gateway proxies to the LLM provider.

Return completed Task

The gateway returns a Task in completed state. The response content is attached as artifacts alongside a payment receipt.
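The two-round exchange above can be modeled as a small state type. In this sketch the `Task` enum, its field names, the placeholder address, and the receipt format are all hypothetical; payment verification and the provider call are elided.

```rust
/// Simplified view of an A2A Task as returned by the gateway.
#[derive(Debug, PartialEq)]
enum Task {
    /// First round: the gateway quotes a price and waits for payment.
    InputRequired {
        task_id: String,
        amount: u64, // micro-USDC
        pay_to: String,
        accepted_schemes: Vec<&'static str>,
    },
    /// Second round: payment verified, response attached.
    Completed {
        task_id: String,
        artifacts: Vec<String>,
        receipt: String,
    },
}

/// Handle one message/send call. `payment` is None on the first round.
fn handle_message_send(task_id: &str, payment: Option<&str>, quoted_amount: u64) -> Task {
    match payment {
        None => Task::InputRequired {
            task_id: task_id.to_string(),
            amount: quoted_amount,
            pay_to: "<gateway address>".to_string(),
            accepted_schemes: vec!["exact", "escrow"],
        },
        // Replay check, Facilitator verification, and the LLM call
        // would happen here before the task is marked completed.
        Some(signature) => Task::Completed {
            task_id: task_id.to_string(),
            artifacts: vec!["<LLM response>".to_string()],
            receipt: format!("paid:{}", signature),
        },
    }
}
```

The calling agent keeps the `task_id` from the first response and sends it back with the payment payload on the second call.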

Tip

The A2A interface uses the same payment verification stack as the chat completions endpoint. An agent that already knows how to pay for completions needs no additional payment logic to use A2A.

Smart Router

The smart router runs inside the router crate and is invoked whenever a routing profile (auto, eco, premium, free) is used instead of a direct model ID.

How it works:

The scorer evaluates each request across 15 weighted dimensions, including:

Code presence

Detects code blocks, syntax markers, and programming language keywords.

Reasoning markers

Identifies step-by-step, chain-of-thought, and logical deduction signals.

Technical terms

Weighs domain-specific vocabulary density across the prompt.

Message length and complexity

Long, multi-turn, or highly structured prompts score toward higher tiers.

The scorer produces a tier classification:

| Tier | Description | Example tasks |
| --- | --- | --- |
| Simple | Short, factual, or single-turn | Translations, lookups, summarization |
| Medium | Moderate context, some reasoning | Code review, Q&A, light analysis |
| Complex | Multi-step, long context | Refactoring, detailed explanations |
| Reasoning | Deep logical chains, math, planning | Proofs, architecture design, agents |

Each routing profile then maps tiers to specific models:

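| Profile | Simple | Medium | Complex | Reasoning |
| --- | --- | --- | --- | --- |
| eco | Gemini Flash | Gemini Flash | DeepSeek-V3 | DeepSeek-R1 |
| auto | Gemini Flash | GPT-4o mini | Claude Sonnet | o3 |
| premium | GPT-4o | GPT-4o | Claude Opus | o3 |
| free | Free-tier only | Free-tier only | Free-tier only | Free-tier only |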
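The profile table translates directly into a lookup over (profile, tier) pairs. The sketch below returns the display names from the table rather than real model IDs, and the enum and function names are assumptions for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Simple,
    Medium,
    Complex,
    Reasoning,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Profile {
    Eco,
    Auto,
    Premium,
    Free,
}

/// Map a routing profile and a tier to a model display name.
fn pick_model(profile: Profile, tier: Tier) -> &'static str {
    match (profile, tier) {
        (Profile::Eco, Tier::Simple | Tier::Medium) => "Gemini Flash",
        (Profile::Eco, Tier::Complex) => "DeepSeek-V3",
        (Profile::Eco, Tier::Reasoning) => "DeepSeek-R1",
        (Profile::Auto, Tier::Simple) => "Gemini Flash",
        (Profile::Auto, Tier::Medium) => "GPT-4o mini",
        (Profile::Auto, Tier::Complex) => "Claude Sonnet",
        (Profile::Premium, Tier::Simple | Tier::Medium) => "GPT-4o",
        (Profile::Premium, Tier::Complex) => "Claude Opus",
        (Profile::Auto | Profile::Premium, Tier::Reasoning) => "o3",
        // Placeholder: the docs only say "free-tier only".
        (Profile::Free, _) => "free-tier model",
    }
}
```

Because the match is exhaustive, adding a new tier or profile forces every mapping to be revisited at compile time.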

Note

Scoring is purely rule-based and runs in under 1 microsecond with zero external calls. There is no ML model involved: the scorer is a deterministic weighted sum over the 15 dimensions.
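A deterministic weighted sum of this kind can be sketched as follows. Only four of the 15 dimensions are modeled, and the weights, the [0, 1] signal range, and the tier thresholds are invented for the example; the router crate's actual values are not given here.

```rust
#[derive(Debug, PartialEq)]
enum Tier {
    Simple,
    Medium,
    Complex,
    Reasoning,
}

/// Per-dimension signals, each assumed to be normalized to [0, 1].
struct Signals {
    code_presence: f64,
    reasoning_markers: f64,
    technical_terms: f64,
    length_complexity: f64,
}

/// Hypothetical weights, in the same order as the fields above.
const WEIGHTS: [f64; 4] = [0.30, 0.35, 0.15, 0.20];

/// Weighted sum over the dimensions: no I/O, no model, no randomness.
fn score(s: &Signals) -> f64 {
    let values = [
        s.code_presence,
        s.reasoning_markers,
        s.technical_terms,
        s.length_complexity,
    ];
    values.iter().zip(WEIGHTS.iter()).map(|(v, w)| v * w).sum()
}

/// Hypothetical thresholds that split the score range into four tiers.
fn classify(score: f64) -> Tier {
    if score < 0.25 {
        Tier::Simple
    } else if score < 0.50 {
        Tier::Medium
    } else if score < 0.75 {
        Tier::Complex
    } else {
        Tier::Reasoning
    }
}
```

Since the computation is a handful of multiplications and comparisons, the same input always yields the same tier.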

Payment Verification Detail

In the exact scheme, the client signs a USDC-SPL transfer for the exact quoted amount to the gateway's fee payer address. The Facilitator confirms the transaction is finalized on-chain before the gateway proceeds.

In the escrow scheme, intended for higher-value or batch requests, the client deposits USDC into the trustless on-chain escrow program (programs/escrow/). The gateway holds a claim receipt and fires a claim transaction after delivering the response. Unclaimed deposits can be refunded by the depositor after a timeout.
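The deposit, claim, and refund rules can be illustrated with a plain Rust model. This is not the on-chain program in programs/escrow/: the types, method names, and error strings are hypothetical, and whether a claim is still allowed after the timeout is left open here.

```rust
#[derive(Debug, PartialEq)]
enum EscrowState {
    Deposited,
    Claimed,
    Refunded,
}

/// Off-chain model of a single escrow deposit (times in Unix seconds).
struct Escrow {
    amount: u64,
    deposited_at: u64,
    timeout_secs: u64,
    state: EscrowState,
}

impl Escrow {
    fn deposit(amount: u64, now: u64, timeout_secs: u64) -> Self {
        Escrow { amount, deposited_at: now, timeout_secs, state: EscrowState::Deposited }
    }

    /// Gateway side: claim the funds after delivering the response.
    fn claim(&mut self) -> Result<u64, &'static str> {
        if self.state != EscrowState::Deposited {
            return Err("not claimable");
        }
        self.state = EscrowState::Claimed;
        Ok(self.amount)
    }

    /// Depositor side: recover an unclaimed deposit once the timeout passes.
    fn refund(&mut self, now: u64) -> Result<u64, &'static str> {
        if self.state != EscrowState::Deposited {
            return Err("not refundable");
        }
        if now < self.deposited_at + self.timeout_secs {
            return Err("timeout not reached");
        }
        self.state = EscrowState::Refunded;
        Ok(self.amount)
    }
}
```

Each deposit can leave the `Deposited` state exactly once, so the funds go either to the gateway or back to the depositor, never both.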

Every verified PAYMENT-SIGNATURE is written to Redis with a TTL that matches the payment's expiry. A second request carrying the same signature is rejected with 402 before reaching the Facilitator — protecting against double-spend without an extra on-chain call.
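The replay check behaves like a set of signatures whose entries expire. The sketch below uses an in-memory `HashMap` as a stand-in for Redis; the `ReplayGuard` type and its method are hypothetical. With Redis itself, a comparable effect is commonly achieved with `SET key value NX EX ttl`.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the Redis replay store.
/// Maps a payment signature to its expiry time (Unix seconds).
struct ReplayGuard {
    seen: HashMap<String, u64>,
}

impl ReplayGuard {
    fn new() -> Self {
        ReplayGuard { seen: HashMap::new() }
    }

    /// Returns true and records the signature if it is fresh;
    /// returns false if the same signature is still on record.
    fn check_and_record(&mut self, signature: &str, now: u64, expires_at: u64) -> bool {
        // Drop expired entries, mimicking Redis TTL eviction.
        self.seen.retain(|_, expiry| *expiry > now);
        if self.seen.contains_key(signature) {
            return false; // replay: the gateway would answer 402
        }
        self.seen.insert(signature.to_string(), expires_at);
        true
    }
}
```

Tying the entry's lifetime to the payment's expiry keeps the store small: once a payment can no longer be accepted, its signature no longer needs to be remembered.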

Warning

Gateway-side replay protection requires Redis. Without Redis, the gateway skips this check and relies solely on Facilitator-side nonce validation, which is slower and less reliable under high concurrency.