Monitoring

Prometheus metrics, Grafana dashboards, and alerting

Solvela exposes Prometheus metrics at GET /metrics. The endpoint is admin-gated, so scrapers must authenticate with the admin token. This page covers the full metrics reference, suggested Grafana dashboards, and example alerting rules.

Prometheus Setup

Scrape Configuration

Add to your prometheus.yml:

scrape_configs:
  - job_name: 'solvela'
    scrape_interval: 15s
    scheme: https
    bearer_token: '<SOLVELA_ADMIN_TOKEN>'
    static_configs:
      - targets: ['solvela-gateway.fly.dev']
    metrics_path: '/metrics'
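
Newer Prometheus releases document an authorization block as the replacement for the older bearer_token field, and it can read the token from a file so the secret stays out of prometheus.yml. The variant below is a sketch of the same production job under that assumption; the credentials_file path is hypothetical, and you should check which form your Prometheus version accepts.

```yaml
scrape_configs:
  - job_name: 'solvela'
    scrape_interval: 15s
    scheme: https
    metrics_path: '/metrics'
    authorization:
      type: Bearer
      # Hypothetical path: a file that contains only the admin token.
      credentials_file: /etc/prometheus/secrets/solvela_admin_token
    static_configs:
      - targets: ['solvela-gateway.fly.dev']
```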

For local development:

scrape_configs:
  - job_name: 'solvela-local'
    scrape_interval: 5s
    bearer_token: '<SOLVELA_ADMIN_TOKEN>'
    static_configs:
      - targets: ['localhost:8402']

Metrics Reference

Request Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| solvela_requests_total | Counter | method, path, status | Total HTTP requests. Excludes /metrics to avoid scrape feedback loops. |
| solvela_request_duration_seconds | Histogram | method, path | End-to-end request processing time including provider latency. |
| solvela_active_requests | Gauge | -- | Number of currently in-flight requests. Uses a drop guard for safety. |

Payment Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| solvela_payments_total | Counter | status | Payment outcomes. Status values: verified (valid payment), dev_bypass (payment bypass enabled in dev), none (402 returned, no payment header), failed (verification failed). |
| solvela_payment_amount_usdc | Histogram | -- | Distribution of payment amounts in USDC. |
| solvela_replay_rejections_total | Counter | -- | Number of rejected replay attacks. |
| solvela_paid_stub_rejections_total | Counter | -- | Paid requests that reached the fallback-stub path because every provider failed. The gateway returns an error instead of a stub so a paying caller never receives a fabricated response. |
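
Because solvela_paid_stub_rejections_total only moves when a paying caller received an error, it is a natural candidate for an alert. The rule below is a sketch, not part of the shipped rule set: the group name, alert name, 10-minute window, and severity are all assumptions to adjust.

```yaml
groups:
  - name: solvela-payments          # hypothetical group name
    rules:
      - alert: PaidStubRejection    # hypothetical alert name
        # Fires when the counter increased at all within the window.
        expr: increase(solvela_paid_stub_rejections_total[10m]) > 0
        labels:
          severity: critical        # assumption: treat any occurrence as urgent
        annotations:
          summary: "Paid request failed because every provider was unavailable"
```

Note that increase() needs at least two samples in the window, so an increment that coincides with the series first appearing can be missed.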

Provider Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| solvela_provider_request_duration_seconds | Histogram | provider | Upstream provider response time. Provider values: openai, anthropic, google, xai, deepseek. |
| solvela_provider_errors_total | Counter | provider, error_type | Provider errors. Error types: timeout, auth, rate_limit, server_error, unknown. |

Cache Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| solvela_cache_total | Counter | result | Cache outcomes. Result values: hit (served from cache), miss (not cached), skip (caching disabled or streaming). |

Escrow Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| solvela_escrow_claims_total | Counter | result | Claim settlement outcomes. Result values: success, failure. |
| solvela_escrow_queue_depth | Gauge | -- | Number of pending claims waiting to be processed. |

Infrastructure Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| solvela_fee_payer_balance_sol | Gauge | pubkey | SOL balance of each fee payer wallet. Monitors tx fee funding. |
| solvela_service_health | Gauge | service_id | External service health. 1.0 = healthy, 0.0 = unhealthy. |

Grafana Dashboard Suggestions

Overview Panel

  • Request rate by status: sum by (status) (rate(solvela_requests_total[5m]))
  • Error rate: sum(rate(solvela_requests_total{status=~"5.."}[5m])) / sum(rate(solvela_requests_total[5m]))
  • Active requests: solvela_active_requests
  • P95 latency: histogram_quantile(0.95, sum by (le) (rate(solvela_request_duration_seconds_bucket[5m])))

Payment Panel

  • Payment success rate: sum(rate(solvela_payments_total{status="verified"}[5m])) / sum(rate(solvela_payments_total[5m]))
  • Revenue (USDC/min): rate(solvela_payment_amount_usdc_sum[5m]) * 60
  • 402 rate: rate(solvela_payments_total{status="none"}[5m])
  • Replay rejections: rate(solvela_replay_rejections_total[5m])

Provider Panel

  • P95 provider latency by provider: histogram_quantile(0.95, sum by (le, provider) (rate(solvela_provider_request_duration_seconds_bucket[5m])))
  • Provider error rate: sum by (provider, error_type) (rate(solvela_provider_errors_total[5m]))
  • Cache hit rate: sum(rate(solvela_cache_total{result="hit"}[5m])) / sum(rate(solvela_cache_total[5m]))

Escrow Panel

  • Claim queue depth: solvela_escrow_queue_depth
  • Claim success rate: sum(rate(solvela_escrow_claims_total{result="success"}[5m])) / sum(rate(solvela_escrow_claims_total[5m]))
  • Fee payer balances: solvela_fee_payer_balance_sol (one series per pubkey)

Service Health Panel

  • Service health: solvela_service_health (one series per service_id; 1 = up, 0 = down)
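
Ratio and quantile panels like the ones in the sections above are re-evaluated on every dashboard refresh. One option is to precompute them with Prometheus recording rules so dashboards and alerts read a single stored series. The sketch below assumes the default job label attached at scrape time; the group name and the recorded series names are hypothetical and only follow the common level:metric:operations naming convention.

```yaml
groups:
  - name: solvela-recording   # hypothetical group name
    rules:
      # Share of requests answered with a 5xx status, per scrape job.
      - record: job:solvela_requests:error_ratio_rate5m
        expr: |
          sum by (job) (rate(solvela_requests_total{status=~"5.."}[5m]))
          / sum by (job) (rate(solvela_requests_total[5m]))

      # P95 end-to-end latency, aggregated across method and path.
      - record: job:solvela_request_duration_seconds:p95_rate5m
        expr: |
          histogram_quantile(0.95,
            sum by (job, le) (rate(solvela_request_duration_seconds_bucket[5m])))

      # Share of cache lookups that were hits.
      - record: job:solvela_cache:hit_ratio_rate5m
        expr: |
          sum by (job) (rate(solvela_cache_total{result="hit"}[5m]))
          / sum by (job) (rate(solvela_cache_total[5m]))
```

A Grafana panel can then query the recorded name directly, for example job:solvela_requests:error_ratio_rate5m.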

Alerting Rules

Example Prometheus alerting rules:

groups:
  - name: solvela
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(solvela_requests_total{status=~"5.."}[5m]))
          / sum(rate(solvela_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% for 5 minutes"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            rate(solvela_request_duration_seconds_bucket[5m])
          ) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency above 10 seconds"

      - alert: EscrowQueueBacklog
        expr: solvela_escrow_queue_depth > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Escrow claim queue depth above 100"

      - alert: FeePayerLowBalance
        expr: solvela_fee_payer_balance_sol < 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Fee payer SOL balance below 0.1"

      - alert: ProviderDown
        expr: |
          rate(solvela_provider_errors_total{error_type="server_error"}[5m]) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Provider returning server errors"

      - alert: ReplayAttackSpike
        expr: rate(solvela_replay_rejections_total[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Elevated replay attack attempts"

      - alert: ServiceUnhealthy
        expr: solvela_service_health == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "External service health check failing"
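
Alert rules can be checked before deployment with promtool's rule unit tests. The file below is a sketch that exercises the ServiceUnhealthy rule; it assumes the rules above are saved as solvela-alerts.yml (a hypothetical filename) and uses a made-up service_id value. Run it with promtool test rules followed by the test file name.

```yaml
rule_files:
  - solvela-alerts.yml   # hypothetical filename for the rules above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # '0x10' expands to the value 0 repeated for samples at 0m through 10m.
      - series: 'solvela_service_health{service_id="example"}'   # hypothetical service_id
        values: '0x10'
    alert_rule_test:
      # At 3m the condition holds but the 5m "for" clause has not elapsed,
      # so no alert is expected to be firing.
      - eval_time: 3m
        alertname: ServiceUnhealthy
      # At 6m the alert should be firing with the rule's labels and annotations.
      - eval_time: 6m
        alertname: ServiceUnhealthy
        exp_alerts:
          - exp_labels:
              severity: warning
              service_id: example
            exp_annotations:
              summary: "External service health check failing"
```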