BGP Observability Plan
Goal
Build a global routing observability capability on top of:
- RIS Live
- BGPStream
The target is to support:
- real-time routing event ingestion
- historical replay and baseline analysis
- anomaly detection
- Earth big-screen visualization
Important Scope Note
These data sources expose the BGP control plane, not user traffic itself.
That means the system can infer:
- route propagation direction
- prefix reachability changes
- AS path changes
- visibility changes across collectors
But it cannot directly measure:
- exact application traffic volume
- exact user packet path
- real bandwidth consumption between countries or operators
Product wording should therefore use phrases like:
- global routing propagation
- route visibility
- control-plane anomalies
- suspected path diversion
Avoid wording that claims direct traffic measurement.
Data Source Roles
RIS Live
Use RIS Live as the real-time feed.
Recommended usage:
- subscribe to update streams over WebSocket
- ingest announcements and withdrawals continuously
- trigger low-latency alerts
Best suited for:
- hijack suspicion
- withdrawal bursts
- real-time path changes
- live Earth event overlay
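The subscription step can be sketched as follows. This is a minimal illustration, assuming the publicly documented RIS Live WebSocket endpoint and its `ris_subscribe` / `ris_message` JSON envelopes. The helper names `build_subscribe` and `split_update` and the `client` string are hypothetical, and no network connection is opened here; the frames would be sent and received through whichever WebSocket client the project chooses.

```python
import json

# Public RIS Live endpoint; the client identifier is a placeholder assumption.
RIS_LIVE_URL = "wss://ris-live.ripe.net/v1/ws/?client=bgp-observability-sketch"


def build_subscribe(host=None, prefix=None, more_specific=True):
    """Build the JSON text of a `ris_subscribe` request for UPDATE messages."""
    data = {"type": "UPDATE", "socketOptions": {"includeRaw": False}}
    if host:
        data["host"] = host  # e.g. "rrc00"; omit to receive all collectors
    if prefix:
        data["prefix"] = prefix
        data["moreSpecific"] = more_specific
    return json.dumps({"type": "ris_subscribe", "data": data})


def split_update(frame):
    """Return (announced_prefixes, withdrawn_prefixes) from one received frame."""
    msg = json.loads(frame)
    if msg.get("type") != "ris_message":
        return [], []  # pong, error, or other control frame
    data = msg["data"]
    announced = [
        prefix
        for ann in data.get("announcements", [])
        for prefix in ann.get("prefixes", [])
    ]
    return announced, list(data.get("withdrawals", []))
```

Keeping message construction and parsing free of I/O makes both easy to unit test before the long-running connection logic exists.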
BGPStream
Use BGPStream as the historical and replay layer.
Recommended usage:
- backfill time windows
- build normal baselines
- compare current events against history
- support investigations and playback
Best suited for:
- historical anomaly confirmation
- baseline path frequency
- visibility baselines
- postmortem analysis
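A backfill over a historical window might look like the sketch below. It assumes the `pybgpstream` package (v2 interface) is installed and that its elements expose `type`, `time`, `collector`, `peer_asn`, `peer_address`, and a `fields` dictionary. The function names `elem_to_event` and `backfill` are hypothetical, and the event-type labels are illustrative.

```python
# Element type codes used by BGPStream, mapped to illustrative labels.
ELEM_TYPES = {"A": "announcement", "W": "withdrawal", "R": "rib", "S": "peer_state"}


def elem_to_event(elem):
    """Map one BGPStream element onto the internal event fields."""
    raw_path = elem.fields.get("as-path", "")
    # Hops are space separated; AS sets such as "{64500,64501}" stay as strings.
    path = [int(hop) if hop.isdigit() else hop for hop in raw_path.split()]
    return {
        "project": elem.project,
        "collector": elem.collector,
        "peer_asn": elem.peer_asn,
        "peer_ip": elem.peer_address,
        "event_type": ELEM_TYPES.get(elem.type, "unknown"),
        "prefix": elem.fields.get("prefix"),
        "as_path": path,
        "origin_asn": path[-1] if path else None,
        "timestamp": elem.time,
    }


def backfill(from_time, until_time, collectors):
    """Yield normalized events for one historical window (needs pybgpstream)."""
    import pybgpstream  # imported lazily so the mapper can be tested without it

    stream = pybgpstream.BGPStream(
        from_time=from_time,
        until_time=until_time,
        collectors=collectors,
        record_type="updates",
    )
    for elem in stream:
        yield elem_to_event(elem)
```

A call such as `backfill("2026-03-26 08:00:00", "2026-03-26 08:05:00", ["rrc00"])` would then produce one event per element in that five-minute window.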
Recommended Architecture
flowchart LR
    A["RIS Live WebSocket"] --> B["Realtime Collector"]
    C["BGPStream Historical Access"] --> D["Backfill Collector"]
    B --> E["Normalization Layer"]
    D --> E
    E --> F["data_snapshots"]
    E --> G["collected_data"]
    E --> H["bgp_anomalies"]
    H --> I["Alerts API"]
    G --> J["Visualization API"]
    H --> J
    J --> K["Earth Big Screen"]
Storage Design
The current project already has:
- collected_data
- data_snapshots
So the lowest-risk path is:
- keep raw and normalized BGP events in collected_data
- use data_snapshots to group each ingest window
- add a dedicated anomaly table for higher-value derived events
Proposed Data Types
collected_data
Use these source values:
- ris_live_bgp
- bgpstream_bgp
Use these data_type values:
- bgp_update
- bgp_rib
- bgp_visibility
- bgp_path_change
Recommended stable fields:
- source
- source_id
- entity_key
- data_type
- name
- reference_date
- metadata
Recommended entity_key strategy:
- event entity: collector|peer|prefix|event_time
- prefix state entity: collector|peer|prefix
- origin state entity: prefix|origin_asn
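The three key shapes above can be sketched as small helpers. The function names are hypothetical, and the sketch assumes `peer` is the peer IP address and `event_time` is an already formatted timestamp string.

```python
def event_key(collector, peer, prefix, event_time):
    """Key for a single event: collector|peer|prefix|event_time."""
    return "|".join([collector, peer, prefix, event_time])


def prefix_state_key(collector, peer, prefix):
    """Key for the latest state of a prefix as seen by one peer."""
    return "|".join([collector, peer, prefix])


def origin_state_key(prefix, origin_asn):
    """Key for a (prefix, origin ASN) pairing across all collectors."""
    return f"{prefix}|{origin_asn}"
```

The `|` separator is safe here because it appears in neither IPv4 nor IPv6 address notation.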
metadata schema for raw events
Store the normalized event payload in metadata:
{
  "project": "ris-live",
  "collector": "rrc00",
  "peer_asn": 3333,
  "peer_ip": "2001:db8::1",
  "event_type": "announcement",
  "prefix": "203.0.113.0/24",
  "origin_asn": 64496,
  "as_path": [3333, 64500, 64496],
  "communities": ["3333:100", "64500:1"],
  "next_hop": "192.0.2.1",
  "med": 0,
  "local_pref": null,
  "timestamp": "2026-03-26T08:00:00Z",
  "raw_message": {}
}
New anomaly table
Add a new table, recommended name: bgp_anomalies
Suggested columns:
- id
- snapshot_id
- task_id
- source
- anomaly_type
- severity
- status
- entity_key
- prefix
- origin_asn
- new_origin_asn
- peer_scope
- started_at
- ended_at
- confidence
- summary
- evidence
- created_at
This table should represent derived intelligence, not raw updates.
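One possible shape for the table is sketched below using SQLite from the Python standard library. The column names come from the list above; the column types, the `'active'` default status, and storing `evidence` as JSON text are assumptions, and the project's real ORM and database may call for different choices.

```python
import json
import sqlite3

# Column types and defaults are illustrative assumptions.
DDL = """
CREATE TABLE IF NOT EXISTS bgp_anomalies (
    id             INTEGER PRIMARY KEY,
    snapshot_id    INTEGER,
    task_id        INTEGER,
    source         TEXT NOT NULL,
    anomaly_type   TEXT NOT NULL,
    severity       TEXT NOT NULL,
    status         TEXT NOT NULL DEFAULT 'active',
    entity_key     TEXT NOT NULL,
    prefix         TEXT,
    origin_asn     INTEGER,
    new_origin_asn INTEGER,
    peer_scope     TEXT,
    started_at     TEXT,
    ended_at       TEXT,
    confidence     REAL,
    summary        TEXT,
    evidence       TEXT,
    created_at     TEXT DEFAULT CURRENT_TIMESTAMP
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Insert one derived anomaly; evidence is serialized as JSON text.
conn.execute(
    "INSERT INTO bgp_anomalies (source, anomaly_type, severity, entity_key, "
    "prefix, origin_asn, new_origin_asn, confidence, evidence) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (
        "ris_live_bgp", "origin_change", "high", "203.0.113.0/24|64511",
        "203.0.113.0/24", 64496, 64511, 0.8,
        json.dumps({"collectors": ["rrc00", "rrc01"]}),
    ),
)
row = conn.execute("SELECT anomaly_type, status FROM bgp_anomalies").fetchone()
```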
Collector Design
1. RISLiveCollector
Responsibility:
- maintain WebSocket connection
- subscribe to relevant message types
- normalize messages
- write event batches into snapshots
- optionally emit derived anomalies in near real time
Suggested runtime mode:
- long-running background task
Suggested snapshot strategy:
- one snapshot per rolling time window
- for example every 1 minute or every 5 minutes
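Assigning events to rolling windows can be as simple as flooring each timestamp to a window boundary. The sketch below assumes timezone-aware timestamps and UTC-aligned windows; the name `window_start` is hypothetical.

```python
from datetime import datetime, timezone


def window_start(ts, minutes=5):
    """Floor a timezone-aware timestamp to the start of its snapshot window."""
    seconds = minutes * 60
    epoch = int(ts.astimezone(timezone.utc).timestamp())
    return datetime.fromtimestamp(epoch - epoch % seconds, tz=timezone.utc)
```

Events that share the same `window_start` value would then be written into the same snapshot.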
2. BGPStreamBackfillCollector
Responsibility:
- fetch historical data windows
- normalize to the same schema as real-time data
- build baselines
- re-run anomaly rules on past windows if needed
Suggested runtime mode:
- scheduled task
- or ad hoc task for investigations
Suggested snapshot strategy:
- one snapshot per historical query window
Normalization Rules
Normalize both sources into the same internal event model.
Required normalized fields:
- collector
- peer_asn
- peer_ip
- event_type
- prefix
- origin_asn
- as_path
- timestamp
Derived normalized fields:
- as_path_length
- country_guess
- prefix_length
- is_more_specific
- visibility_weight
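A minimal normalization sketch for the real-time source follows. It assumes the `data` object of a RIS Live `ris_message` (fields `host`, `peer`, `peer_asn`, `path`, `announcements`, `withdrawals`, `timestamp`) and covers only two of the derived fields, `as_path_length` and `prefix_length`. The function name is hypothetical, and AS sets at the end of a path are deliberately left unresolved.

```python
import ipaddress
from datetime import datetime, timezone


def normalize_ris_message(data):
    """Expand one RIS Live UPDATE into one normalized event per prefix."""
    path = data.get("path", [])
    origin = path[-1] if path else None
    common = {
        "collector": data.get("host"),
        "peer_asn": int(data["peer_asn"]),  # RIS Live sends the ASN as a string
        "peer_ip": data["peer"],
        "timestamp": datetime.fromtimestamp(
            data["timestamp"], tz=timezone.utc
        ).isoformat(),
    }

    def prefix_len(prefix):
        return ipaddress.ip_network(prefix, strict=False).prefixlen

    events = []
    for ann in data.get("announcements", []):
        for prefix in ann.get("prefixes", []):
            events.append({
                **common,
                "event_type": "announcement",
                "prefix": prefix,
                "prefix_length": prefix_len(prefix),
                "as_path": path,
                "as_path_length": len(path),
                # An AS set (nested list) as the last hop has no single origin.
                "origin_asn": origin if isinstance(origin, int) else None,
            })
    for prefix in data.get("withdrawals", []):
        # Withdrawals carry no path attributes.
        events.append({
            **common,
            "event_type": "withdrawal",
            "prefix": prefix,
            "prefix_length": prefix_len(prefix),
            "as_path": [],
            "as_path_length": 0,
            "origin_asn": None,
        })
    return events
```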
Anomaly Detection Rules
Start with these five rules.
1. Origin ASN Change
Trigger when:
- the same prefix is announced by a new origin ASN not seen in the baseline window
Use for:
- hijack suspicion
- origin drift detection
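The origin-change trigger can be sketched as a lookup against a per-prefix set of baseline origins. The function name, the shape of `baseline_origins`, and the returned dictionary are assumptions for illustration.

```python
def detect_origin_change(event, baseline_origins):
    """Return an anomaly candidate when a known prefix gets an unseen origin.

    baseline_origins maps prefix -> set of origin ASNs seen in the baseline.
    """
    if event.get("event_type") != "announcement":
        return None
    origin = event.get("origin_asn")
    known = baseline_origins.get(event["prefix"], set())
    if origin is None or not known or origin in known:
        return None  # nothing to compare against, or origin already seen
    return {
        "anomaly_type": "origin_change",
        "prefix": event["prefix"],
        "new_origin_asn": origin,
        "evidence": {
            "baseline_origins": sorted(known),
            "as_path": event.get("as_path", []),
        },
    }
```

Prefixes with no baseline at all are skipped rather than flagged, since a first sighting is not evidence of a change.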
2. More-Specific Burst
Trigger when:
- a more-specific prefix appears suddenly
- especially from an unexpected origin ASN
Use for:
- subprefix hijack suspicion
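Detecting a more-specific announcement comes down to checking whether the new prefix is strictly contained in a prefix already present in the baseline. The sketch below uses the standard `ipaddress` module and a linear scan; the function name is hypothetical, and a production version would more likely use a prefix trie.

```python
import ipaddress


def covering_prefixes(prefix, baseline_prefixes):
    """Return baseline prefixes that strictly contain the announced prefix."""
    new = ipaddress.ip_network(prefix, strict=False)
    covering = []
    for known in baseline_prefixes:
        net = ipaddress.ip_network(known, strict=False)
        # subnet_of() raises TypeError across IP versions, so check first.
        if net.version == new.version and net != new and new.subnet_of(net):
            covering.append(known)
    return covering
```

A non-empty result means the announcement is a more-specific of known address space; comparing its origin ASN with the covering prefix's baseline origins then separates expected deaggregation from suspected subprefix hijacks.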
3. Mass Withdrawal
Trigger when:
- the same prefix or ASN sees many withdrawals across collectors within a short window
Use for:
- outage suspicion
- regional incident detection
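A per-prefix version of this rule can be sketched as a window scan over withdrawal events. The threshold values, the use of epoch seconds for `timestamp`, and the function name are assumptions; the same approach would apply when grouping by origin ASN instead of prefix.

```python
from collections import defaultdict


def mass_withdrawals(events, window_seconds=300, min_collectors=3):
    """Return {prefix: collectors} for prefixes withdrawn at enough distinct
    collectors inside any single window of `window_seconds`."""
    by_prefix = defaultdict(list)
    for event in events:
        if event["event_type"] == "withdrawal":
            by_prefix[event["prefix"]].append(
                (event["timestamp"], event["collector"])
            )

    flagged = {}
    for prefix, seen in by_prefix.items():
        seen.sort()
        for i, (start, _) in enumerate(seen):
            # Collectors that withdrew the prefix within the window from `start`.
            collectors = {c for t, c in seen[i:] if t - start <= window_seconds}
            if len(collectors) >= min_collectors:
                flagged[prefix] = sorted(collectors)
                break
    return flagged
```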
4. Path Deviation
Trigger when:
- AS path length jumps sharply
- or a rarely seen transit ASN appears
- or path frequency drops below baseline norms
Use for:
- route leak suspicion
- unusual path diversion
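The three triggers can be sketched against a per-prefix count of baseline paths. This is one interpretation rather than a fixed definition: the thresholds, the function name, the use of the median baseline length, and reading "path frequency drops below baseline norms" as "this exact path was rare in the baseline" are all assumptions.

```python
from collections import Counter
from statistics import median


def path_deviation_reasons(as_path, baseline_paths, length_jump=3, rare_share=0.05):
    """List which path-deviation triggers fire for one announced AS path.

    baseline_paths maps tuple(as_path) -> times seen in the baseline window.
    """
    total = sum(baseline_paths.values())
    if total == 0:
        return []  # no baseline, nothing to compare against

    reasons = []
    lengths = [len(p) for p, n in baseline_paths.items() for _ in range(n)]
    if len(as_path) - median(lengths) >= length_jump:
        reasons.append("length_jump")

    # Share of baseline observations in which each ASN appeared on the path.
    asn_seen = Counter()
    for path, n in baseline_paths.items():
        for asn in set(path):
            asn_seen[asn] += n
    transit = as_path[1:-1]  # exclude the peer and the origin
    if any(asn_seen[asn] / total < rare_share for asn in transit):
        reasons.append("rare_transit")

    if baseline_paths.get(tuple(as_path), 0) / total < rare_share:
        reasons.append("rare_path")
    return reasons
```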
5. Visibility Drop
Trigger when:
- a prefix is visible from far fewer collectors/peers than its baseline
Use for:
- regional reachability degradation
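A minimal sketch of this rule compares the set of peers currently carrying the prefix with the set seen in the baseline. The ratio threshold, the minimum baseline size, and the function name are assumptions.

```python
def visibility_drop(prefix, current_peers, baseline_peers,
                    max_ratio=0.5, min_baseline=5):
    """Flag a prefix whose baseline peers have mostly stopped seeing it."""
    baseline = set(baseline_peers)
    if len(baseline) < min_baseline:
        return None  # too few vantage points for a meaningful ratio
    still_visible = baseline & set(current_peers)
    ratio = len(still_visible) / len(baseline)
    if ratio > max_ratio:
        return None
    return {
        "anomaly_type": "visibility_drop",
        "prefix": prefix,
        "visibility_ratio": ratio,
        "lost_peers": sorted(baseline - still_visible),
    }
```

Requiring a minimum baseline size avoids alerts on prefixes that were only ever visible from one or two peers.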
Baseline Strategy
Use BGPStream historical data to build:
- common origin ASN per prefix
- common AS path patterns
- collector visibility distribution
- normal withdrawal frequency
Recommended baseline windows:
- short baseline: last 24 hours
- medium baseline: last 7 days
- long baseline: last 30 days
The first implementation can start with only the 7-day baseline.
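As an illustration of the first baseline item, the sketch below derives the common origin ASNs per prefix from normalized announcement events inside a baseline window. The function name, the `min_share` cutoff, and the use of comparable numeric timestamps are assumptions.

```python
from collections import Counter, defaultdict


def build_origin_baseline(events, since, min_share=0.01):
    """Return {prefix: set of origin ASNs} seen often enough since `since`."""
    counts = defaultdict(Counter)
    for event in events:
        if (
            event["event_type"] == "announcement"
            and event["timestamp"] >= since
            and event.get("origin_asn") is not None
        ):
            counts[event["prefix"]][event["origin_asn"]] += 1

    baseline = {}
    for prefix, per_origin in counts.items():
        total = sum(per_origin.values())
        # Drop origins that appear in only a tiny share of announcements.
        baseline[prefix] = {
            asn for asn, n in per_origin.items() if n / total >= min_share
        }
    return baseline
```

For the 7-day baseline, `since` would be the current time minus seven days, expressed in the same unit as the event timestamps.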
API Design
Raw event API
Add endpoints like:
- GET /api/v1/bgp/events
- GET /api/v1/bgp/events/{id}
Suggested filters:
- prefix
- origin_asn
- peer_asn
- collector
- event_type
- time_from
- time_to
- source
Anomaly API
Add endpoints like:
- GET /api/v1/bgp/anomalies
- GET /api/v1/bgp/anomalies/{id}
- GET /api/v1/bgp/anomalies/summary
Suggested filters:
- severity
- anomaly_type
- status
- prefix
- origin_asn
- time_from
- time_to
Visualization API
Add an Earth-oriented endpoint like:
GET /api/v1/visualization/geo/bgp-anomalies
Recommended feature shapes:
- point: collector locations
- arc: inferred propagation or suspicious path edge
- pulse point: active anomaly hotspot
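One way to carry these shapes is a GeoJSON FeatureCollection, with a `kind` property telling the renderer how to draw each feature. The sketch below is an assumption about the response format, not an existing contract; the helper names and property names are hypothetical. GeoJSON positions are ordered longitude first, then latitude.

```python
def point_feature(lon, lat, kind, **props):
    """A collector location ("point") or an anomaly hotspot ("pulse")."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"kind": kind, **props},
    }


def arc_feature(src, dst, **props):
    """An inferred propagation edge between two (lon, lat) positions."""
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": [list(src), list(dst)]},
        "properties": {"kind": "arc", **props},
    }


def feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


payload = feature_collection([
    point_feature(4.9, 52.37, "point", collector="rrc00"),
    arc_feature((139.69, 35.69), (4.9, 52.37), anomaly_type="origin_change"),
    point_feature(139.69, 35.69, "pulse", severity="high"),
])
```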
Earth Big-Screen Design
Recommended layers:
Layer 1: Collector layer
Show known collector locations and current activity intensity.
Layer 2: Route propagation arcs
Use arcs for:
- origin ASN country to collector country
- or collector-to-collector visibility edges
Important note:
This is an inferred propagation view, not real packet flow.
Layer 3: Active anomaly overlay
Show:
- hijack suspicion in red
- mass withdrawal in orange
- visibility drop in yellow
- path deviation in blue
Layer 4: Time playback
Use data_snapshots to replay:
- minute-by-minute route changes
- anomaly expansion
- recovery timeline
Alerting Strategy
Map anomaly severity to the current alert system.
Recommended severity mapping:
- critical
  - likely hijack
  - very large withdrawal burst
- high
  - clear origin change
  - large visibility drop
- medium
  - unusual path change
  - moderate more-specific burst
- low
  - weak or localized anomalies
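The mapping above could be encoded as a small rule function. The numeric thresholds, the anomaly type labels, and the `scale` input (assumed to be the fraction of collectors affected) are illustrative assumptions that would need tuning against real data.

```python
def severity_for(anomaly_type, confidence, scale=0.0):
    """Map an anomaly to critical/high/medium/low (thresholds are assumptions)."""
    hijack_like = anomaly_type in ("origin_change", "more_specific_burst")
    if hijack_like and confidence >= 0.9:
        return "critical"  # likely hijack
    if anomaly_type == "mass_withdrawal" and scale >= 0.8:
        return "critical"  # very large withdrawal burst
    if anomaly_type == "origin_change" and confidence >= 0.6:
        return "high"      # clear origin change
    if anomaly_type == "visibility_drop" and scale >= 0.5:
        return "high"      # large visibility drop
    if anomaly_type in ("path_deviation", "more_specific_burst") and confidence >= 0.4:
        return "medium"    # unusual path change or moderate more-specific burst
    return "low"           # weak or localized anomalies
```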
Delivery Plan
Phase 1
- add RISLiveCollector
- normalize updates into collected_data
- create bgp_anomalies
- implement 3 rules:
  - origin change
  - more-specific burst
  - mass withdrawal
Phase 2
- add BGPStreamBackfillCollector
- build 7-day baseline
- implement:
  - path deviation
  - visibility drop
Phase 3
- add Earth visualization layer
- add time playback
- add anomaly filtering and drilldown
Practical Implementation Notes
- Start with IPv4, then add IPv6 once the event schema is stable.
- Store the original raw payload in metadata.raw_message for traceability.
- Deduplicate events by a stable hash of collector, peer, prefix, type, and timestamp.
- Keep anomaly generation idempotent so replay and backfill do not create duplicate alerts.
- Expect noisy data and partial views; confidence scoring matters.
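The deduplication note can be sketched with a SHA-256 fingerprint over the five identifying fields. The field names follow the normalized event schema; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib

FINGERPRINT_FIELDS = ("collector", "peer_ip", "prefix", "event_type", "timestamp")


def event_fingerprint(event):
    """Stable hash of the fields that identify one event."""
    parts = [str(event[field]) for field in FINGERPRINT_FIELDS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def dedupe(events, seen=None):
    """Return events whose fingerprint is not yet in `seen` (updated in place)."""
    seen = set() if seen is None else seen
    unique = []
    for event in events:
        fingerprint = event_fingerprint(event)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(event)
    return unique
```

Because the fingerprint depends only on event content, replaying or backfilling the same window yields the same hashes, which also gives anomaly generation a natural idempotency key.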
Recommended First Patch Set
The first code milestone should include:
- backend/app/services/collectors/ris_live.py
- backend/app/services/collectors/bgpstream.py
- backend/app/models/bgp_anomaly.py
- backend/app/api/v1/bgp.py
- backend/app/api/v1/visualization.py: add BGP anomaly geo endpoint
- frontend/src/pages: add a BGP anomaly list or summary page
- frontend/public/earth/js: add BGP anomaly rendering layer