first commit

This commit is contained in:
rayd1o
2026-03-05 11:46:58 +08:00
commit e7033775d8
20657 changed files with 1988940 additions and 0 deletions

websocket_protocol.md

# WebSocket Protocol Specification
## Connection
```
ws://localhost:8000/ws?token=<access_token>
```
**Connection Steps:**
1. Client connects to the WebSocket endpoint, passing its JWT access token in the `token` query parameter
2. Server validates the JWT
3. Server sends a `connection_established` message
4. Client sends a `subscription` message with `"action": "subscribe"` (optional)
5. Server begins sending `data_frame` messages

**Connection Limits:**
- Maximum concurrent connections per user: 3
- Connection timeout (no activity): 5 minutes
- Heartbeat interval: 30 seconds
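The per-user limit and idle timeout could be enforced server-side with a small registry. This is a minimal sketch, assuming an in-memory store on a single server and an injectable clock; `ConnectionRegistry`, `try_register`, `touch` and `expired` are hypothetical names, not part of the protocol.

```python
import time
from typing import Callable, Dict, List

MAX_CONNECTIONS_PER_USER = 3   # "Maximum concurrent connections per user: 3"
IDLE_TIMEOUT_SECONDS = 5 * 60  # "Connection timeout (no activity): 5 minutes"


class ConnectionRegistry:
    """Hypothetical in-memory registry; several servers would need shared storage."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # user_id -> {connection_id: time of last activity}
        self._by_user: Dict[str, Dict[str, float]] = {}

    def try_register(self, user_id: str, connection_id: str) -> bool:
        """Admit a connection unless the user already has the maximum open."""
        conns = self._by_user.setdefault(user_id, {})
        if len(conns) >= MAX_CONNECTIONS_PER_USER:
            return False
        conns[connection_id] = self._clock()
        return True

    def touch(self, user_id: str, connection_id: str) -> None:
        """Record activity (any inbound message, heartbeat pings included)."""
        self._by_user[user_id][connection_id] = self._clock()

    def unregister(self, user_id: str, connection_id: str) -> None:
        self._by_user.get(user_id, {}).pop(connection_id, None)

    def expired(self) -> List[str]:
        """Connection ids idle for longer than the timeout."""
        now = self._clock()
        return [
            cid
            for conns in self._by_user.values()
            for cid, last in conns.items()
            if now - last > IDLE_TIMEOUT_SECONDS
        ]
```

A server would call `try_register` during the handshake (refusing the fourth connection), `touch` on every inbound frame, and periodically close whatever `expired()` returns.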
---
## Message Format
### General Structure
```json
{
  "type": "message_type",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": { ... }
}
```
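Every message shares this envelope. A minimal sketch of building and parsing it, assuming UTC timestamps with millisecond precision as in the examples; `utc_timestamp`, `make_message` and `parse_message` are illustrative helper names.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def utc_timestamp(now: Optional[datetime] = None) -> str:
    """Format a UTC time like the examples: 2024-01-20T10:30:00.000Z."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S") + f".{now.microsecond // 1000:03d}Z"


def make_message(msg_type: str, data: Dict[str, Any], now: Optional[datetime] = None) -> str:
    """Serialize a message in the common envelope."""
    return json.dumps({"type": msg_type, "timestamp": utc_timestamp(now), "data": data})


def parse_message(raw: str) -> Dict[str, Any]:
    """Parse and minimally validate an incoming envelope.

    Raises ValueError on malformed input; a server might map this to INVALID_FRAME.
    """
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc}") from exc
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    for key, expected in (("type", str), ("timestamp", str), ("data", dict)):
        if not isinstance(msg.get(key), expected):
            raise ValueError(f"missing or invalid field: {key}")
    return msg
```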
### All Message Types
| Type | Direction | Description |
|------|-----------|-------------|
| connection_established | Server → Client | Initial connection confirmation |
| heartbeat | Bidirectional | Keep-alive ping/pong |
| data_frame | Server → Client | Main data payload |
| control_frame | Client → Server | Camera/display control |
| alert_notification | Server → Client | Real-time alert |
| error | Bidirectional | Error reporting |
| sync_request | Client → Server | Request full sync |
| subscription | Client → Server | Subscribe/unsubscribe channels |
| subscription_confirmed | Server → Client | Acknowledges a subscription change |
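Because each message type travels in a fixed direction, a server can reject frames that a client should never send. A sketch, assuming the directions listed in this document (including `subscription_confirmed`, which the server sends in reply to `subscription`) and assuming that a wrong-direction or unknown type is reported as `INVALID_FRAME`; `check_client_frame` is a hypothetical name.

```python
from typing import Dict, Optional, Set

# "c2s" = client → server, "s2c" = server → client.
DIRECTIONS: Dict[str, Set[str]] = {
    "connection_established": {"s2c"},
    "heartbeat": {"c2s", "s2c"},
    "data_frame": {"s2c"},
    "control_frame": {"c2s"},
    "alert_notification": {"s2c"},
    "error": {"c2s", "s2c"},
    "sync_request": {"c2s"},
    "subscription": {"c2s"},
    "subscription_confirmed": {"s2c"},
}


def check_client_frame(msg_type: str) -> Optional[str]:
    """Return None if a client may send this type, else an error code (assumed mapping)."""
    allowed = DIRECTIONS.get(msg_type)
    if allowed is None or "c2s" not in allowed:
        return "INVALID_FRAME"
    return None
```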
---
## Connection Established
**Server → Client**
```json
{
  "type": "connection_established",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "connection_id": "conn_a1b2c3d4",
    "server_version": "1.0.0",
    "session_id": "sess_xyz789",
    "heartbeat_interval": 30,
    "supported_channels": ["gpu_clusters", "submarine_cables", "ixp_nodes", "alerts"]
  }
}
```
---
## Heartbeat
### Client → Server (Ping)
```json
{
  "type": "heartbeat",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "ping"
  }
}
```
### Server → Client (Pong)
```json
{
  "type": "heartbeat",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "pong",
    "latency_ms": 45
  }
}
```
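**Client Behavior:**
- Send a ping every 30 seconds (the `heartbeat_interval` announced in `connection_established`)
- If no pong is received within 10 seconds, reconnect
- Track round-trip latency for monitoring

**Server Behavior:**
- Reply with a pong immediately on receiving a ping
- Track connection health, closing connections with no activity for longer than the 5-minute timeout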
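The client rules can be captured in a small timer-driven monitor. This sketch assumes the caller invokes `tick()` periodically with a monotonic clock value and acts on the returned string; `HeartbeatMonitor` and its method names are hypothetical. It measures round-trip time on the client clock rather than relying on the server's `latency_ms` field.

```python
from typing import List, Optional

PING_INTERVAL = 30.0  # seconds; heartbeat_interval from connection_established
PONG_TIMEOUT = 10.0   # seconds to wait for a pong before reconnecting


class HeartbeatMonitor:
    def __init__(self, ping_interval: float = PING_INTERVAL,
                 pong_timeout: float = PONG_TIMEOUT) -> None:
        self.ping_interval = ping_interval
        self.pong_timeout = pong_timeout
        self.last_ping_sent: Optional[float] = None
        self.awaiting_pong = False
        self.latencies_ms: List[float] = []

    def tick(self, now: float) -> Optional[str]:
        """Return "send_ping", "reconnect", or None (nothing to do)."""
        if self.awaiting_pong:
            if now - self.last_ping_sent >= self.pong_timeout:
                return "reconnect"
            return None
        if self.last_ping_sent is None or now - self.last_ping_sent >= self.ping_interval:
            self.last_ping_sent = now
            self.awaiting_pong = True
            return "send_ping"
        return None

    def on_pong(self, now: float) -> None:
        """Record round-trip latency for the outstanding ping."""
        if self.awaiting_pong and self.last_ping_sent is not None:
            self.latencies_ms.append((now - self.last_ping_sent) * 1000.0)
            self.awaiting_pong = False
```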
---
## Data Frame (Main Payload)
### Full Update
**Server → Client**
```json
{
  "type": "data_frame",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "update_type": "full",
    "sequence": 12345,
    "payload": {
      "meta": {
        "generated_at": "2024-01-20T10:30:00Z",
        "data_sources": 9,
        "total_records": 20800
      },
      "gpu_clusters": {
        "total": 1500,
        "last_updated": "2024-01-20T10:00:00Z",
        "data": [
          {
            "id": "epoch-gpu-001",
            "name": "Frontier",
            "country": "US",
            "city": "Oak Ridge, TN",
            "lat": 35.9327,
            "lng": -84.3107,
            "gpu_count": 37888,
            "gpu_type": "AMD MI250X",
            "total_flops": 1.54e9,
            "rank": 1,
            "visual": {
              "size": 1.0,
              "color": "#FF6B6B",
              "pulse": true
            }
          }
        ]
      },
      "submarine_cables": {
        "total": 436,
        "last_updated": "2024-01-20T09:00:00Z",
        "data": [
          {
            "id": "cable-001",
            "name": "FASTER",
            "length_km": 11600,
            "capacity_tbps": 60,
            "status": "active",
            "landing_points": [
              {"lat": 37.7749, "lng": -122.4194},
              {"lat": 35.6762, "lng": 139.6503}
            ],
            "visual": {
              "width": 2.0,
              "color": "#4ECDC4",
              "animated": true
            }
          }
        ]
      },
      "ixp_nodes": {
        "total": 1200,
        "last_updated": "2024-01-20T09:30:00Z",
        "data": [
          {
            "id": "ixp-001",
            "name": "Equinix Ashburn",
            "country": "US",
            "city": "Ashburn, VA",
            "lat": 39.0438,
            "lng": -77.4874,
            "member_count": 250,
            "traffic_tbps": 15.5,
            "visual": {
              "size": 0.8,
              "color": "#45B7D1"
            }
          }
        ]
      },
      "cloud_infra": {
        "total": 500,
        "last_updated": "2024-01-20T08:00:00Z",
        "data": [
          {
            "provider": "AWS",
            "region": "us-east-1",
            "data_center_count": 15,
            "capacity_mw": 500,
            "lat": 39.0438,
            "lng": -77.4874,
            "visual": {
              "size": 1.2,
              "color": "#FF9900"
            }
          }
        ]
      }
    }
  }
}
```
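A client might index a full update by channel and record id so that later incremental updates can be applied to it. A sketch, assuming the `data_frame` has already been parsed into a dict; because the `cloud_infra` records in the example carry no `id`, the sketch falls back to a hypothetical `provider:region` key. `record_key` and `load_full_update` are illustrative names.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def record_key(record: Record) -> str:
    """Use the record id; fall back to provider:region (assumption for cloud_infra)."""
    if "id" in record:
        return str(record["id"])
    return f"{record['provider']}:{record['region']}"


def load_full_update(frame: Dict[str, Any]) -> Dict[str, Any]:
    """Build client state {"sequence", "channels": {name: {key: record}}} from a full update."""
    data = frame["data"]
    if data["update_type"] != "full":
        raise ValueError("expected a full update")
    channels: Dict[str, Dict[str, Record]] = {}
    for name, section in data["payload"].items():
        if name == "meta":
            continue  # metadata block, not a channel
        channels[name] = {record_key(r): r for r in section["data"]}
    return {"sequence": data["sequence"], "channels": channels}
```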
### Incremental Update
**Server → Client**
```json
{
  "type": "data_frame",
  "timestamp": "2024-01-20T10:35:00.000Z",
  "data": {
    "update_type": "incremental",
    "sequence": 12346,
    "base_sequence": 12345,
    "changes": {
      "gpu_clusters": {
        "updated": [
          {
            "id": "epoch-gpu-002",
            "rank": 2,
            "gpu_count": 40000
          }
        ],
        "added": [],
        "removed": []
      },
      "alerts": {
        "new": [
          {
            "id": 1234,
            "severity": "warning",
            "message": "API response time > 30s",
            "source": "Epoch AI"
          }
        ],
        "resolved": [1230]
      }
    }
  }
}
```
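`base_sequence` lets a client detect missed frames. One reasonable policy (an assumption, not stated above) is to apply an incremental update only when its `base_sequence` equals the sequence the client currently holds, and otherwise fall back to a full `sync_request`. A self-contained sketch with state shaped as `{"sequence": int, "channels": {name: {id: record}}}`; treating `removed` as a list of ids and `updated` entries as partial records are also assumptions.

```python
import copy
from typing import Any, Dict, List, Optional


def apply_incremental(state: Dict[str, Any], frame: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the updated state, or None on a sequence gap (caller should resync)."""
    data = frame["data"]
    if data.get("base_sequence") != state["sequence"]:
        return None
    new_state = copy.deepcopy(state)  # leave the caller's state untouched
    channels = new_state["channels"]
    for name, change in data["changes"].items():
        records = channels.setdefault(name, {})
        # Data channels: added (full records), updated (partial fields), removed (ids).
        for rec in change.get("added", []):
            records[str(rec["id"])] = rec
        for patch in change.get("updated", []):
            records.setdefault(str(patch["id"]), {}).update(patch)
        for rec_id in change.get("removed", []):
            records.pop(str(rec_id), None)
        # The alerts channel uses new / resolved instead.
        for alert in change.get("new", []):
            records[str(alert["id"])] = alert
        for alert_id in change.get("resolved", []):
            records.pop(str(alert_id), None)
    new_state["sequence"] = data["sequence"]
    return new_state


def sync_request(channels: List[str], timestamp: str) -> Dict[str, Any]:
    """Build the sync_request used to recover from a sequence gap."""
    return {"type": "sync_request", "timestamp": timestamp,
            "data": {"request_type": "full", "channels": channels}}
```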
---
## Control Frame (Client → Server)
### Camera Position
```json
{
  "type": "control_frame",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "camera_set",
    "camera": {
      "position": {
        "latitude": 35.6762,
        "longitude": 139.6503,
        "altitude": 5000000
      },
      "target": {
        "latitude": 35.6762,
        "longitude": 139.6503,
        "altitude": 0
      },
      "rotation": {
        "pitch": -45,
        "yaw": 0,
        "roll": 0
      }
    }
  }
}
```
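A server might validate `camera_set` before applying it and report failures in the shape of the Validation Error example later in this document (`INVALID_CONTROL_FRAME` with `details.field` and `details.constraint`). Only the "position altitude must be positive" rule is stated in this document; the latitude/longitude ranges are standard geographic bounds and the non-negative target altitude is inferred from the example (target at altitude 0), so both are assumptions. `rotation` is not checked here.

```python
from typing import Any, Dict, Optional

ErrorData = Dict[str, Any]


def _invalid(field: str, constraint: str, part: str) -> ErrorData:
    """Error payload shaped like the Validation Error example."""
    return {
        "code": "INVALID_CONTROL_FRAME",
        "message": f"Invalid camera {part}",
        "details": {"field": field, "constraint": constraint},
    }


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_camera_set(data: Dict[str, Any]) -> Optional[ErrorData]:
    """Return None if a camera_set payload is valid, else an error `data` payload."""
    camera = data.get("camera")
    if not isinstance(camera, dict):
        return _invalid("camera", "Required", "settings")
    for part in ("position", "target"):
        point = camera.get(part)
        if not isinstance(point, dict):
            return _invalid(f"camera.{part}", "Required", part)
        lat, lng, alt = point.get("latitude"), point.get("longitude"), point.get("altitude")
        # Coordinate ranges are standard geographic bounds (assumed).
        if not _is_number(lat) or not -90 <= lat <= 90:
            return _invalid(f"camera.{part}.latitude", "Must be between -90 and 90", part)
        if not _is_number(lng) or not -180 <= lng <= 180:
            return _invalid(f"camera.{part}.longitude", "Must be between -180 and 180", part)
        if not _is_number(alt):
            return _invalid(f"camera.{part}.altitude", "Must be a number", part)
        # The camera must be above the surface; the target may sit on it (altitude 0).
        if part == "position" and alt <= 0:
            return _invalid("camera.position.altitude", "Must be positive", part)
        if part == "target" and alt < 0:
            return _invalid("camera.target.altitude", "Must be non-negative", part)
    return None
```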
### Camera Animation
```json
{
  "type": "control_frame",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "camera_animate",
    "animation": {
      "type": "fly_to",
      "target": {
        "latitude": 39.0438,
        "longitude": -77.4874,
        "altitude": 3000000
      },
      "duration_seconds": 3.0,
      "easing": "ease_in_out"
    }
  }
}
```
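This document does not define the `ease_in_out` curve. One common choice is the cubic ease-in-out; the sketch below uses it (an assumption) to compute the camera position partway through a `fly_to`. Latitude, longitude and altitude are interpolated linearly in the eased parameter, which ignores great-circle paths and the anti-meridian (a flight from 139.65° to -77.49° longitude goes the long way round); a real client would more likely interpolate in the engine's own coordinate space.

```python
from typing import Dict

Point = Dict[str, float]


def ease_in_out_cubic(t: float) -> float:
    """Cubic ease-in-out on [0, 1]: slow start, fast middle, slow end."""
    t = min(max(t, 0.0), 1.0)
    if t < 0.5:
        return 4.0 * t * t * t
    return 1.0 - (-2.0 * t + 2.0) ** 3 / 2.0


def fly_to_position(start: Point, target: Point, elapsed: float, duration: float) -> Point:
    """Camera position `elapsed` seconds into a fly_to lasting `duration` seconds."""
    progress = 1.0 if duration <= 0 else elapsed / duration
    k = ease_in_out_cubic(progress)
    return {key: start[key] + (target[key] - start[key]) * k
            for key in ("latitude", "longitude", "altitude")}
```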
### Auto-Cruise Control
```json
{
  "type": "control_frame",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "cruise_control",
    "enabled": true,
    "config": {
      "speed": 1.0,
      "route": "global",
      "pause_on_interaction": true
    }
  }
}
```
### Layer Visibility
```json
{
  "type": "control_frame",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "layer_visibility",
    "layers": {
      "gpu_clusters": true,
      "submarine_cables": true,
      "ixp_nodes": true,
      "cloud_infra": false,
      "satellites": false,
      "alerts": true
    }
  }
}
```
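The `layers` map can be applied as a filter over the channels being rendered. A sketch, assuming that a channel missing from the map stays visible (this document does not say); note that `satellites` appears here but in no data frame example. `visible_channels` is a hypothetical name.

```python
from typing import Any, Dict


def visible_channels(channels: Dict[str, Any], layers: Dict[str, bool]) -> Dict[str, Any]:
    """Keep only channels whose layer is enabled; unlisted channels default to visible."""
    return {name: records for name, records in channels.items() if layers.get(name, True)}
```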
### Focus Request
```json
{
  "type": "control_frame",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "focus_entity",
    "entity_type": "gpu_cluster",
    "entity_id": "epoch-gpu-001",
    "show_info": true
  }
}
```
### Time Range Filter
```json
{
  "type": "control_frame",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "set_time_range",
    "time_range": {
      "start": "2024-01-01T00:00:00Z",
      "end": "2024-01-20T23:59:59Z",
      "aggregation": "hourly"
    }
  }
}
```
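Handling `set_time_range` involves parsing the ISO 8601 bounds, checking their order, and bucketing samples by `aggregation`. A sketch that supports `"hourly"` (shown above) and `"daily"` (an assumed extra value); the `(timestamp, value)` sample shape and the name `bucket_samples` are hypothetical. A `ValueError` here would naturally map to `INVALID_CONTROL_FRAME`.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

BUCKETS = {"hourly": timedelta(hours=1), "daily": timedelta(days=1)}  # "daily" assumed


def parse_utc(ts: str) -> datetime:
    """Parse '2024-01-01T00:00:00Z'; 'Z' is replaced for Python < 3.11."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def bucket_samples(time_range: Dict[str, str],
                   samples: Iterable[Tuple[str, float]]) -> Dict[datetime, List[float]]:
    """Group (timestamp, value) samples inside [start, end] by aggregation bucket."""
    start, end = parse_utc(time_range["start"]), parse_utc(time_range["end"])
    if start >= end:
        raise ValueError("time_range.start must be before time_range.end")
    step = BUCKETS.get(time_range.get("aggregation", "hourly"))
    if step is None:
        raise ValueError(f"unsupported aggregation: {time_range.get('aggregation')}")
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for ts, value in samples:
        t = parse_utc(ts)
        if start <= t <= end:
            # Floor to a bucket boundary by counting whole steps from the range start.
            index = (t - start) // step
            buckets[start + index * step].append(value)
    return dict(buckets)
```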
---
## Alert Notification
**Server → Client**
```json
{
  "type": "alert_notification",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "alert": {
      "id": 1234,
      "severity": "critical",
      "title": "Data Collection Failed",
      "message": "TOP500 data source failed to collect data",
      "source": "TOP500",
      "timestamp": "2024-01-20T10:25:00Z",
      "actions": ["acknowledge", "retry", "view_details"]
    },
    "badge_update": {
      "critical": 2,
      "warning": 5,
      "info": 10
    }
  }
}
```
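`badge_update` carries the number of alerts per severity. A server could derive it from its set of unresolved alerts; this sketch assumes exactly the three severities shown and that resolved alerts have already been removed from the input. `badge_update` as a function name is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

SEVERITIES = ("critical", "warning", "info")


def badge_update(active_alerts: Iterable[Mapping[str, object]]) -> Dict[str, int]:
    """Count active alerts per severity, always reporting all three keys."""
    counts = Counter(alert["severity"] for alert in active_alerts)
    return {sev: counts.get(sev, 0) for sev in SEVERITIES}
```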
---
## Sync Request
**Client → Server**
```json
{
  "type": "sync_request",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "request_type": "full",
    "channels": ["gpu_clusters", "submarine_cables", "ixp_nodes"]
  }
}
```
**Server Response:**
A `data_frame` message with `update_type: "full"`, in the same format as the Full Update above.

---
## Subscription Management
### Subscribe
```json
{
  "type": "subscription",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "subscribe",
    "channels": ["gpu_clusters", "alerts"]
  }
}
```
### Unsubscribe
```json
{
  "type": "subscription",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "unsubscribe",
    "channels": ["alerts"]
  }
}
```
**Server Response** (to the Subscribe request above):
```json
{
  "type": "subscription_confirmed",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "action": "subscribe",
    "channels": ["gpu_clusters", "alerts"],
    "active_subscriptions": ["gpu_clusters", "alerts"]
  }
}
```
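Server-side, subscription handling reduces to set operations plus validation against the advertised channels. A sketch, assuming the `supported_channels` list from the `connection_established` example, that a request naming any unknown channel is rejected as a whole with `CHANNEL_NOT_FOUND`, and that an unknown `action` is reported as `INVALID_FRAME`; `handle_subscription` is a hypothetical name, and `active_subscriptions` is sorted only to make the output deterministic.

```python
from typing import Any, Dict, List, Set

SUPPORTED_CHANNELS = {"gpu_clusters", "submarine_cables", "ixp_nodes", "alerts"}


def _error(timestamp: str, code: str, message: str) -> Dict[str, Any]:
    return {"type": "error", "timestamp": timestamp, "data": {"code": code, "message": message}}


def handle_subscription(active: Set[str], data: Dict[str, Any], timestamp: str) -> Dict[str, Any]:
    """Apply a subscribe/unsubscribe request to `active` and build the reply message."""
    action = data.get("action")
    channels: List[str] = data.get("channels", [])
    if action not in ("subscribe", "unsubscribe"):
        return _error(timestamp, "INVALID_FRAME", "Invalid subscription action")
    unknown = [c for c in channels if c not in SUPPORTED_CHANNELS]
    if unknown:
        # All-or-nothing: nothing is changed when any channel is unknown.
        return _error(timestamp, "CHANNEL_NOT_FOUND", f"Unknown channel(s): {', '.join(unknown)}")
    if action == "subscribe":
        active.update(channels)
    else:
        active.difference_update(channels)
    return {"type": "subscription_confirmed", "timestamp": timestamp, "data": {
        "action": action,
        "channels": channels,
        "active_subscriptions": sorted(active),
    }}
```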
---
## Error Messages
### Connection Error
```json
{
  "type": "error",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "code": "INVALID_TOKEN",
    "message": "Invalid or expired authentication token",
    "action": "reconnect_with_fresh_token"
  }
}
```
### Rate Limit Error
```json
{
  "type": "error",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "code": "RATE_LIMITED",
    "message": "Too many requests",
    "retry_after_seconds": 30
  }
}
```
### Data Error
```json
{
  "type": "error",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "code": "DATA_FETCH_FAILED",
    "message": "Failed to fetch data from source: Epoch AI",
    "source": "Epoch AI",
    "will_retry": true,
    "retry_in_seconds": 60
  }
}
```
### Validation Error
```json
{
  "type": "error",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "data": {
    "code": "INVALID_CONTROL_FRAME",
    "message": "Invalid camera position",
    "details": {
      "field": "camera.position.altitude",
      "constraint": "Must be positive"
    }
  }
}
```
---
## Error Codes Reference
| Code | HTTP Equivalent | Description |
|------|-----------------|-------------|
| INVALID_TOKEN | 401 | JWT validation failed |
| TOKEN_EXPIRED | 401 | Token has expired |
| RATE_LIMITED | 429 | Too many requests |
| CHANNEL_NOT_FOUND | 404 | Invalid channel name |
| INVALID_FRAME | 400 | Malformed JSON or structure |
| INVALID_CONTROL_FRAME | 400 | Control action validation failed |
| DATA_FETCH_FAILED | 500 | Backend data collection failed |
| INTERNAL_ERROR | 500 | Server internal error |
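On the client, each error code maps to a recovery action. The mapping below is one plausible policy derived from the example payloads (`reconnect_with_fresh_token`, `retry_after_seconds`, `will_retry`, `retry_in_seconds`); the returned action names and the 30-second fallback when `retry_after_seconds` is missing are assumptions.

```python
from typing import Any, Dict, Tuple


def recovery_action(error_data: Dict[str, Any]) -> Tuple[str, float]:
    """Return (action, delay_seconds) for the `data` payload of an error message."""
    code = error_data.get("code")
    if code in ("INVALID_TOKEN", "TOKEN_EXPIRED"):
        return ("refresh_token_and_reconnect", 0.0)
    if code == "RATE_LIMITED":
        # Fallback delay of 30 s is an assumption when the field is absent.
        return ("retry", float(error_data.get("retry_after_seconds", 30)))
    if code == "DATA_FETCH_FAILED":
        if error_data.get("will_retry"):
            # The server retries by itself; the client only waits for fresh data.
            return ("wait_for_server", float(error_data.get("retry_in_seconds", 0)))
        return ("show_error", 0.0)
    if code in ("INVALID_FRAME", "INVALID_CONTROL_FRAME", "CHANNEL_NOT_FOUND"):
        # Client-side bug or bad input: resending the same frame would fail again.
        return ("fix_request", 0.0)
    return ("show_error", 0.0)  # INTERNAL_ERROR and unknown codes
```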
---
## Connection State Machine
```
DISCONNECTED
  │
  ├─→ CONNECTING (token validation)
  │   ├─→ AUTHENTICATED ────→ ESTABLISHED
  │   │                       │
  │   └─→ ERROR (reconnect)   ├─→ RECEIVING DATA
  │                           │
  └───────────────────────────┴─→ DISCONNECTING
                                  └─→ DISCONNECTED
```
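The state diagram can be encoded as an explicit transition table so that illegal transitions surface early. This is one reading of the diagram, assuming that `ERROR` leads back to `CONNECTING` (the "reconnect" label) or gives up to `DISCONNECTED` after the maximum number of retries, and that any live state may move to `DISCONNECTING`; `ConnState` and `ConnectionStateMachine` are hypothetical names.

```python
from enum import Enum, auto
from typing import Dict, Set


class ConnState(Enum):
    DISCONNECTED = auto()
    CONNECTING = auto()
    AUTHENTICATED = auto()
    ESTABLISHED = auto()
    RECEIVING_DATA = auto()
    ERROR = auto()
    DISCONNECTING = auto()


S = ConnState
TRANSITIONS: Dict[ConnState, Set[ConnState]] = {
    S.DISCONNECTED: {S.CONNECTING},
    S.CONNECTING: {S.AUTHENTICATED, S.ERROR, S.DISCONNECTING},
    S.AUTHENTICATED: {S.ESTABLISHED, S.DISCONNECTING},
    S.ESTABLISHED: {S.RECEIVING_DATA, S.DISCONNECTING},
    S.RECEIVING_DATA: {S.DISCONNECTING},
    S.ERROR: {S.CONNECTING, S.DISCONNECTED},  # reconnect, or give up after max retries
    S.DISCONNECTING: {S.DISCONNECTED},
}


class ConnectionStateMachine:
    def __init__(self) -> None:
        self.state = ConnState.DISCONNECTED

    def transition(self, new_state: ConnState) -> None:
        """Move to `new_state`, rejecting transitions the diagram does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
```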
---
## Reconnection Strategy
1. **Immediate Retry:** On disconnect, retry after 1 second.
2. **Exponential Backoff:** If that fails, double the delay before each further attempt: 2, 4, 8, then 16 seconds.
3. **Max Retries:** Give up after 5 failed attempts (the 1-second retry plus four backoff retries).
4. **Token Refresh:** If the token has expired, refresh it before reconnecting.
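Read together, the rules above give delays of 1, 2, 4, 8 and 16 seconds, i.e. 2^(n-1) seconds before attempt n, for n = 1 to 5. A sketch of a reconnect loop under that reading; `connect` and `refresh_token` are hypothetical callables supplied by the client, and `connect` is assumed to raise a hypothetical `TokenExpired` exception when the server rejects the token and `OSError` (e.g. `ConnectionError`) on network failure.

```python
import time
from typing import Callable, List


class TokenExpired(Exception):
    """Hypothetical: raised by `connect` when the server reports an invalid/expired token."""


MAX_ATTEMPTS = 5


def backoff_delays(max_attempts: int = MAX_ATTEMPTS) -> List[float]:
    """Delay before each reconnect attempt: 1, 2, 4, 8, 16 seconds."""
    return [float(2 ** i) for i in range(max_attempts)]


def reconnect(connect: Callable[[], None], refresh_token: Callable[[], None],
              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to reconnect; return True on success, False after MAX_ATTEMPTS failures."""
    for delay in backoff_delays():
        sleep(delay)
        try:
            connect()
            return True
        except TokenExpired:
            refresh_token()  # refresh before the next attempt
        except OSError:
            pass  # network failure: fall through to the next, longer delay
    return False
```

Injecting `sleep` keeps the loop testable without real waiting; a client driven by an event loop would schedule the delays instead of blocking.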
---
## Data Flow Diagram
```
┌──────────┐                      ┌──────────┐
│   UE5    │◄──── WebSocket ─────►│  Server  │
│  Client  │                      │          │
└────┬─────┘                      └────┬─────┘
     │                                 │
     │ 1. Connect (with JWT)           │
     ├────────────────────────────────►│
     │ 2. Connection Established       │
     │◄────────────────────────────────┤
     │                                 │
     │ 3. Control Frame (Camera)       │
     ├────────────────────────────────►│
     │                                 │
     │ 4. Data Frame (Update)          │
     │◄────────────────────────────────┤
     │ 5. Heartbeat (30s interval)     │
     │◄───────────────────────────────►│
     │                                 │
     │ 6. Alert Notification           │
     │◄────────────────────────────────┤
```
---
## Performance Considerations
| Metric | Target | Notes |
|--------|--------|-------|
| Data frame size | < 1 MB | Compressed if larger |
| Update latency | < 5 seconds | End-to-end |
| Heartbeat latency | < 100 ms | Server processing |
| Max connections | 1000 per server | With load balancing |

**Optimization Strategies:**
- Incremental updates for frequent changes
- Binary encoding for large datasets (MessagePack/Protocol Buffers)
- Compression for data frames (gzip)
- Chunking for large payloads
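The size target, gzip compression and chunking could be combined as follows: serialize the frame, gzip it only if it exceeds 1 MB, then split the result into fixed-size chunks. The 256 KiB chunk size, the decimal reading of "1 MB", and reassembly by in-order concatenation are assumptions, since this protocol defines no chunk envelope or compression flag. A standard alternative to application-level gzip is the WebSocket permessage-deflate extension (RFC 7692).

```python
import gzip
import json
from typing import Any, Dict, List, Tuple

MAX_FRAME_BYTES = 1_000_000  # "< 1 MB" target (decimal megabyte assumed)
CHUNK_BYTES = 256 * 1024     # hypothetical chunk size


def encode_frame(frame: Dict[str, Any]) -> Tuple[bool, List[bytes]]:
    """Return (compressed, chunks) for a data frame."""
    raw = json.dumps(frame, separators=(",", ":")).encode("utf-8")
    compressed = len(raw) > MAX_FRAME_BYTES
    body = gzip.compress(raw) if compressed else raw
    chunks = [body[i:i + CHUNK_BYTES] for i in range(0, len(body), CHUNK_BYTES)]
    return compressed, chunks


def decode_frame(compressed: bool, chunks: List[bytes]) -> Dict[str, Any]:
    """Reassemble chunks in order and undo compression."""
    body = b"".join(chunks)
    if compressed:
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))
```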