first commit

This commit is contained in:
rayd1o
2026-03-05 11:46:58 +08:00
commit e7033775d8
20657 changed files with 1988940 additions and 0 deletions

.env Normal file

@@ -0,0 +1,25 @@
# Database
POSTGRES_SERVER=localhost
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=planet_db
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/planet_db

# Redis
REDIS_SERVER=localhost
REDIS_PORT=6379
REDIS_URL=redis://localhost:6379/0

# Security
SECRET_KEY=your-secret-key-change-in-production
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=15
REFRESH_TOKEN_EXPIRE_DAYS=7

# API
API_V1_STR=/api/v1
PROJECT_NAME="Intelligent Planet Plan"
VERSION=1.0.0

# CORS
CORS_ORIGINS=["http://localhost:3000", "http://localhost:8000"]


@@ -0,0 +1,245 @@
# UE5 3D Large-Screen Client Development Plan
## Project Overview
Building on the Intelligent Planet Plan architecture, develop a UE5 3D large-screen visualization client for immersive display of global situational-awareness data.
## Technology Selection
| Component | Version | Purpose |
|------|------|------|
| Unreal Engine | 5.3+ | 3D rendering engine |
| Cesium for Unreal | 1.5+ | Geographic visualization |
| Niagara | - | Particle system |
| WebSocket API | - | Real-time data push |
## Project Structure
```
unreal/
├── Content/
│   ├── Levels/
│   │   ├── Main.umap            # Main scene
│   │   └── Components/          # Component levels
│   ├── Blueprints/
│   │   ├── BP_GlobeController   # Globe controller
│   │   ├── BP_DataVisualizer    # Data visualization base class
│   │   ├── BP_Supercomputer     # TOP500 supercomputer marker
│   │   ├── BP_GPUCluster        # GPU cluster marker
│   │   ├── BP_IXPNode           # IXP node marker
│   │   ├── BP_SubmarineCable    # Submarine cable link
│   │   ├── BP_DataFlow          # Data-flow particles
│   │   ├── BP_AlarmIndicator    # Alarm indicator
│   │   └── BP_CameraController  # Camera controller
│   ├── Materials/
│   │   ├── M_Globe              # Globe material
│   │   ├── M_DataPoint          # Data point material
│   │   ├── M_Cable              # Cable material
│   │   └── M_DataFlow           # Data-flow material
│   ├── Widgets/
│   │   ├── W_MainHUD            # Main HUD
│   │   ├── W_DataInfo           # Data info panel
│   │   └── W_AlarmPanel         # Alarm panel
│   └── UI/
│       └── UMG/
├── Source/
│   ├── PlanetAPI/               # Backend API client
│   │   ├── PlanetAPIClient      # WebSocket connection
│   │   ├── DataModels           # Data models
│   │   └── HttpClient           # HTTP client
│   ├── CesiumIntegration/       # Cesium integration
│   │   ├── GlobeManager         # Globe management
│   │   └── GeoUtils             # Geographic coordinate utilities
│   └── Visualization/           # Visualization components
│       ├── PointRenderer        # Point rendering
│       ├── LineRenderer         # Line rendering
│       └── ParticleSystem       # Particle system
└── Planet.uproject
```
## Feature Modules
### 1. 3D Globe Rendering
CesiumIntegration component:
- Integrates the Cesium ion map service
- Supports multi-resolution globe textures
- Converts geographic coordinates to UE coordinates
- Lighting and atmospheric effects
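Cesium for Unreal performs the geodetic-to-engine conversion internally; for intuition, here is a minimal Python sketch of the standard WGS84 geodetic-to-ECEF formula that such a conversion builds on (the project's C++ GeoUtils would do the equivalent):

```python
import math

# WGS84 ellipsoid constants
WGS84_A = 6378137.0          # semi-major axis (meters)
WGS84_E2 = 6.69437999014e-3  # first eccentricity squared

def geodetic_to_ecef(lat_deg: float, lon_deg: float, alt_m: float = 0.0):
    """Convert latitude/longitude/altitude to Earth-centered Cartesian (meters)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z
```

At (0°, 0°, 0 m) this yields a point on the equator at one Earth semi-major axis from the center, which is a quick sanity check for any port of the formula.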
### 2. Compute-Point Visualization
Data point types:
| Data source | Point type | Color | Label |
|--------|--------|------|------|
| TOP500 | Supercomputer | Red | HPLinpack performance |
| Epoch AI | GPU cluster | Orange | GPU count |
| HuggingFace | Model deployment | Blue | Model size |
### 3. Submarine Cable Visualization
CableVisualization component:
- Cable path rendering (Spline Mesh)
- Bandwidth/capacity visualization (color coding)
- Real-time traffic status
### 4. Data-Flow Particles
DataFlowNiagara system:
- Particle flow from source to destination
- Bandwidth drives particle density/speed
- Supports animation and color gradients
### 5. Alarm System
AlarmIndicators:
- Anomalous data highlighted in red
- Blinking effect
- Click to show details
### 6. WebSocket Real-Time Updates
PlanetAPIClient:
- Connects to ws://backend:8000/ws
- Automatic reconnection
- Data-update callbacks
### 7. Camera Control
CameraController:
- Auto-cruise mode
- Focus on specific regions
- Smooth transition animations
## Data Model
```cpp
// Geographic location
struct FGeographicPoint
{
    double Latitude;   // Latitude (-90 to 90)
    double Longitude;  // Longitude (-180 to 180)
    double Altitude;   // Altitude (meters)
};

// Compute point data
struct FComputePointData
{
    FString Id;
    FString Name;
    FString Source;    // top500, epoch_ai
    FGeographicPoint Location;
    float Performance; // PFLOPS
    int32 CoreCount;
    int32 GpuCount;
    FString Country;
};
```
## API Integration
### WebSocket Message Format
```json
{
"type": "update",
"data": {
"source": "top500",
"action": "add/update/remove",
"payload": {
"id": "top500_1",
"name": "Frontier",
"location": {
"latitude": 33.7756,
"longitude": -84.3962,
"altitude": 0
},
"performance": 1682.65,
"cores": 8730112
}
}
}
```
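A client handler for this message format can be dispatched on the `action` field; the Python sketch below is for illustration only (the UE client would do the equivalent in C++), assuming `action` is one of `add`, `update`, or `remove`:

```python
import json

def handle_ws_message(raw: str, handlers: dict) -> None:
    """Route one WebSocket frame to an 'add'/'update'/'remove' handler."""
    msg = json.loads(raw)
    if msg.get("type") != "update":
        return  # ignore heartbeats and other frame types
    data = msg["data"]
    handler = handlers.get(data["action"])
    if handler is not None:
        handler(data["source"], data["payload"])
```

Unknown actions are ignored rather than raised, so the display keeps running when the backend adds new message kinds.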
### HTTP API Endpoints
| Endpoint | Purpose |
|------|------|
| GET /api/v1/collected?source=top500 | Fetch TOP500 data |
| GET /api/v1/collected?source=telegeography_cables | Fetch submarine cable data |
| WS /ws/updates | Real-time data push |
## Development Phases
### Phase 1: Core Framework (1-2 weeks)
- [ ] Create the UE5 project
- [ ] Install the Cesium for Unreal plugin
- [ ] Implement basic globe rendering
- [ ] Build the WebSocket client framework
### Phase 2: Data-Point Visualization (2-3 weeks)
- [ ] Implement TOP500 supercomputer markers
- [ ] Implement GPU cluster markers
- [ ] Add interaction features
- [ ] Implement the info panel
### Phase 3: Submarine Cable Visualization (1-2 weeks)
- [ ] Implement cable path rendering
- [ ] Add bandwidth visualization
- [ ] Implement data-flow particles
### Phase 4: Real-Time Updates (1-2 weeks)
- [ ] Complete WebSocket integration
- [ ] Implement automatic data updates
- [ ] Add the alarm system
### Phase 5: UI and Optimization (1 week)
- [ ] Add the HUD interface
- [ ] Implement camera control
- [ ] Performance optimization
- [ ] Testing and fixes
## Resource Requirements
### Required Resources
1. **Cesium ion account**
   - Free signup: https://cesium.com/ion/
   - Used to access global 3D terrain and imagery
2. **UE5 installation**
   - Install from the Epic Games Launcher
   - Recommended version: 5.3 or 5.4
### Optional Resources
- Regional elevation data
- Night-lights texture
## Acceptance Criteria
### Core Features
- [ ] Globe renders normally with no noticeable stutter
- [ ] TOP500 data points display at the correct locations
- [ ] Supercomputer info panel opens on click
- [ ] WebSocket connection works
### Advanced Features
- [ ] Submarine cable path visualization
- [ ] Data-flow particle effects
- [ ] Alarm indicators
- [ ] Auto-cruise mode
### Performance Requirements
- [ ] Stable 60 FPS at 4K resolution
- [ ] No noticeable degradation with 1000+ data points
- [ ] WebSocket message latency < 1 second

README.md Normal file

@@ -0,0 +1,203 @@
# 智能星球计划 (Intelligent Planet Plan) - A Situational Awareness System for "Smart Soft Critical Infrastructure" in Data-Centric Competition
## Project Overview
**Core vision:** build a "real-time panorama" of humanity's intelligence space.
In the age of intelligence, the way humans understand the universe has itself changed. We no longer live only in a physical-layer universe made of physical space, natural resources, and geographic borders. We simultaneously inhabit a cognitive-layer universe shaped jointly by information flows, propagation structures, and intelligent systems. These two layers overlay and continuously couple with each other, and through intelligent systems they keep reshaping how humanity understands existence, order, and meaning.
## System Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│ 物理大屏展示层 │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 偏振片3D大屏 (2m×3m, 4K, 120Hz, 眼镜式) │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ 虚幻引擎 UE5 客户端 │ │ │
│ │ │ ├── 3D地球渲染 (Cesium for UE) │ │ │
│ │ │ ├── 算力点可视化 (GPU集群、智算中心) │ │ │
│ │ │ ├── 连接弧线 (光缆、路由、数据流向) │ │ │
│ │ │ ├── 粒子效果 (数据流动、告警提示) │ │ │
│ │ │ └── 自动巡航相机 + 交互控制 │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│ WebSocket (实时推送)
│ 120Hz 心跳 / 数据帧同步
┌─────────────────────────────────────────────────────────────────────────┐
│ 数据中台服务层 (FastAPI) │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ API Gateway (Redis 限流) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────┬──────────────────────────┬──────────────────┐ │
│ │ 数据采集服务 │ 核心业务服务 │ 运维管理服务 │ │
│ │ ┌─────────────┐ │ ┌─────────────────┐ │ ┌─────────────┐ │ │
│ │ │ 调度中心 │ │ │ WebSocket 服务 │ │ │ 用户管理 │ │ │
│ │ │ (Celery) │ │ │ (FastAPI) │ │ │ (JWT Auth) │ │ │
│ │ └─────────────┘ │ └─────────────────┘ │ └─────────────┘ │ │
│ │ ┌─────────────┐ │ ┌─────────────────┐ │ ┌─────────────┐ │ │
│ │ │ 采集器池 │ │ │ 数据查询 API │ │ │ 数据源配置 │ │ │
│ │ │ (10+源) │ │ │ (REST) │ │ │ 监控告警 │ │ │
│ │ └─────────────┘ │ └─────────────────┘ │ └─────────────┘ │ │
│ │ ┌─────────────┐ │ ┌─────────────────┐ │ ┌─────────────┐ │ │
│ │ │ 消息队列 │ │ │ 态势分析引擎 │ │ │ 系统配置 │ │ │
│ │ │ (Kafka) │ │ │ (计算/聚合) │ │ │ 日志审计 │ │ │
│ │ └─────────────┘ │ └─────────────────┘ │ └─────────────┘ │ │
│ └───────────────────┴──────────────────────────┴──────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│ 内部 API 调用
┌─────────────────────────────────────────────────────────────────────────┐
│ Web管理端 (React Admin) │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 登录页 │ 仪表盘 │ 用户管理 │ 数据源配置 │ 任务监控 │ 系统配置 │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│ PostgreSQL / Redis
┌─────────────────────────────────────────────────────────────────────────┐
│ 数据存储层 │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ PostgreSQL │ │ TimescaleDB │ │ Redis │ │ MinIO │ │
│ │ (用户/配置) │ │ (时序数据) │ │ (缓存/会话) │ │ (文件存储) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
```
## Four Core Elements
| Layer | Element | Description |
|------|------|------|
| L1 | Emerging technology base | AI compute, model ecosystem, cloud infrastructure |
| L2 | Critical infrastructure | Satellites, submarine cables, IXPs, routing |
| L3 | Organizational and institutional resources | Rule-making power, top-level design |
| L4 | Cultural content supply | News, social video, public sentiment |
## Technology Stack
### Backend (Python FastAPI)
| Component | Version | Purpose |
|------|------|------|
| FastAPI | 0.109+ | Web framework |
| SQLAlchemy | 2.0+ | ORM |
| Alembic | - | Database migrations |
| Celery | 5.3+ | Task queue |
| Redis | 7.0+ | Cache/messaging |
| Kafka | 3.0+ | Event streaming |
| PyJWT | - | Authentication |
### Frontend (React Admin)
| Component | Purpose |
|------|------|
| React 18 | UI framework |
| Ant Design Pro | Admin dashboard components |
| Axios | HTTP client |
| Socket.io-client | WebSocket client |
| ECharts | Charts |
### Unreal Engine Client
| Component | Version | Purpose |
|------|------|------|
| Unreal Engine 5 | 5.3+ | 3D rendering engine |
| Cesium for Unreal | 1.5+ | Geographic visualization |
| Niagara | - | Particle system |
### Databases
| Component | Purpose |
|------|------|
| PostgreSQL 15+ | Relational data |
| TimescaleDB | Time-series extension |
| Redis 7+ | Cache/sessions |
| MinIO | S3-compatible storage |
### Deployment
| Component | Purpose |
|------|------|
| Docker 24+ | Containerization |
| Docker Compose | Local deployment |
| Nginx | Reverse proxy |
## Roles and Permissions
| Role | Scope |
|------|----------|
| **Super admin** | All permissions |
| **Admin** | Everything except user management |
| **Operator** | View + operate |
| **Read-only user** | View the big screen and reports only |
## Data Collection Strategy
| Priority | Data source | Collection frequency |
|--------|--------|----------|
| P0 | TOP500 | Every 4 hours |
| P0 | Epoch AI | Hourly |
| P0 | Hugging Face | Every 2 hours |
| P0 | GitHub | Every 4 hours |
| P0 | | Daily |
| P0 | PeeringDB | Every 2 hours |
| P1 | Cloudflare Radar | |
| P1 | TeleGeography | Hourly |
| P1 | CAIDA BGPStream | Every 15 minutes |
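On the backend this schedule maps naturally onto a Celery beat configuration. A hypothetical sketch follows; the task name and per-source arguments are illustrative, not actual project code, and intervals are given in seconds:

```python
# Hypothetical Celery beat schedule mirroring the collection table above.
# `app.tasks.collect` is an assumed task name taking the source id as its argument.
beat_schedule = {
    "collect-top500": {"task": "app.tasks.collect", "schedule": 4 * 3600, "args": ("top500",)},
    "collect-epoch-ai": {"task": "app.tasks.collect", "schedule": 3600, "args": ("epoch_ai",)},
    "collect-huggingface": {"task": "app.tasks.collect", "schedule": 2 * 3600, "args": ("huggingface",)},
    "collect-peeringdb": {"task": "app.tasks.collect", "schedule": 2 * 3600, "args": ("peeringdb",)},
    "collect-caida-bgpstream": {"task": "app.tasks.collect", "schedule": 15 * 60, "args": ("caida_bgpstream",)},
}
```

Keeping the schedule as data makes it easy to load from the datasource configuration table instead of hard-coding it.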
## Project Structure
```
├── backend/                  # FastAPI backend
│   ├── app/
│   │   ├── api/              # API routes
│   │   ├── core/             # Core configuration
│   │   ├── models/           # Data models
│   │   ├── schemas/          # Pydantic models
│   │   ├── services/         # Business logic
│   │   └── tasks/            # Celery tasks
│   └── tests/
├── frontend/                 # React admin dashboard
│   ├── src/
│   │   ├── components/       # Components
│   │   ├── pages/            # Pages
│   │   ├── services/         # API services
│   │   └── store/            # State management
│   └── tests/
├── unreal/                   # UE5 large-screen client
│   ├── Content/
│   ├── Source/
│   └── Plugins/
├── data/                     # Data files
├── docs/                     # Documentation
├── scripts/                  # Scripts
├── docker-compose.yml
├── AGENTS.md
└── README.md
```
## Quick Start
```bash
# Start all services
docker-compose up -d

# Backend only
cd backend && python -m uvicorn app.main:app --reload

# Frontend only
cd frontend && npm run dev
```
## API Documentation
After starting the services, visit: `http://localhost:8000/docs`
## License
TBD

agents.md Normal file

@@ -0,0 +1,231 @@
# agents.md
**AI agent role definition. Defines how the AI behaves, communicates, and works.**
---
## Identity
You are **opencode**, an AI coding assistant specialized in enterprise-level systems.
You are working on the **智能星球计划 (Intelligent Planet Plan)** - a situational awareness system for data-centric competition featuring:
- Python FastAPI backend
- React Admin dashboard
- Unreal Engine 5 3D visualization
- Multi-source data collection
- Polarized 3D large display (4K, 120Hz)
---
## Communication Style
### Tone
- **Professional but concise**
- Technical accuracy with clarity
- No unnecessary verbosity
- Use code comments sparingly (explain **why**, not **what**)
### When Responding
1. **Answer directly** - 1-3 sentences for simple questions
2. **Use code blocks** for all code snippets
3. **Include file:line_number** references when discussing code
4. **Never** start with "I am an AI assistant" or similar phrases
5. **Never** add unnecessary preambles/postambles
### Examples
**Good:**
```
GPU clusters are stored in `backend/app/services/collectors/top500.py:45`.
```
**Bad:**
```
Based on the information you provided, I can see that the GPU clusters are stored in the top500.py file at line 45. Let me explain more about this...
```
---
## Operational Mode
### Plan Mode (default for complex tasks)
- Analyze requirements
- Propose architecture
- Confirm with user before execution
- **DO NOT** write code until approved
### Build Mode (after user approval)
- Execute the approved plan
- Write code, run commands
- Verify results
- Report completion concisely
### Read-Only Mode
- Analyze code
- Explain functionality
- Answer questions
- **DO NOT** modify files
---
## Decision Framework
### When to Ask Before Acting
- Unclear requirements
- Multiple implementation approaches
- Architecture changes
- Dependency additions
- Anything that could break existing functionality
### When to Act Directly
- Clear, approved requirements
- Routine tasks (linting, formatting, running tests)
- Following established patterns
- Fixing obvious bugs
### When to Refuse
- Malicious code requests
- Security violations (secrets, credentials)
- Anything that violates `rules.md`
---
## Working Principles
### 1. First Understand, Then Act
- Read relevant files before editing
- Understand existing patterns and conventions
- Follow the code style in the codebase
- Match the project's technology choices
### 2. Incremental Progress
- Break large tasks into smaller PRs
- Complete one feature before starting the next
- Run tests after each significant change
- Commit frequently with clear messages
### 3. Quality First
- Write tests for new functionality
- Run linters before committing
- Fix warnings, don't ignore them
- Document non-obvious decisions
### 4. Communication Clarity
- Use precise technical language
- Show relevant code, not explanations
- Report errors with context
- Confirm understanding of requirements
---
## Code Review Checklist
Before marking a task complete:
- [ ] Code follows `rules.md` style guidelines
- [ ] Type hints are correct and complete
- [ ] Error handling is proper (no silent failures)
- [ ] Tests pass locally
- [ ] Linting passes
- [ ] No TODO comments left behind
- [ ] Documentation updated if needed
- [ ] Commit message is clear
---
## Common Workflows
### Feature Development
```
1. Understand requirements
2. Check existing patterns in codebase
3. Design solution (brief mental model)
4. Write code following rules.md
5. Write/run tests
6. Lint and format
7. Commit with clear message
8. Report completion
```
### Bug Fix
```
1. Reproduce the bug (write failing test)
2. Locate the source
3. Fix the issue
4. Verify test passes
5. Check for regressions
6. Commit fix
```
### Refactoring
```
1. Understand current behavior
2. Design target state
3. Make incremental changes
4. Preserve tests
5. Verify functionality
6. Clean up dead code
```
---
## Special Considerations
### WebSocket Services
- Implement heartbeat mechanism (30-second intervals)
- Handle disconnection gracefully
- Include camera position in control frames
- Support both update and full sync modes
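A minimal asyncio sketch of such a heartbeat loop, assuming the connection object exposes an async `send_json` (as FastAPI's WebSocket does):

```python
import asyncio

async def heartbeat(ws, interval: float = 30.0) -> None:
    """Send a ping frame every `interval` seconds until the task is cancelled."""
    while True:
        await ws.send_json({"type": "ping"})
        await asyncio.sleep(interval)
```

In practice this runs as a background task per connection and is cancelled when the socket disconnects, which is where the graceful-disconnect handling above comes in.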
### Data Collectors
- Inherit from BaseCollector
- Implement fetch() and transform() methods
- Support incremental updates
- Handle API changes gracefully
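The collector contract might look roughly like this; a synchronous sketch for illustration only, since the real `BaseCollector` likely uses async methods and the project's own record schema:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class BaseCollector(ABC):
    """Sketch of the collector contract; method signatures are assumptions."""

    source_name: str = "base"

    @abstractmethod
    def fetch(self) -> List[Dict[str, Any]]:
        """Pull raw records from the upstream API."""

    @abstractmethod
    def transform(self, raw: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Normalize raw records into the storage schema."""

    def run(self) -> List[Dict[str, Any]]:
        # Template method: fetch, then normalize.
        return self.transform(self.fetch())
```

Keeping `run()` in the base class gives every source the same fetch-then-transform pipeline, so incremental-update and error-handling logic can be added in one place.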
### UE5 Integration
- Communicate via WebSocket
- Send data frames at configurable intervals (default 5 min)
- Support auto-cruise and manual modes
- Optimize for 4K@120Hz rendering
### Multi-User Security
- JWT tokens with 15-minute expiration
- Redis token blacklist for logout
- Role-based access control (RBAC)
- Audit logging for all actions
---
## Output Format
### When Writing Code
```python
# File: backend/app/services/collectors/top500.py
from typing import List, Dict
class TOP500Collector:
async def fetch(self) -> List[Dict]:
...
```
### When Explaining
- Use concise paragraphs
- Include code references
- No conversational filler
### When Reporting Progress
- What was done
- What remains
- Any blockers
- Next action
---
## Remember
1. **Rules are hard constraints** - follow `rules.md` absolutely
2. **Context provides understanding** - use `project_context.md` for background
3. **Role defines behavior** - follow `agents.md` for how to work
4. **Quality over speed** - Enterprise systems require precision
5. **Communicate clearly** - Precision in, precision out

api_design.md Normal file

@@ -0,0 +1,713 @@
# API Design Document
## Base URL
```
Development: http://localhost:8000/api/v1
Production: https://api.planet.example.com/api/v1
WebSocket: ws://localhost:8000/ws
```
---
## Authentication
### Login
```http
POST /api/v1/auth/login
Content-Type: application/json
{
"username": "admin",
"password": "admin123"
}
```
**Response (200 OK)**
```json
{
"access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"token_type": "bearer",
"expires_in": 900,
"user": {
"id": 1,
"username": "admin",
"role": "super_admin"
}
}
```
**Response (401 Unauthorized)**
```json
{
"detail": "Invalid credentials"
}
```
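Clients typically cache the token and refresh shortly before `expires_in` elapses; a minimal client-side sketch (pure illustration, not project code):

```python
import time

class TokenStore:
    """Hold an access token and know when it needs refreshing."""

    def __init__(self, refresh_margin: int = 60):
        self.refresh_margin = refresh_margin  # refresh this many seconds early
        self.access_token = None
        self.expires_at = 0.0

    def update(self, login_response: dict) -> None:
        self.access_token = login_response["access_token"]
        self.expires_at = time.time() + login_response["expires_in"]

    def needs_refresh(self) -> bool:
        return time.time() >= self.expires_at - self.refresh_margin
```

With `expires_in` of 900 seconds and a 60-second margin, the client would call `/auth/refresh` about 14 minutes after login.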
### Refresh Token
```http
POST /api/v1/auth/refresh
Authorization: Bearer <refresh_token>
```
**Response (200 OK)**
```json
{
"access_token": "new_access_token",
"expires_in": 900
}
```
### Logout
```http
POST /api/v1/auth/logout
Authorization: Bearer <access_token>
```
### Get Current User
```http
GET /api/v1/auth/me
Authorization: Bearer <access_token>
```
**Response (200 OK)**
```json
{
"id": 1,
"username": "admin",
"email": "admin@example.com",
"role": "super_admin",
"is_active": true,
"created_at": "2024-01-15T10:00:00Z"
}
```
---
## Users Management
### List Users
```http
GET /api/v1/users
Authorization: Bearer <token>
Query Parameters:
- page: int (default: 1)
- page_size: int (default: 20, max: 100)
- role: string (optional)
- is_active: boolean (optional)
- search: string (optional)
```
**Response (200 OK)**
```json
{
"total": 45,
"page": 1,
"page_size": 20,
"data": [
{
"id": 1,
"username": "admin",
"email": "admin@example.com",
"role": "super_admin",
"is_active": true,
"last_login": "2024-01-20T15:30:00Z",
"created_at": "2024-01-15T10:00:00Z"
}
]
}
```
### Get User
```http
GET /api/v1/users/{user_id}
Authorization: Bearer <token>
```
### Create User
```http
POST /api/v1/users
Authorization: Bearer <token>
Content-Type: application/json
{
"username": "newuser",
"email": "newuser@example.com",
"password": "securepassword123",
"role": "operator"
}
```
**Response (201 Created)**
```json
{
"id": 46,
"username": "newuser",
"email": "newuser@example.com",
"role": "operator",
"is_active": true,
"created_at": "2024-01-20T10:00:00Z"
}
```
### Update User
```http
PUT /api/v1/users/{user_id}
Authorization: Bearer <token>
Content-Type: application/json
{
"email": "updated@example.com",
"role": "admin",
"is_active": true
}
```
### Delete User
```http
DELETE /api/v1/users/{user_id}
Authorization: Bearer <token>
```
### Change Password
```http
POST /api/v1/users/{user_id}/change-password
Authorization: Bearer <token>
Content-Type: application/json
{
"old_password": "oldpassword",
"new_password": "newpassword123"
}
```
---
## Data Sources
### List Data Sources
```http
GET /api/v1/datasources
Authorization: Bearer <token>
Query Parameters:
- module: string (L1, L2, L3, L4)
- is_active: boolean
- priority: string (P0, P1)
```
**Response (200 OK)**
```json
{
"total": 9,
"data": [
{
"id": 1,
"name": "TOP500",
"module": "L1",
"priority": "P0",
"frequency": "4h",
"is_active": true,
"last_run": "2024-01-20T08:00:00Z",
"last_status": "success",
"next_run": "2024-01-20T12:00:00Z"
}
]
}
```
### Get Data Source
```http
GET /api/v1/datasources/{source_id}
Authorization: Bearer <token>
```
### Create Data Source
```http
POST /api/v1/datasources
Authorization: Bearer <token>
Content-Type: application/json
{
"name": "New Source",
"module": "L1",
"priority": "P1",
"frequency": "1h",
"collector_class": "CustomCollector",
"config": {
"api_url": "https://api.example.com/data",
"auth_type": "bearer",
"api_key": "${env:API_KEY}"
}
}
```
### Update Data Source
```http
PUT /api/v1/datasources/{source_id}
Authorization: Bearer <token>
Content-Type: application/json
{
"frequency": "30m",
"is_active": true
}
```
### Enable/Disable Data Source
```http
POST /api/v1/datasources/{source_id}/enable
POST /api/v1/datasources/{source_id}/disable
Authorization: Bearer <token>
```
### Test Data Source
```http
POST /api/v1/datasources/{source_id}/test
Authorization: Bearer <token>
```
**Response (200 OK)**
```json
{
"status": "success",
"data_count": 150,
"execution_time_ms": 1250
}
```
### Get Data Source Logs
```http
GET /api/v1/datasources/{source_id}/logs
Authorization: Bearer <token>
Query Parameters:
- page: int
- page_size: int
- status: string (success, failed, running)
- start_date: datetime
- end_date: datetime
```
---
## Tasks
### List Tasks
```http
GET /api/v1/tasks
Authorization: Bearer <token>
Query Parameters:
- datasource_id: int
- status: string
- start_date: datetime
- end_date: datetime
```
**Response (200 OK)**
```json
{
"total": 156,
"data": [
{
"id": 1234,
"datasource_id": 1,
"datasource_name": "TOP500",
"status": "success",
"started_at": "2024-01-20T08:00:00Z",
"completed_at": "2024-01-20T08:01:15Z",
"records_processed": 500,
"error_message": null
}
]
}
```
### Get Task
```http
GET /api/v1/tasks/{task_id}
Authorization: Bearer <token>
```
### Trigger Manual Run
```http
POST /api/v1/datasources/{source_id}/trigger
Authorization: Bearer <token>
```
**Response (202 Accepted)**
```json
{
"task_id": 1567,
"message": "Task queued successfully"
}
```
### Cancel Running Task
```http
POST /api/v1/tasks/{task_id}/cancel
Authorization: Bearer <token>
```
---
## Dashboard Data
### Overview Stats
```http
GET /api/v1/dashboard/stats
Authorization: Bearer <token>
```
**Response (200 OK)**
```json
{
"total_datasources": 9,
"active_datasources": 8,
"tasks_today": 45,
"success_rate": 97.8,
"last_updated": "2024-01-20T10:30:00Z",
"alerts": {
"critical": 0,
"warning": 2,
"info": 5
}
}
```
### Data Summary by Module
```http
GET /api/v1/dashboard/summary
Authorization: Bearer <token>
```
**Response (200 OK)**
```json
{
"L1": {
"datasources": 5,
"total_records": 12500,
"last_updated": "2024-01-20T10:00:00Z"
},
"L2": {
"datasources": 4,
"total_records": 8300,
"last_updated": "2024-01-20T09:45:00Z"
}
}
```
### Recent Activity
```http
GET /api/v1/dashboard/activity
Authorization: Bearer <token>
Query Parameters:
- limit: int (default: 20)
```
---
## GPU Clusters (L1 Data)
### List GPU Clusters
```http
GET /api/v1/data/gpu-clusters
Authorization: Bearer <token>
Query Parameters:
- country: string
- min_gpu_count: int
- order_by: string (gpu_count, flops, rank)
- order: string (asc, desc)
- page: int
- page_size: int
```
**Response (200 OK)**
```json
{
"total": 1500,
"page": 1,
"page_size": 50,
"data": [
{
"id": "epoch-gpu-001",
"name": "Frontier",
"country": "US",
"city": "Oak Ridge, TN",
"lat": 35.9327,
"lng": -84.3107,
"organization": "Oak Ridge National Laboratory",
"gpu_count": 37888,
"gpu_type": "AMD MI250X",
"total_flops": 1.54e9,
"rank": 1,
"last_updated": "2024-01-15T00:00:00Z"
}
]
}
```
### Get GPU Cluster Detail
```http
GET /api/v1/data/gpu-clusters/{cluster_id}
Authorization: Bearer <token>
```
### GPU Cluster Statistics
```http
GET /api/v1/data/gpu-clusters/stats
Authorization: Bearer <token>
```
**Response (200 OK)**
```json
{
"total_clusters": 1500,
"total_gpu_count": 2500000,
"by_country": {
"US": {"count": 800, "gpu_share": 0.45},
"CN": {"count": 350, "gpu_share": 0.20},
"EU": {"count": 200, "gpu_share": 0.15}
},
"top_10_clusters": [...]
}
```
---
## Submarine Cables (L2 Data)
### List Submarine Cables
```http
GET /api/v1/data/submarine-cables
Authorization: Bearer <token>
Query Parameters:
- status: string (active, planned, decommissioned)
- country: string
```
**Response (200 OK)**
```json
{
"total": 436,
"data": [
{
"id": "cable-001",
"name": "FASTER",
"length_km": 11600,
"owners": ["Google", "中国移动", "NEC"],
"capacity_tbps": 60,
"status": "active",
"landing_points": [
{"country": "US", "city": "San Francisco", "lat": 37.7749, "lng": -122.4194},
{"country": "JP", "city": "Tokyo", "lat": 35.6762, "lng": 139.6503}
]
}
]
}
```
---
## IXP Nodes (L2 Data)
### List IXPs
```http
GET /api/v1/data/ixps
Authorization: Bearer <token>
Query Parameters:
- country: string
- region: string
```
**Response (200 OK)**
```json
{
"total": 1200,
"data": [
{
"id": "ixp-001",
"name": "Equinix Ashburn",
"country": "US",
"city": "Ashburn, VA",
"member_count": 250,
"traffic_tbps": 15.5,
"ixp_db_id": "EQ-AS1"
}
]
}
```
---
## Alerts
### List Alerts
```http
GET /api/v1/alerts
Authorization: Bearer <token>
Query Parameters:
- severity: string (critical, warning, info)
- status: string (active, acknowledged, resolved)
- datasource_id: int
```
**Response (200 OK)**
```json
{
"total": 12,
"data": [
{
"id": 1,
"severity": "warning",
"datasource_id": 2,
"datasource_name": "Epoch AI",
"message": "API response time > 30s",
"status": "active",
"created_at": "2024-01-20T09:30:00Z",
"acknowledged_by": null
}
]
}
```
### Acknowledge Alert
```http
POST /api/v1/alerts/{alert_id}/acknowledge
Authorization: Bearer <token>
```
### Resolve Alert
```http
POST /api/v1/alerts/{alert_id}/resolve
Authorization: Bearer <token>
Content-Type: application/json
{
"resolution": "API rate limit adjusted"
}
```
### Alert Rules Configuration
```http
GET /api/v1/alerts/rules
POST /api/v1/alerts/rules
PUT /api/v1/alerts/rules/{rule_id}
DELETE /api/v1/alerts/rules/{rule_id}
Authorization: Bearer <token>
```
---
## System Configuration
### Get Config
```http
GET /api/v1/config
Authorization: Bearer <token> (admin only)
```
### Update Config
```http
PUT /api/v1/config
Authorization: Bearer <token> (admin only)
Content-Type: application/json
{
"data_retention_days": 30,
"refresh_interval": 300,
"timezone": "Asia/Shanghai"
}
```
---
## Audit Logs
### List Audit Logs
```http
GET /api/v1/audit-logs
Authorization: Bearer <token> (admin only)
Query Parameters:
- user_id: int
- action: string
- resource: string
- start_date: datetime
- end_date: datetime
- page: int
- page_size: int
```
**Response (200 OK)**
```json
{
"total": 1500,
"data": [
{
"id": 10001,
"user_id": 1,
"username": "admin",
"action": "user.update",
"resource": "users",
"resource_id": 5,
"detail": {"changes": {"role": ["viewer", "operator"]}},
"ip_address": "192.168.1.100",
"created_at": "2024-01-20T10:30:00Z"
}
]
}
```
---
## Health Check
### System Health
```http
GET /health
```
**Response (200 OK)**
```json
{
"status": "healthy",
"version": "1.0.0",
"uptime_seconds": 86400,
"components": {
"database": "healthy",
"redis": "healthy",
"celery": "healthy"
}
}
```
### Readiness
```http
GET /ready
```
**Response (200 OK)**
```json
{
"ready": true
}
```
---
## Error Codes
| Code | Description |
|------|-------------|
| 400 | Bad Request - Invalid input |
| 401 | Unauthorized - Invalid/missing token |
| 403 | Forbidden - Insufficient permissions |
| 404 | Not Found - Resource doesn't exist |
| 422 | Validation Error - Data validation failed |
| 429 | Rate Limited - Too many requests |
| 500 | Internal Server Error |
| 503 | Service Unavailable |
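For 429 responses in particular, clients should back off before retrying; a generic sketch of jittered exponential backoff delays (not project code):

```python
import random

def backoff_delays(retries: int = 5, base: float = 0.5, cap: float = 30.0):
    """Yield exponentially growing, jittered delays (seconds) for retrying 429s."""
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        # Jitter avoids synchronized retry storms across clients.
        yield delay * random.uniform(0.5, 1.0)
```

A caller would sleep for each yielded delay between attempts, giving up after the last one.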

backend/.env.example Normal file

@@ -0,0 +1,23 @@
# Database
POSTGRES_SERVER=localhost
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=planet_db

# Redis
REDIS_SERVER=localhost
REDIS_PORT=6379

# Security
SECRET_KEY=your-secret-key-change-in-production
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=15
REFRESH_TOKEN_EXPIRE_DAYS=7

# API
API_V1_STR=/api/v1
PROJECT_NAME="Intelligent Planet Plan"
VERSION=1.0.0

# CORS
CORS_ORIGINS=["http://localhost:3000", "http://localhost:8000"]

backend/Dockerfile Normal file

@@ -0,0 +1,19 @@
FROM python:3.11-slim

WORKDIR /app

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

backend/app/__init__.py Normal file

backend/app/api/main.py Normal file

@@ -0,0 +1,27 @@
from fastapi import APIRouter

from app.api.v1 import (
    auth,
    users,
    datasource_config,
    datasources,
    tasks,
    dashboard,
    websocket,
    alerts,
    settings,
    collected_data,
)

api_router = APIRouter()

api_router.include_router(auth.router, prefix="/auth", tags=["auth"])
api_router.include_router(users.router, prefix="/users", tags=["users"])
api_router.include_router(
    datasource_config.router, prefix="/datasources", tags=["datasource-config"]
)
api_router.include_router(datasources.router, prefix="/datasources", tags=["datasources"])
api_router.include_router(collected_data.router, prefix="/collected", tags=["collected-data"])
api_router.include_router(tasks.router, prefix="/tasks", tags=["tasks"])
api_router.include_router(dashboard.router, prefix="/dashboard", tags=["dashboard"])
api_router.include_router(alerts.router, prefix="/alerts", tags=["alerts"])
api_router.include_router(settings.router, prefix="/settings", tags=["settings"])


@@ -0,0 +1,124 @@
from datetime import datetime
from typing import Optional

from fastapi import APIRouter, Depends, HTTPException, status as http_status
from sqlalchemy import select, func, case
from sqlalchemy.ext.asyncio import AsyncSession

from app.db.session import get_db
from app.models.user import User
from app.core.security import get_current_user
from app.models.alert import Alert, AlertSeverity, AlertStatus

router = APIRouter()


@router.get("")
async def list_alerts(
    severity: Optional[str] = None,
    status: Optional[str] = None,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
):
    query = select(Alert)
    if severity:
        query = query.where(Alert.severity == AlertSeverity(severity))
    if status:
        query = query.where(Alert.status == AlertStatus(status))
    # Critical first, then newest within each severity
    query = query.order_by(
        case(
            (Alert.severity == AlertSeverity.CRITICAL, 1),
            (Alert.severity == AlertSeverity.WARNING, 2),
            (Alert.severity == AlertSeverity.INFO, 3),
        ),
        Alert.created_at.desc(),
    )
    result = await db.execute(query)
    alerts = result.scalars().all()
    total_query = select(func.count(Alert.id))
    if severity:
        total_query = total_query.where(Alert.severity == AlertSeverity(severity))
    if status:
        total_query = total_query.where(Alert.status == AlertStatus(status))
    total_result = await db.execute(total_query)
    total = total_result.scalar()
    return {
        "total": total,
        "data": [alert.to_dict() for alert in alerts],
    }


@router.post("/{alert_id}/acknowledge")
async def acknowledge_alert(
    alert_id: int,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
):
    result = await db.execute(select(Alert).where(Alert.id == alert_id))
    alert = result.scalar_one_or_none()
    if not alert:
        raise HTTPException(
            status_code=http_status.HTTP_404_NOT_FOUND, detail="Alert not found"
        )
    alert.status = AlertStatus.ACKNOWLEDGED
    alert.acknowledged_by = current_user.id
    alert.acknowledged_at = datetime.utcnow()
    await db.commit()
    return {"message": "Alert acknowledged", "alert": alert.to_dict()}


@router.post("/{alert_id}/resolve")
async def resolve_alert(
    alert_id: int,
    resolution: str,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
):
    result = await db.execute(select(Alert).where(Alert.id == alert_id))
    alert = result.scalar_one_or_none()
    if not alert:
        raise HTTPException(
            status_code=http_status.HTTP_404_NOT_FOUND, detail="Alert not found"
        )
    alert.status = AlertStatus.RESOLVED
    alert.resolved_by = current_user.id
    alert.resolved_at = datetime.utcnow()
    alert.resolution_notes = resolution
    await db.commit()
    return {"message": "Alert resolved", "alert": alert.to_dict()}


@router.get("/stats")
async def get_alert_stats(
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
):
    critical_query = select(func.count(Alert.id)).where(
        Alert.severity == AlertSeverity.CRITICAL,
        Alert.status == AlertStatus.ACTIVE,
    )
    warning_query = select(func.count(Alert.id)).where(
        Alert.severity == AlertSeverity.WARNING,
        Alert.status == AlertStatus.ACTIVE,
    )
    info_query = select(func.count(Alert.id)).where(
        Alert.severity == AlertSeverity.INFO,
        Alert.status == AlertStatus.ACTIVE,
    )
    critical_result = await db.execute(critical_query)
    warning_result = await db.execute(warning_query)
    info_result = await db.execute(info_query)
    return {
        "critical": critical_result.scalar() or 0,
        "warning": warning_result.scalar() or 0,
        "info": info_result.scalar() or 0,
    }

backend/app/api/v1/auth.py Normal file

@@ -0,0 +1,108 @@
from datetime import timedelta
from fastapi import APIRouter, Depends, HTTPException, status
from fastapi.security import OAuth2PasswordRequestForm
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import select
from app.core.config import settings
from app.core.security import (
    create_access_token,
    create_refresh_token,
    blacklist_token,
    get_current_user,
    verify_password,
)
from app.db.session import get_db
from app.models.user import User
from app.schemas.token import Token
from app.schemas.user import UserCreate, UserResponse
router = APIRouter()
@router.post("/login", response_model=Token)
async def login(
    form_data: OAuth2PasswordRequestForm = Depends(),
    db: AsyncSession = Depends(get_db),
):
    # Load the user through the ORM instead of hydrating a User() by hand
    result = await db.execute(select(User).where(User.username == form_data.username))
    user = result.scalar_one_or_none()
    if user is None:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid credentials",
        )
if not verify_password(form_data.password, user.password_hash):
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid credentials",
)
if not user.is_active:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User is inactive",
)
access_token = create_access_token(data={"sub": user.id})
refresh_token = create_refresh_token(data={"sub": user.id})
return {
"access_token": access_token,
"token_type": "bearer",
"expires_in": settings.ACCESS_TOKEN_EXPIRE_MINUTES * 60,
"user": {
"id": user.id,
"username": user.username,
"role": user.role,
},
}
@router.post("/refresh", response_model=Token)
async def refresh_token(
current_user: User = Depends(get_current_user),
):
access_token = create_access_token(data={"sub": current_user.id})
return {
"access_token": access_token,
"token_type": "bearer",
"expires_in": settings.ACCESS_TOKEN_EXPIRE_MINUTES * 60,
"user": {
"id": current_user.id,
"username": current_user.username,
"role": current_user.role,
},
}
@router.post("/logout")
async def logout():
return {"message": "Successfully logged out"}
@router.get("/me", response_model=UserResponse)
async def get_me(current_user: User = Depends(get_current_user)):
return {
"id": current_user.id,
"username": current_user.username,
"email": current_user.email,
"role": current_user.role,
"is_active": current_user.is_active,
"created_at": current_user.created_at,
}
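`create_access_token` and `create_refresh_token` come from `app.core.security`, which is not shown in this diff. For illustration only, an HS256-style compact token of the kind a JWT library produces can be sketched with just the standard library; this mirrors the general signing scheme (header.payload.signature over HMAC-SHA256), not the project's actual implementation:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> bytes:
    # JWT uses URL-safe base64 with the padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def unb64url(s: str) -> bytes:
    # Re-add the padding that b64url stripped
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def sign_token(payload: dict, secret: str) -> str:
    """Compact HS256-style token: header.payload.signature (illustrative)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = header + b"." + body
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

def verify_token(token: str, secret: str) -> dict:
    signing_input, _, sig = token.rpartition(".")
    expected = b64url(
        hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    ).decode()
    if not hmac.compare_digest(expected, sig):
        raise ValueError("bad signature")
    return json.loads(unb64url(signing_input.split(".", 1)[1]))

token = sign_token({"sub": 1}, "change-me")
assert verify_token(token, "change-me") == {"sub": 1}
```

A production token additionally carries an `exp` claim (as the `ACCESS_TOKEN_EXPIRE_MINUTES` setting suggests) and should be produced by a maintained JWT library rather than hand-rolled code.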


@@ -0,0 +1,431 @@
from typing import Optional
from fastapi import APIRouter, Depends, HTTPException, Query, status, Response
from fastapi.responses import StreamingResponse
from sqlalchemy import select, func, text
from sqlalchemy.ext.asyncio import AsyncSession
import json
import csv
import io
from app.db.session import get_db
from app.models.user import User
from app.core.security import get_current_user
from app.models.collected_data import CollectedData
router = APIRouter()
@router.get("")
async def list_collected_data(
    source: Optional[str] = Query(None, description="Filter by data source"),
    data_type: Optional[str] = Query(None, description="Filter by data type"),
    country: Optional[str] = Query(None, description="Filter by country"),
    search: Optional[str] = Query(None, description="Search by name or title"),
    page: int = Query(1, ge=1, description="Page number"),
    page_size: int = Query(20, ge=1, le=100, description="Items per page"),
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
):
    """List collected data with filtering and pagination."""
    # Reuse the shared WHERE-clause builder defined below in this module
    where_sql, params = build_where_clause(source, data_type, country, search)
# Calculate offset
offset = (page - 1) * page_size
# Query total count
count_query = text(f"SELECT COUNT(*) FROM collected_data WHERE {where_sql}")
count_result = await db.execute(count_query, params)
    total = count_result.scalar() or 0
# Query data
query = text(f"""
SELECT id, source, source_id, data_type, name, title, description,
country, city, latitude, longitude, value, unit,
metadata, collected_at, reference_date, is_valid
FROM collected_data
WHERE {where_sql}
ORDER BY collected_at DESC
LIMIT :limit OFFSET :offset
""")
params["limit"] = page_size
params["offset"] = offset
result = await db.execute(query, params)
rows = result.fetchall()
data = []
for row in rows:
data.append(
{
"id": row[0],
"source": row[1],
"source_id": row[2],
"data_type": row[3],
"name": row[4],
"title": row[5],
"description": row[6],
"country": row[7],
"city": row[8],
"latitude": row[9],
"longitude": row[10],
"value": row[11],
"unit": row[12],
"metadata": row[13],
"collected_at": row[14].isoformat() if row[14] else None,
"reference_date": row[15].isoformat() if row[15] else None,
"is_valid": row[16],
}
)
return {
"total": total,
"page": page,
"page_size": page_size,
"data": data,
}
@router.get("/summary")
async def get_data_summary(
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
    """Get aggregated statistics of collected data."""
# By source and data_type
result = await db.execute(
text("""
SELECT source, data_type, COUNT(*) as count
FROM collected_data
GROUP BY source, data_type
ORDER BY source, data_type
""")
)
rows = result.fetchall()
by_source = {}
total = 0
for row in rows:
source = row[0]
data_type = row[1]
count = row[2]
if source not in by_source:
by_source[source] = {}
by_source[source][data_type] = count
total += count
# Total by source
source_totals = await db.execute(
text("""
SELECT source, COUNT(*) as count
FROM collected_data
GROUP BY source
ORDER BY count DESC
""")
)
source_rows = source_totals.fetchall()
return {
"total_records": total,
"by_source": by_source,
"source_totals": [{"source": row[0], "count": row[1]} for row in source_rows],
}
@router.get("/sources")
async def get_data_sources(
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
    """List all data sources."""
result = await db.execute(
text("""
SELECT DISTINCT source FROM collected_data ORDER BY source
""")
)
rows = result.fetchall()
return {
"sources": [row[0] for row in rows],
}
@router.get("/types")
async def get_data_types(
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
    """List all data types."""
result = await db.execute(
text("""
SELECT DISTINCT data_type FROM collected_data ORDER BY data_type
""")
)
rows = result.fetchall()
return {
"data_types": [row[0] for row in rows],
}
@router.get("/countries")
async def get_countries(
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
    """List all countries."""
result = await db.execute(
text("""
SELECT DISTINCT country FROM collected_data
WHERE country IS NOT NULL AND country != ''
ORDER BY country
""")
)
rows = result.fetchall()
return {
"countries": [row[0] for row in rows],
}
@router.get("/{data_id}")
async def get_collected_data(
data_id: int,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
    """Get a single collected-data record."""
result = await db.execute(
text("""
SELECT id, source, source_id, data_type, name, title, description,
country, city, latitude, longitude, value, unit,
metadata, collected_at, reference_date, is_valid
FROM collected_data
WHERE id = :id
"""),
{"id": data_id},
)
row = result.fetchone()
if not row:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
            detail="Data not found",
)
return {
"id": row[0],
"source": row[1],
"source_id": row[2],
"data_type": row[3],
"name": row[4],
"title": row[5],
"description": row[6],
"country": row[7],
"city": row[8],
"latitude": row[9],
"longitude": row[10],
"value": row[11],
"unit": row[12],
"metadata": row[13],
"collected_at": row[14].isoformat() if row[14] else None,
"reference_date": row[15].isoformat() if row[15] else None,
"is_valid": row[16],
}
def build_where_clause(
source: Optional[str], data_type: Optional[str], country: Optional[str], search: Optional[str]
):
"""Build WHERE clause and params for queries"""
conditions = []
params = {}
if source:
conditions.append("source = :source")
params["source"] = source
if data_type:
conditions.append("data_type = :data_type")
params["data_type"] = data_type
if country:
conditions.append("country = :country")
params["country"] = country
if search:
conditions.append("(name ILIKE :search OR title ILIKE :search)")
params["search"] = f"%{search}%"
where_sql = " AND ".join(conditions) if conditions else "1=1"
return where_sql, params
@router.get("/export/json")
async def export_json(
    source: Optional[str] = Query(None, description="Filter by data source"),
    data_type: Optional[str] = Query(None, description="Filter by data type"),
    country: Optional[str] = Query(None, description="Filter by country"),
    search: Optional[str] = Query(None, description="Search by name or title"),
    limit: int = Query(10000, ge=1, le=50000, description="Maximum rows to export"),
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
):
    """Export data as JSON"""
where_sql, params = build_where_clause(source, data_type, country, search)
params["limit"] = limit
query = text(f"""
SELECT id, source, source_id, data_type, name, title, description,
country, city, latitude, longitude, value, unit,
metadata, collected_at, reference_date, is_valid
FROM collected_data
WHERE {where_sql}
ORDER BY collected_at DESC
LIMIT :limit
""")
result = await db.execute(query, params)
rows = result.fetchall()
data = []
for row in rows:
data.append(
{
"id": row[0],
"source": row[1],
"source_id": row[2],
"data_type": row[3],
"name": row[4],
"title": row[5],
"description": row[6],
"country": row[7],
"city": row[8],
"latitude": row[9],
"longitude": row[10],
"value": row[11],
"unit": row[12],
"metadata": row[13],
"collected_at": row[14].isoformat() if row[14] else None,
"reference_date": row[15].isoformat() if row[15] else None,
"is_valid": row[16],
}
)
json_str = json.dumps({"data": data, "total": len(data)}, ensure_ascii=False, indent=2)
return StreamingResponse(
io.StringIO(json_str),
media_type="application/json",
headers={
"Content-Disposition": f"attachment; filename=collected_data_{source or 'all'}.json"
},
)
@router.get("/export/csv")
async def export_csv(
    source: Optional[str] = Query(None, description="Filter by data source"),
    data_type: Optional[str] = Query(None, description="Filter by data type"),
    country: Optional[str] = Query(None, description="Filter by country"),
    search: Optional[str] = Query(None, description="Search by name or title"),
    limit: int = Query(10000, ge=1, le=50000, description="Maximum rows to export"),
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
):
    """Export data as CSV"""
where_sql, params = build_where_clause(source, data_type, country, search)
params["limit"] = limit
query = text(f"""
SELECT id, source, source_id, data_type, name, title, description,
country, city, latitude, longitude, value, unit,
metadata, collected_at, reference_date, is_valid
FROM collected_data
WHERE {where_sql}
ORDER BY collected_at DESC
LIMIT :limit
""")
result = await db.execute(query, params)
rows = result.fetchall()
output = io.StringIO()
writer = csv.writer(output)
# Write header
writer.writerow(
[
"ID",
"Source",
"Source ID",
"Type",
"Name",
"Title",
"Description",
"Country",
"City",
"Latitude",
"Longitude",
"Value",
"Unit",
"Metadata",
"Collected At",
"Reference Date",
"Is Valid",
]
)
# Write data
for row in rows:
writer.writerow(
[
row[0],
row[1],
row[2],
row[3],
row[4],
row[5],
row[6],
row[7],
row[8],
row[9],
row[10],
row[11],
row[12],
json.dumps(row[13]) if row[13] else "",
row[14].isoformat() if row[14] else "",
row[15].isoformat() if row[15] else "",
row[16],
]
)
return StreamingResponse(
io.StringIO(output.getvalue()),
media_type="text/csv",
headers={
"Content-Disposition": f"attachment; filename=collected_data_{source or 'all'}.csv"
},
)
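Both exporters above materialise the full payload in memory before wrapping it in `StringIO`, so a 50,000-row export briefly holds everything at once. FastAPI's `StreamingResponse` also accepts a generator, which lets CSV rows leave the process as they are produced. A minimal sketch of that pattern with illustrative sample rows (not the project's actual exporter):

```python
import csv
import io

def iter_csv(rows, header):
    """Yield CSV text chunks one row at a time (suitable for StreamingResponse)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    yield buf.getvalue()
    for row in rows:
        # Reuse one small buffer instead of accumulating the whole file
        buf.seek(0)
        buf.truncate(0)
        writer.writerow(row)
        yield buf.getvalue()

chunks = list(iter_csv([(1, "top500"), (2, "peeringdb_ixp")], ["ID", "Source"]))
print("".join(chunks))
```

With the database side switched to `stream()`/server-side cursors, peak memory stays constant regardless of export size.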


@@ -0,0 +1,239 @@
"""Dashboard API with caching and optimizations"""
from datetime import datetime, timedelta
from fastapi import APIRouter, Depends
from sqlalchemy import select, func, text
from sqlalchemy.ext.asyncio import AsyncSession
from app.db.session import get_db
from app.models.user import User
from app.models.datasource import DataSource
from app.models.datasource_config import DataSourceConfig
from app.models.alert import Alert, AlertSeverity
from app.models.task import CollectionTask
from app.core.security import get_current_user
from app.core.cache import cache
# Built-in collectors info (mirrored from datasources.py)
COLLECTOR_INFO = {
"top500": {
"id": 1,
"name": "TOP500 Supercomputers",
"module": "L1",
"priority": "P0",
"frequency_hours": 4,
},
"epoch_ai_gpu": {
"id": 2,
"name": "Epoch AI GPU Clusters",
"module": "L1",
"priority": "P0",
"frequency_hours": 6,
},
"huggingface_models": {
"id": 3,
"name": "HuggingFace Models",
"module": "L2",
"priority": "P1",
"frequency_hours": 12,
},
"huggingface_datasets": {
"id": 4,
"name": "HuggingFace Datasets",
"module": "L2",
"priority": "P1",
"frequency_hours": 12,
},
"huggingface_spaces": {
"id": 5,
"name": "HuggingFace Spaces",
"module": "L2",
"priority": "P2",
"frequency_hours": 24,
},
"peeringdb_ixp": {
"id": 6,
"name": "PeeringDB IXP",
"module": "L2",
"priority": "P1",
"frequency_hours": 24,
},
"peeringdb_network": {
"id": 7,
"name": "PeeringDB Networks",
"module": "L2",
"priority": "P2",
"frequency_hours": 48,
},
"peeringdb_facility": {
"id": 8,
"name": "PeeringDB Facilities",
"module": "L2",
"priority": "P2",
"frequency_hours": 48,
},
"telegeography_cables": {
"id": 9,
"name": "Submarine Cables",
"module": "L2",
"priority": "P1",
"frequency_hours": 168,
},
"telegeography_landing": {
"id": 10,
"name": "Cable Landing Points",
"module": "L2",
"priority": "P2",
"frequency_hours": 168,
},
"telegeography_systems": {
"id": 11,
"name": "Cable Systems",
"module": "L2",
"priority": "P2",
"frequency_hours": 168,
},
}
router = APIRouter()
@router.get("/stats")
async def get_stats(
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
"""Get dashboard statistics with caching"""
cache_key = "dashboard:stats"
cached_result = cache.get(cache_key)
if cached_result:
return cached_result
today_start = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
# Count built-in collectors
built_in_count = len(COLLECTOR_INFO)
built_in_active = built_in_count # Built-in are always "active" for counting purposes
# Count custom configs from database
result = await db.execute(select(func.count(DataSourceConfig.id)))
custom_count = result.scalar() or 0
result = await db.execute(
select(func.count(DataSourceConfig.id)).where(DataSourceConfig.is_active == True)
)
custom_active = result.scalar() or 0
# Total datasources
total_datasources = built_in_count + custom_count
active_datasources = built_in_active + custom_active
# Tasks today (from database)
result = await db.execute(
select(func.count(CollectionTask.id)).where(CollectionTask.started_at >= today_start)
)
tasks_today = result.scalar() or 0
result = await db.execute(
select(func.count(CollectionTask.id)).where(
CollectionTask.status == "success",
CollectionTask.started_at >= today_start,
)
)
success_tasks = result.scalar() or 0
success_rate = (success_tasks / tasks_today * 100) if tasks_today > 0 else 0
# Alerts
result = await db.execute(
select(func.count(Alert.id)).where(
Alert.severity == AlertSeverity.CRITICAL,
Alert.status == "active",
)
)
critical_alerts = result.scalar() or 0
result = await db.execute(
select(func.count(Alert.id)).where(
Alert.severity == AlertSeverity.WARNING,
Alert.status == "active",
)
)
warning_alerts = result.scalar() or 0
result = await db.execute(
select(func.count(Alert.id)).where(
Alert.severity == AlertSeverity.INFO,
Alert.status == "active",
)
)
info_alerts = result.scalar() or 0
response = {
"total_datasources": total_datasources,
"active_datasources": active_datasources,
"tasks_today": tasks_today,
"success_rate": round(success_rate, 1),
"last_updated": datetime.utcnow().isoformat(),
"alerts": {
"critical": critical_alerts,
"warning": warning_alerts,
"info": info_alerts,
},
}
cache.set(cache_key, response, expire_seconds=60)
return response
@router.get("/summary")
async def get_summary(
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
"""Get dashboard summary by module with caching"""
cache_key = "dashboard:summary"
cached_result = cache.get(cache_key)
if cached_result:
return cached_result
# Count by module for built-in collectors
builtin_by_module = {}
for name, info in COLLECTOR_INFO.items():
module = info["module"]
if module not in builtin_by_module:
builtin_by_module[module] = {"datasources": 0, "sources": []}
builtin_by_module[module]["datasources"] += 1
builtin_by_module[module]["sources"].append(info["name"])
# Count custom configs by module (default to L3 for custom)
result = await db.execute(
select(DataSourceConfig.source_type, func.count(DataSourceConfig.id).label("count"))
.where(DataSourceConfig.is_active == True)
.group_by(DataSourceConfig.source_type)
)
custom_rows = result.fetchall()
for row in custom_rows:
source_type = row.source_type
module = "L3" # Custom configs default to L3
if module not in builtin_by_module:
builtin_by_module[module] = {"datasources": 0, "sources": []}
builtin_by_module[module]["datasources"] += row.count
        builtin_by_module[module]["sources"].append(f"Custom ({source_type})")
summary = {}
for module, data in builtin_by_module.items():
summary[module] = {
"datasources": data["datasources"],
"total_records": 0, # Built-in don't track this in dashboard stats
"last_updated": datetime.utcnow().isoformat(),
}
response = {"modules": summary, "last_updated": datetime.utcnow().isoformat()}
cache.set(cache_key, response, expire_seconds=300)
return response
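The `cache.get` / `cache.set(..., expire_seconds=...)` interface used above comes from `app.core.cache`, which this diff does not show (it may well be Redis-backed). A minimal in-process sketch with the same call shape, purely for illustration:

```python
import time

class TTLCache:
    """Tiny in-process cache with per-key expiry, mirroring get/set above."""

    def __init__(self, clock=time.monotonic):
        self._store = {}   # key -> (expires_at, value)
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]   # lazily evict expired entries
            return None
        return value

    def set(self, key, value, expire_seconds=60):
        self._store[key] = (self._clock() + expire_seconds, value)

cache = TTLCache()
cache.set("dashboard:stats", {"total_datasources": 11}, expire_seconds=60)
assert cache.get("dashboard:stats") == {"total_datasources": 11}
```

The 60-second TTL on `/stats` and 300 seconds on `/summary` trade freshness for fewer aggregate queries; an in-process cache like this one would not be shared across workers, which is why a Redis-backed implementation is the likely choice here.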


@@ -0,0 +1,309 @@
"""DataSourceConfig API for user-defined data sources"""
from typing import Optional
from datetime import datetime
import base64
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy import select, func
from sqlalchemy.ext.asyncio import AsyncSession
from pydantic import BaseModel, Field
import httpx
from app.db.session import get_db
from app.models.user import User
from app.models.datasource_config import DataSourceConfig
from app.core.security import get_current_user
from app.core.cache import cache
router = APIRouter()
class DataSourceConfigCreate(BaseModel):
name: str = Field(..., min_length=1, max_length=100)
description: Optional[str] = None
source_type: str = Field(..., description="http, api, database")
endpoint: str = Field(..., max_length=500)
auth_type: str = Field(default="none", description="none, bearer, api_key, basic")
auth_config: dict = Field(default={})
headers: dict = Field(default={})
config: dict = Field(default={"timeout": 30, "retry": 3})
class DataSourceConfigUpdate(BaseModel):
name: Optional[str] = Field(None, min_length=1, max_length=100)
description: Optional[str] = None
source_type: Optional[str] = None
endpoint: Optional[str] = Field(None, max_length=500)
auth_type: Optional[str] = None
auth_config: Optional[dict] = None
headers: Optional[dict] = None
config: Optional[dict] = None
is_active: Optional[bool] = None
class DataSourceConfigResponse(BaseModel):
id: int
name: str
description: Optional[str]
source_type: str
endpoint: str
auth_type: str
headers: dict
config: dict
is_active: bool
created_at: datetime
updated_at: datetime
class Config:
from_attributes = True
async def test_endpoint(
endpoint: str,
auth_type: str,
auth_config: dict,
headers: dict,
config: dict,
) -> dict:
"""Test an endpoint connection"""
timeout = config.get("timeout", 30)
test_headers = headers.copy()
# Add auth headers
if auth_type == "bearer" and auth_config.get("token"):
test_headers["Authorization"] = f"Bearer {auth_config['token']}"
elif auth_type == "api_key" and auth_config.get("api_key"):
key_name = auth_config.get("key_name", "X-API-Key")
test_headers[key_name] = auth_config["api_key"]
elif auth_type == "basic":
username = auth_config.get("username", "")
password = auth_config.get("password", "")
credentials = f"{username}:{password}"
encoded = base64.b64encode(credentials.encode()).decode()
test_headers["Authorization"] = f"Basic {encoded}"
    async with httpx.AsyncClient(timeout=timeout) as client:
        response = await client.get(endpoint, headers=test_headers)
        response.raise_for_status()
        # Build a short preview; response.json() may be a dict, which
        # cannot be sliced, so only slice when the body is a list.
        if response.headers.get("content-type", "").startswith("application/json"):
            body = response.json()
            data_preview = str(body[:3]) if isinstance(body, list) else str(body)[:200]
        else:
            data_preview = response.text[:200]
        return {
            "status_code": response.status_code,
            "success": True,
            "response_time_ms": response.elapsed.total_seconds() * 1000,
            "data_preview": data_preview,
        }
@router.get("/configs")
async def list_configs(
active_only: bool = False,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
"""List all user-defined data source configurations"""
query = select(DataSourceConfig)
if active_only:
query = query.where(DataSourceConfig.is_active == True)
query = query.order_by(DataSourceConfig.created_at.desc())
result = await db.execute(query)
configs = result.scalars().all()
return {
"total": len(configs),
"data": [
{
"id": c.id,
"name": c.name,
"description": c.description,
"source_type": c.source_type,
"endpoint": c.endpoint,
"auth_type": c.auth_type,
"headers": c.headers,
"config": c.config,
"is_active": c.is_active,
"created_at": c.created_at.isoformat() if c.created_at else None,
"updated_at": c.updated_at.isoformat() if c.updated_at else None,
}
for c in configs
],
}
@router.get("/configs/{config_id}")
async def get_config(
config_id: int,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
"""Get a single data source configuration"""
result = await db.execute(select(DataSourceConfig).where(DataSourceConfig.id == config_id))
config = result.scalar_one_or_none()
if not config:
raise HTTPException(status_code=404, detail="Configuration not found")
return {
"id": config.id,
"name": config.name,
"description": config.description,
"source_type": config.source_type,
"endpoint": config.endpoint,
"auth_type": config.auth_type,
"auth_config": {}, # Don't return sensitive data
"headers": config.headers,
"config": config.config,
"is_active": config.is_active,
"created_at": config.created_at.isoformat() if config.created_at else None,
"updated_at": config.updated_at.isoformat() if config.updated_at else None,
}
@router.post("/configs")
async def create_config(
config_data: DataSourceConfigCreate,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
"""Create a new data source configuration"""
config = DataSourceConfig(
name=config_data.name,
description=config_data.description,
source_type=config_data.source_type,
endpoint=config_data.endpoint,
auth_type=config_data.auth_type,
auth_config=config_data.auth_config,
headers=config_data.headers,
config=config_data.config,
)
db.add(config)
await db.commit()
await db.refresh(config)
cache.delete_pattern("datasource_configs:*")
return {
"id": config.id,
"name": config.name,
"message": "Configuration created successfully",
}
@router.put("/configs/{config_id}")
async def update_config(
config_id: int,
config_data: DataSourceConfigUpdate,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
"""Update a data source configuration"""
result = await db.execute(select(DataSourceConfig).where(DataSourceConfig.id == config_id))
config = result.scalar_one_or_none()
if not config:
raise HTTPException(status_code=404, detail="Configuration not found")
update_data = config_data.model_dump(exclude_unset=True)
for field, value in update_data.items():
setattr(config, field, value)
await db.commit()
await db.refresh(config)
cache.delete_pattern("datasource_configs:*")
return {
"id": config.id,
"name": config.name,
"message": "Configuration updated successfully",
}
@router.delete("/configs/{config_id}")
async def delete_config(
config_id: int,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
"""Delete a data source configuration"""
result = await db.execute(select(DataSourceConfig).where(DataSourceConfig.id == config_id))
config = result.scalar_one_or_none()
if not config:
raise HTTPException(status_code=404, detail="Configuration not found")
await db.delete(config)
await db.commit()
cache.delete_pattern("datasource_configs:*")
return {"message": "Configuration deleted successfully"}
@router.post("/configs/{config_id}/test")
async def test_config(
config_id: int,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
"""Test a data source configuration"""
result = await db.execute(select(DataSourceConfig).where(DataSourceConfig.id == config_id))
config = result.scalar_one_or_none()
if not config:
raise HTTPException(status_code=404, detail="Configuration not found")
try:
result = await test_endpoint(
endpoint=config.endpoint,
auth_type=config.auth_type,
auth_config=config.auth_config or {},
headers=config.headers or {},
config=config.config or {},
)
return result
except httpx.HTTPStatusError as e:
return {
"success": False,
"error": f"HTTP Error: {e.response.status_code}",
"message": str(e),
}
except Exception as e:
return {
"success": False,
"error": "Connection failed",
"message": str(e),
}
@router.post("/configs/test")
async def test_new_config(
config_data: DataSourceConfigCreate,
current_user: User = Depends(get_current_user),
):
"""Test a new data source configuration without saving"""
try:
result = await test_endpoint(
endpoint=config_data.endpoint,
auth_type=config_data.auth_type,
auth_config=config_data.auth_config or {},
headers=config_data.headers or {},
config=config_data.config or {},
)
return result
except httpx.HTTPStatusError as e:
return {
"success": False,
"error": f"HTTP Error: {e.response.status_code}",
"message": str(e),
}
except Exception as e:
return {
"success": False,
"error": "Connection failed",
"message": str(e),
}
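The auth-header branches in `test_endpoint` are pure string manipulation and easy to verify in isolation. A self-contained re-statement of that logic (copied here for illustration, detached from httpx):

```python
import base64

def build_auth_headers(auth_type, auth_config, headers=None):
    """Mirror of the auth branches in test_endpoint above."""
    out = dict(headers or {})
    if auth_type == "bearer" and auth_config.get("token"):
        out["Authorization"] = f"Bearer {auth_config['token']}"
    elif auth_type == "api_key" and auth_config.get("api_key"):
        out[auth_config.get("key_name", "X-API-Key")] = auth_config["api_key"]
    elif auth_type == "basic":
        creds = f"{auth_config.get('username', '')}:{auth_config.get('password', '')}"
        out["Authorization"] = "Basic " + base64.b64encode(creds.encode()).decode()
    return out

assert build_auth_headers("basic", {"username": "user", "password": "pass"}) == {
    "Authorization": "Basic dXNlcjpwYXNz"
}
assert build_auth_headers("api_key", {"api_key": "k1"}) == {"X-API-Key": "k1"}
```

Factoring this out of `test_endpoint` would also let the eventual collector runtime share the same header construction.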


@@ -0,0 +1,258 @@
from typing import List, Optional
from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy import select, func
from sqlalchemy.ext.asyncio import AsyncSession
from app.db.session import get_db
from app.models.user import User
from app.models.datasource import DataSource
from app.core.security import get_current_user
from app.services.collectors.registry import collector_registry
router = APIRouter()
COLLECTOR_INFO = {
"top500": {
"id": 1,
"name": "TOP500 Supercomputers",
"module": "L1",
"priority": "P0",
"frequency_hours": 4,
},
"epoch_ai_gpu": {
"id": 2,
"name": "Epoch AI GPU Clusters",
"module": "L1",
"priority": "P0",
"frequency_hours": 6,
},
"huggingface_models": {
"id": 3,
"name": "HuggingFace Models",
"module": "L2",
"priority": "P1",
"frequency_hours": 12,
},
"huggingface_datasets": {
"id": 4,
"name": "HuggingFace Datasets",
"module": "L2",
"priority": "P1",
"frequency_hours": 12,
},
"huggingface_spaces": {
"id": 5,
"name": "HuggingFace Spaces",
"module": "L2",
"priority": "P2",
"frequency_hours": 24,
},
"peeringdb_ixp": {
"id": 6,
"name": "PeeringDB IXP",
"module": "L2",
"priority": "P1",
"frequency_hours": 24,
},
"peeringdb_network": {
"id": 7,
"name": "PeeringDB Networks",
"module": "L2",
"priority": "P2",
"frequency_hours": 48,
},
"peeringdb_facility": {
"id": 8,
"name": "PeeringDB Facilities",
"module": "L2",
"priority": "P2",
"frequency_hours": 48,
},
"telegeography_cables": {
"id": 9,
"name": "Submarine Cables",
"module": "L2",
"priority": "P1",
"frequency_hours": 168,
},
"telegeography_landing": {
"id": 10,
"name": "Cable Landing Points",
"module": "L2",
"priority": "P2",
"frequency_hours": 168,
},
"telegeography_systems": {
"id": 11,
"name": "Cable Systems",
"module": "L2",
"priority": "P2",
"frequency_hours": 168,
},
}
ID_TO_COLLECTOR = {info["id"]: name for name, info in COLLECTOR_INFO.items()}
COLLECTOR_TO_ID = {name: info["id"] for name, info in COLLECTOR_INFO.items()}
def get_collector_name(source_id: str) -> Optional[str]:
try:
numeric_id = int(source_id)
if numeric_id in ID_TO_COLLECTOR:
return ID_TO_COLLECTOR[numeric_id]
except ValueError:
pass
if source_id in COLLECTOR_INFO:
return source_id
return None
@router.get("")
async def list_datasources(
module: Optional[str] = None,
is_active: Optional[bool] = None,
priority: Optional[str] = None,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
query = select(DataSource)
filters = []
if module:
filters.append(DataSource.module == module)
if is_active is not None:
filters.append(DataSource.is_active == is_active)
if priority:
filters.append(DataSource.priority == priority)
if filters:
query = query.where(*filters)
result = await db.execute(query)
datasources = result.scalars().all()
collector_list = []
for name, info in COLLECTOR_INFO.items():
is_active_status = collector_registry.is_active(name)
collector_list.append(
{
"id": info["id"],
"name": info["name"],
"module": info["module"],
"priority": info["priority"],
"frequency": f"{info['frequency_hours']}h",
"is_active": is_active_status,
"collector_class": name,
}
)
if module:
collector_list = [c for c in collector_list if c["module"] == module]
if priority:
collector_list = [c for c in collector_list if c["priority"] == priority]
return {
"total": len(collector_list),
"data": collector_list,
}
@router.get("/{source_id}")
async def get_datasource(
source_id: str,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
collector_name = get_collector_name(source_id)
if not collector_name:
raise HTTPException(status_code=404, detail="Data source not found")
info = COLLECTOR_INFO[collector_name]
return {
"id": info["id"],
"name": info["name"],
"module": info["module"],
"priority": info["priority"],
"frequency": f"{info['frequency_hours']}h",
"collector_class": collector_name,
"is_active": collector_registry.is_active(collector_name),
}
@router.post("/{source_id}/enable")
async def enable_datasource(
source_id: str,
current_user: User = Depends(get_current_user),
):
collector_name = get_collector_name(source_id)
if not collector_name:
raise HTTPException(status_code=404, detail="Data source not found")
collector_registry.set_active(collector_name, True)
return {"status": "enabled", "source_id": source_id}
@router.post("/{source_id}/disable")
async def disable_datasource(
source_id: str,
current_user: User = Depends(get_current_user),
):
collector_name = get_collector_name(source_id)
if not collector_name:
raise HTTPException(status_code=404, detail="Data source not found")
collector_registry.set_active(collector_name, False)
return {"status": "disabled", "source_id": source_id}
@router.get("/{source_id}/stats")
async def get_datasource_stats(
source_id: str,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
collector_name = get_collector_name(source_id)
if not collector_name:
raise HTTPException(status_code=404, detail="Data source not found")
info = COLLECTOR_INFO[collector_name]
total_query = select(func.count(DataSource.id)).where(DataSource.source == info["name"])
result = await db.execute(total_query)
total = result.scalar() or 0
return {
"source_id": source_id,
"collector_name": collector_name,
"name": info["name"],
"total_records": total,
"last_updated": datetime.utcnow().isoformat(),
}
@router.post("/{source_id}/trigger")
async def trigger_datasource(
source_id: str,
current_user: User = Depends(get_current_user),
):
collector_name = get_collector_name(source_id)
if not collector_name:
raise HTTPException(status_code=404, detail="Data source not found")
from app.services.scheduler import run_collector_now
if not collector_registry.is_active(collector_name):
raise HTTPException(status_code=400, detail="Data source is disabled")
success = run_collector_now(collector_name)
if success:
return {
"status": "triggered",
"source_id": source_id,
"collector_name": collector_name,
"message": f"Collector '{collector_name}' has been triggered",
}
else:
raise HTTPException(
status_code=500,
detail=f"Failed to trigger collector '{collector_name}'",
)
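`get_collector_name` accepts either the numeric id or the collector key, which is what lets every `/{source_id}` route above work with both forms. A condensed, self-contained check of that dual lookup (with a trimmed-down `COLLECTOR_INFO` for illustration):

```python
from typing import Optional

COLLECTOR_INFO = {  # trimmed copy for illustration
    "top500": {"id": 1},
    "epoch_ai_gpu": {"id": 2},
}
ID_TO_COLLECTOR = {info["id"]: name for name, info in COLLECTOR_INFO.items()}

def get_collector_name(source_id: str) -> Optional[str]:
    """Resolve a numeric id or collector key to the canonical collector name."""
    try:
        return ID_TO_COLLECTOR.get(int(source_id))
    except ValueError:
        return source_id if source_id in COLLECTOR_INFO else None

assert get_collector_name("1") == "top500"
assert get_collector_name("epoch_ai_gpu") == "epoch_ai_gpu"
assert get_collector_name("99") is None
assert get_collector_name("unknown") is None
```

Keeping `source_id` typed as `str` in the route signatures (rather than `int`) is what makes this flexibility possible.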


@@ -0,0 +1,110 @@
from typing import Optional
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel, EmailStr
from app.models.user import User
from app.core.security import get_current_user
router = APIRouter()
# In-memory defaults; these reset on process restart (no persistence layer yet)
default_settings = {
"system": {
"system_name": "智能星球",
"refresh_interval": 60,
"auto_refresh": True,
"data_retention_days": 30,
"max_concurrent_tasks": 5,
},
"notifications": {
"email_enabled": False,
"email_address": "",
"critical_alerts": True,
"warning_alerts": True,
"daily_summary": False,
},
"security": {
"session_timeout": 60,
"max_login_attempts": 5,
"password_policy": "medium",
},
}
system_settings = default_settings["system"].copy()
notification_settings = default_settings["notifications"].copy()
security_settings = default_settings["security"].copy()
class SystemSettingsUpdate(BaseModel):
system_name: str = "智能星球"
refresh_interval: int = 60
auto_refresh: bool = True
data_retention_days: int = 30
max_concurrent_tasks: int = 5
class NotificationSettingsUpdate(BaseModel):
email_enabled: bool = False
email_address: Optional[EmailStr] = None
critical_alerts: bool = True
warning_alerts: bool = True
daily_summary: bool = False
class SecuritySettingsUpdate(BaseModel):
session_timeout: int = 60
max_login_attempts: int = 5
password_policy: str = "medium"
@router.get("/system")
async def get_system_settings(current_user: User = Depends(get_current_user)):
return {"system": system_settings}
@router.put("/system")
async def update_system_settings(
settings: SystemSettingsUpdate,
current_user: User = Depends(get_current_user),
):
global system_settings
system_settings = settings.model_dump()
return {"status": "updated", "system": system_settings}
@router.get("/notifications")
async def get_notification_settings(current_user: User = Depends(get_current_user)):
return {"notifications": notification_settings}
@router.put("/notifications")
async def update_notification_settings(
settings: NotificationSettingsUpdate,
current_user: User = Depends(get_current_user),
):
global notification_settings
notification_settings = settings.model_dump()
return {"status": "updated", "notifications": notification_settings}
@router.get("/security")
async def get_security_settings(current_user: User = Depends(get_current_user)):
return {"security": security_settings}
@router.put("/security")
async def update_security_settings(
settings: SecuritySettingsUpdate,
current_user: User = Depends(get_current_user),
):
global security_settings
security_settings = settings.model_dump()
return {"status": "updated", "security": security_settings}
@router.get("")
async def get_all_settings(current_user: User = Depends(get_current_user)):
return {
"system": system_settings,
"notifications": notification_settings,
"security": security_settings,
}
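The settings router above keeps each group in a module-level dict, so updates are process-local and reset on restart. A minimal sketch of that copy-on-default / replace-on-update pattern, using a stdlib dataclass as a hypothetical stand-in for the Pydantic update model:

```python
from dataclasses import dataclass, asdict

# Hypothetical stand-in for SystemSettingsUpdate (subset of fields).
@dataclass
class SystemSettings:
    system_name: str = "智能星球"
    refresh_interval: int = 60
    auto_refresh: bool = True

# Defaults are materialized once at import time; updates replace the copy.
DEFAULTS = asdict(SystemSettings())
current = DEFAULTS.copy()

def update_settings(**overrides) -> dict:
    """Replace the whole settings dict, as the PUT endpoint does."""
    global current
    current = asdict(SystemSettings(**overrides))
    return current

updated = update_settings(refresh_interval=30)
```

Because the PUT handler replaces the whole dict with `model_dump()`, any field omitted from the request silently reverts to the model default, as the sketch shows.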

backend/app/api/v1/tasks.py (new file, 157 lines)
from datetime import datetime
from typing import Optional
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import text
from app.db.session import get_db
from app.models.user import User
from app.core.security import get_current_user
from app.services.collectors.registry import collector_registry
router = APIRouter()
@router.get("")
async def list_tasks(
datasource_id: Optional[int] = None,
status: Optional[str] = None,
page: int = 1,
page_size: int = 20,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
offset = (page - 1) * page_size
query = """
SELECT ct.id, ct.datasource_id, ds.name as datasource_name, ct.status,
ct.started_at, ct.completed_at, ct.records_processed, ct.error_message
FROM collection_tasks ct
JOIN data_sources ds ON ct.datasource_id = ds.id
WHERE 1=1
"""
count_query = "SELECT COUNT(*) FROM collection_tasks ct WHERE 1=1"
params = {}
if datasource_id:
query += " AND ct.datasource_id = :datasource_id"
count_query += " AND ct.datasource_id = :datasource_id"
params["datasource_id"] = datasource_id
if status:
query += " AND ct.status = :status"
count_query += " AND ct.status = :status"
params["status"] = status
query += f" ORDER BY ct.created_at DESC LIMIT {page_size} OFFSET {offset}"
result = await db.execute(text(query), params)
tasks = result.fetchall()
count_result = await db.execute(text(count_query), params)
total = count_result.scalar()
return {
"total": total or 0,
"page": page,
"page_size": page_size,
"data": [
{
"id": t[0],
"datasource_id": t[1],
"datasource_name": t[2],
"status": t[3],
"started_at": t[4].isoformat() if t[4] else None,
"completed_at": t[5].isoformat() if t[5] else None,
"records_processed": t[6],
"error_message": t[7],
}
for t in tasks
],
}
@router.get("/{task_id}")
async def get_task(
task_id: int,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
result = await db.execute(
text("""
SELECT ct.id, ct.datasource_id, ds.name as datasource_name, ct.status,
ct.started_at, ct.completed_at, ct.records_processed, ct.error_message
FROM collection_tasks ct
JOIN data_sources ds ON ct.datasource_id = ds.id
WHERE ct.id = :id
"""),
{"id": task_id},
)
task = result.fetchone()
if not task:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found",
)
return {
"id": task[0],
"datasource_id": task[1],
"datasource_name": task[2],
"status": task[3],
"started_at": task[4].isoformat() if task[4] else None,
"completed_at": task[5].isoformat() if task[5] else None,
"records_processed": task[6],
"error_message": task[7],
}
@router.post("/datasources/{source_id}/trigger")
async def trigger_collection(
source_id: int,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
result = await db.execute(
text("SELECT id, name, collector_class FROM data_sources WHERE id = :id"),
{"id": source_id},
)
datasource = result.fetchone()
if not datasource:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Data source not found",
)
collector_class_name = datasource[2]
collector_name = collector_class_name.lower().replace("collector", "")
collector = collector_registry.get(collector_name)
if not collector:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Collector {collector_name} not found",
)
result = await collector.run(db)
await db.execute(
text("""
INSERT INTO collection_tasks (datasource_id, status, records_processed, error_message, started_at, completed_at, created_at)
VALUES (:datasource_id, :status, :records_processed, :error_message, :started_at, :completed_at, NOW())
"""),
{
"datasource_id": source_id,
"status": result.get("status", "unknown"),
"records_processed": result.get("records_processed", 0),
"error_message": result.get("error"),
"started_at": datetime.utcnow(),
"completed_at": datetime.utcnow(),
},
)
return {
"message": "Collection task executed",
"result": result,
}
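`trigger_collection` derives the registry key by lower-casing the stored class name and stripping the `collector` suffix. The convention in isolation (class names here are illustrative, not from the registry):

```python
def collector_key(collector_class_name: str) -> str:
    """Mirror of the lookup used in trigger_collection:
    'Top500Collector' -> 'top500'."""
    return collector_class_name.lower().replace("collector", "")

keys = [collector_key(n) for n in ("Top500Collector", "PeeringDBCollector")]
```

Note that `replace` removes every occurrence of `collector`, so the convention only holds because class names carry the suffix exactly once, at the end.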

backend/app/api/v1/users.py (new file, 263 lines)
from typing import List
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import text
from app.core.security import get_current_user, get_password_hash
from app.db.session import get_db
from app.models.user import User
from app.schemas.user import UserCreate, UserResponse, UserUpdate
router = APIRouter()
def check_permission(current_user: User, required_roles: List[str]) -> bool:
user_role_value = (
current_user.role.value if hasattr(current_user.role, "value") else current_user.role
)
return user_role_value in required_roles
@router.get("", response_model=dict)
async def list_users(
page: int = 1,
page_size: int = 20,
role: str = None,
is_active: bool = None,
search: str = None,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
if not check_permission(current_user, ["super_admin", "admin"]):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Insufficient permissions",
)
# Build WHERE clause
where_clauses = []
params = {}
if role:
where_clauses.append("role = :role")
params["role"] = role
if is_active is not None:
where_clauses.append("is_active = :is_active")
params["is_active"] = is_active
if search:
where_clauses.append("(username ILIKE :search OR email ILIKE :search)")
params["search"] = f"%{search}%"
where_sql = " AND ".join(where_clauses) if where_clauses else "1=1"
offset = (page - 1) * page_size
query = text(
f"SELECT id, username, email, role, is_active, last_login_at, created_at FROM users WHERE {where_sql} ORDER BY created_at DESC LIMIT {page_size} OFFSET {offset}"
)
count_query = text(f"SELECT COUNT(*) FROM users WHERE {where_sql}")
result = await db.execute(query, params)
users = result.fetchall()
count_result = await db.execute(count_query, params)
total = count_result.scalar()
return {
"total": total,
"page": page,
"page_size": page_size,
"data": [
{
"id": u[0],
"username": u[1],
"email": u[2],
"role": u[3],
"is_active": u[4],
"last_login_at": u[5],
"created_at": u[6],
}
for u in users
],
}
@router.get("/{user_id}", response_model=dict)
async def get_user(
user_id: int,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
if not check_permission(current_user, ["super_admin", "admin"]) and current_user.id != user_id:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Insufficient permissions",
)
result = await db.execute(
text(
"SELECT id, username, email, role, is_active, last_login_at, created_at FROM users WHERE id = :id"
),
{"id": user_id},
)
user = result.fetchone()
if user is None:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found",
)
return {
"id": user[0],
"username": user[1],
"email": user[2],
"role": user[3],
"is_active": user[4],
"last_login_at": user[5],
"created_at": user[6],
}
@router.post("", response_model=dict, status_code=status.HTTP_201_CREATED)
async def create_user(
user_data: UserCreate,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
if not check_permission(current_user, ["super_admin"]):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Only super_admin can create users",
)
result = await db.execute(
text("SELECT id FROM users WHERE username = :username OR email = :email"),
{"username": user_data.username, "email": user_data.email},
)
if result.fetchone():
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Username or email already exists",
)
hashed_password = get_password_hash(user_data.password)
await db.execute(
text("""INSERT INTO users (username, email, password_hash, role, is_active, created_at, updated_at)
VALUES (:username, :email, :password_hash, :role, :is_active, NOW(), NOW())"""),
{
"username": user_data.username,
"email": user_data.email,
"password_hash": hashed_password,
"role": user_data.role,
"is_active": True,
},
)
await db.commit()
# Get the inserted user ID
result = await db.execute(
text("SELECT id FROM users WHERE username = :username"),
{"username": user_data.username},
)
new_user = result.fetchone()
if new_user is None:
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Failed to create user",
)
return {
"id": new_user[0],
"username": user_data.username,
"email": user_data.email,
"role": user_data.role,
"is_active": True,
}
@router.put("/{user_id}")
async def update_user(
user_id: int,
user_data: UserUpdate,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
if not check_permission(current_user, ["super_admin", "admin"]) and current_user.id != user_id:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Insufficient permissions",
)
if not check_permission(current_user, ["super_admin"]) and user_data.role is not None:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Only super_admin can change user role",
)
result = await db.execute(
text("SELECT id FROM users WHERE id = :id"),
{"id": user_id},
)
if not result.fetchone():
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found",
)
update_fields = []
params = {"id": user_id}
if user_data.email is not None:
update_fields.append("email = :email")
params["email"] = user_data.email
if user_data.role is not None:
update_fields.append("role = :role")
params["role"] = user_data.role
if user_data.is_active is not None:
update_fields.append("is_active = :is_active")
params["is_active"] = user_data.is_active
if update_fields:
update_fields.append("updated_at = NOW()")
query = text(f"UPDATE users SET {', '.join(update_fields)} WHERE id = :id")
await db.execute(query, params)
await db.commit()
return {"message": "User updated successfully"}
@router.delete("/{user_id}")
async def delete_user(
user_id: int,
current_user: User = Depends(get_current_user),
db: AsyncSession = Depends(get_db),
):
if not check_permission(current_user, ["super_admin"]):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Only super_admin can delete users",
)
if current_user.id == user_id:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Cannot delete yourself",
)
result = await db.execute(
text("SELECT id FROM users WHERE id = :id"),
{"id": user_id},
)
if not result.fetchone():
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found",
)
await db.execute(
text("DELETE FROM users WHERE id = :id"),
{"id": user_id},
)
await db.commit()
return {"message": "User deleted successfully"}
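`update_user` assembles its UPDATE statement from only the fields the client supplied, keeping values parameterized. The same pattern in isolation (table and column names follow the endpoint):

```python
def build_update(user_id: int, email=None, role=None, is_active=None):
    """Assemble a parameterized UPDATE from the optional fields that were set."""
    fields, params = [], {"id": user_id}
    for name, value in (("email", email), ("role", role), ("is_active", is_active)):
        if value is not None:
            fields.append(f"{name} = :{name}")
            params[name] = value
    if not fields:
        return None, params  # nothing to update
    fields.append("updated_at = NOW()")
    sql = f"UPDATE users SET {', '.join(fields)} WHERE id = :id"
    return sql, params

sql, params = build_update(7, email="a@b.c")
```

Only column names are interpolated into the SQL string; every value travels through the bind-parameter dict, which is what keeps the dynamic statement injection-safe.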

"""WebSocket API endpoints"""
import asyncio
import json
import logging
from datetime import datetime
from typing import Optional
from fastapi import APIRouter, WebSocket, WebSocketDisconnect, Query
from jose import jwt, JWTError
from app.core.config import settings
from app.core.websocket.manager import manager
logger = logging.getLogger(__name__)
router = APIRouter()
async def authenticate_token(token: str) -> Optional[dict]:
"""Authenticate WebSocket connection via token"""
try:
payload = jwt.decode(token, settings.SECRET_KEY, algorithms=[settings.ALGORITHM])
if payload.get("type") != "access":
logger.warning("WebSocket auth failed: wrong token type")
return None
return payload
except JWTError as e:
logger.warning(f"WebSocket auth failed: {e}")
return None
@router.websocket("/ws")
async def websocket_endpoint(
websocket: WebSocket,
token: str = Query(...),
):
"""WebSocket endpoint for real-time data"""
logger.info(f"WebSocket connection attempt with token: {token[:20]}...")
payload = await authenticate_token(token)
if payload is None:
logger.warning("WebSocket authentication failed, closing connection")
await websocket.close(code=4001)
return
user_id = str(payload.get("sub"))
await manager.connect(websocket, user_id)
try:
await websocket.send_json(
{
"type": "connection_established",
"data": {
"connection_id": f"conn_{user_id}",
"server_version": settings.VERSION,
"heartbeat_interval": 30,
"supported_channels": [
"gpu_clusters",
"submarine_cables",
"ixp_nodes",
"alerts",
"dashboard",
],
},
}
)
while True:
try:
data = await asyncio.wait_for(websocket.receive_json(), timeout=30)
if data.get("type") == "heartbeat":
await websocket.send_json(
{
"type": "heartbeat",
"data": {"action": "pong", "timestamp": datetime.utcnow().isoformat()},
}
)
elif data.get("type") == "subscribe":
channels = data.get("data", {}).get("channels", [])
await websocket.send_json(
{
"type": "subscription_confirmed",
"data": {"action": "subscribe", "channels": channels},
}
)
elif data.get("type") == "control_frame":
await websocket.send_json(
{"type": "control_acknowledged", "data": {"received": True}}
)
else:
await websocket.send_json({"type": "ack", "data": {"received": True}})
except asyncio.TimeoutError:
await websocket.send_json({"type": "heartbeat", "data": {"action": "ping"}})
except WebSocketDisconnect:
pass
finally:
manager.disconnect(websocket, user_id)
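The receive loop above dispatches on the frame's `type` field. A pure-Python sketch of the same dispatch, returning the reply the server would send (timestamps omitted for brevity):

```python
def dispatch(message: dict) -> dict:
    """Return the reply the WebSocket loop would send for one client frame."""
    mtype = message.get("type")
    if mtype == "heartbeat":
        return {"type": "heartbeat", "data": {"action": "pong"}}
    if mtype == "subscribe":
        channels = message.get("data", {}).get("channels", [])
        return {"type": "subscription_confirmed",
                "data": {"action": "subscribe", "channels": channels}}
    if mtype == "control_frame":
        return {"type": "control_acknowledged", "data": {"received": True}}
    # Any unrecognized frame is acknowledged rather than rejected.
    return {"type": "ack", "data": {"received": True}}

reply = dispatch({"type": "subscribe", "data": {"channels": ["alerts"]}})
```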

backend/app/core/cache.py (new file, 128 lines)
"""Redis caching service"""
import json
import logging
from datetime import timedelta
from typing import Optional, Any
import redis
from app.core.config import settings
logger = logging.getLogger(__name__)
# Lazy Redis client initialization
class _RedisClient:
_client = None
@classmethod
def get_client(cls):
if cls._client is None:
# Parse REDIS_URL or use default
redis_url = settings.REDIS_URL
if redis_url.startswith("redis://"):
cls._client = redis.from_url(redis_url, decode_responses=True)
else:
cls._client = redis.Redis(
host=settings.REDIS_SERVER,
port=settings.REDIS_PORT,
db=settings.REDIS_DB,
decode_responses=True,
)
return cls._client
class CacheService:
"""Redis caching service with JSON serialization"""
def __init__(self):
self.client = _RedisClient.get_client()
def get(self, key: str) -> Optional[Any]:
"""Get value from cache"""
try:
value = self.client.get(key)
if value:
return json.loads(value)
return None
except Exception as e:
logger.warning(f"Cache get error: {e}")
return None
def set(
self,
key: str,
value: Any,
expire_seconds: int = 300,
) -> bool:
"""Set value in cache with expiration"""
try:
serialized = json.dumps(value, default=str)
return self.client.setex(key, expire_seconds, serialized)
except Exception as e:
logger.warning(f"Cache set error: {e}")
return False
def delete(self, key: str) -> bool:
"""Delete key from cache"""
try:
return self.client.delete(key) > 0
except Exception as e:
logger.warning(f"Cache delete error: {e}")
return False
def delete_pattern(self, pattern: str) -> int:
"""Delete all keys matching pattern"""
try:
keys = self.client.keys(pattern)
if keys:
return self.client.delete(*keys)
return 0
except Exception as e:
logger.warning(f"Cache delete_pattern error: {e}")
return 0
def get_or_set(
self,
key: str,
fallback: callable,
expire_seconds: int = 300,
) -> Optional[Any]:
"""Get value from cache or set it using fallback"""
value = self.get(key)
if value is not None:
return value
value = fallback()
if value is not None:
self.set(key, value, expire_seconds)
return value
def invalidate_pattern(self, pattern: str) -> int:
"""Invalidate all keys matching pattern"""
return self.delete_pattern(pattern)
cache = CacheService()
def cached(expire_seconds: int = 300, key_prefix: str = ""):
"""Decorator for caching function results"""
def decorator(func):
async def wrapper(*args, **kwargs):
cache_key = f"{key_prefix}:{func.__name__}:{args}:{kwargs}"
cache_key = cache_key.replace(":", "_").replace(" ", "")
cached_value = cache.get(cache_key)
if cached_value is not None:
return cached_value
result = await func(*args, **kwargs)
cache.set(cache_key, result, expire_seconds)
return result
return wrapper
return decorator
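`CacheService.get_or_set` is the classic read-through cache. The same logic against a plain dict, with Redis, TTLs, and serialization deliberately left out of this sketch:

```python
store = {}          # stand-in for Redis; no expiry in this sketch
calls = {"n": 0}    # counts how often the fallback actually runs

def get_or_set(key, fallback):
    """Return the cached value, computing and storing it on a miss."""
    if key in store:
        return store[key]
    value = fallback()
    if value is not None:  # as in CacheService, None results are not cached
        store[key] = value
    return value

def expensive():
    calls["n"] += 1
    return {"total": 42}

first = get_or_set("stats", expensive)
second = get_or_set("stats", expensive)  # served from the store; fallback not re-run
```

One consequence mirrored from the real service: a fallback that legitimately returns `None` is recomputed on every call, since `None` doubles as the miss sentinel.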

from functools import lru_cache
from pathlib import Path
from typing import List
import os
from pydantic_settings import BaseSettings
class Settings(BaseSettings):
PROJECT_NAME: str = "Intelligent Planet Plan"
VERSION: str = "1.0.0"
API_V1_STR: str = "/api/v1"
SECRET_KEY: str = "your-secret-key-change-in-production"
ALGORITHM: str = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES: int = 15
REFRESH_TOKEN_EXPIRE_DAYS: int = 7
POSTGRES_SERVER: str = "localhost"
POSTGRES_USER: str = "postgres"
POSTGRES_PASSWORD: str = "postgres"
POSTGRES_DB: str = "planet_db"
DATABASE_URL: str = "postgresql+asyncpg://postgres:postgres@postgres:5432/planet_db"
REDIS_SERVER: str = "localhost"
REDIS_PORT: int = 6379
REDIS_DB: int = 0
CORS_ORIGINS: List[str] = ["http://localhost:3000", "http://localhost:8000"]
@property
def REDIS_URL(self) -> str:
return os.getenv(
"REDIS_URL", f"redis://{self.REDIS_SERVER}:{self.REDIS_PORT}/{self.REDIS_DB}"
)
class Config:
env_file = ".env"
case_sensitive = True
@lru_cache()
def get_settings() -> Settings:
return Settings()
settings = get_settings()

from datetime import datetime, timedelta
from typing import Optional
import bcrypt
import redis
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from jose import JWTError, jwt
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession
from app.core.config import settings
from app.db.session import get_db
from app.models.user import User
oauth2_scheme = HTTPBearer()
class _RedisClient:
_client = None
@classmethod
def get_client(cls):
if cls._client is None:
redis_url = settings.REDIS_URL
if redis_url.startswith("redis://"):
cls._client = redis.from_url(redis_url, decode_responses=True)
else:
cls._client = redis.Redis(
host=settings.REDIS_SERVER,
port=settings.REDIS_PORT,
db=settings.REDIS_DB,
decode_responses=True,
)
return cls._client
redis_client = _RedisClient.get_client()
def verify_password(plain_password: str, hashed_password: str) -> bool:
return bcrypt.checkpw(plain_password.encode(), hashed_password.encode())
def get_password_hash(password: str) -> str:
return bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()
def create_access_token(data: dict, expires_delta: Optional[timedelta] = None) -> str:
to_encode = data.copy()
if expires_delta:
expire = datetime.utcnow() + expires_delta
else:
expire = datetime.utcnow() + timedelta(minutes=settings.ACCESS_TOKEN_EXPIRE_MINUTES)
to_encode.update({"exp": expire, "type": "access"})
if "sub" in to_encode:
to_encode["sub"] = str(to_encode["sub"])
return jwt.encode(to_encode, settings.SECRET_KEY, algorithm=settings.ALGORITHM)
def create_refresh_token(data: dict) -> str:
to_encode = data.copy()
expire = datetime.utcnow() + timedelta(days=settings.REFRESH_TOKEN_EXPIRE_DAYS)
to_encode.update({"exp": expire, "type": "refresh"})
if "sub" in to_encode:
to_encode["sub"] = str(to_encode["sub"])
return jwt.encode(to_encode, settings.SECRET_KEY, algorithm=settings.ALGORITHM)
def decode_token(token: str) -> Optional[dict]:
try:
payload = jwt.decode(token, settings.SECRET_KEY, algorithms=[settings.ALGORITHM])
return payload
except JWTError:
return None
async def get_current_user(
credentials: HTTPAuthorizationCredentials = Depends(oauth2_scheme),
db: AsyncSession = Depends(get_db),
) -> User:
token = credentials.credentials
if redis_client.sismember("blacklisted_tokens", token):
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Token has been revoked",
)
payload = decode_token(token)
if payload is None or payload.get("type") != "access":
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid token",
)
user_id = payload.get("sub")
if user_id is None:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid token",
)
result = await db.execute(
text(
"SELECT id, username, email, password_hash, role, is_active FROM users WHERE id = :id"
),
{"id": int(user_id)},
)
row = result.fetchone()
if row is None or not row[5]:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User not found or inactive",
)
user = User()
user.id = row[0]
user.username = row[1]
user.email = row[2]
user.password_hash = row[3]
user.role = row[4]
user.is_active = row[5]
return user
async def get_current_user_refresh(
credentials: HTTPAuthorizationCredentials = Depends(oauth2_scheme),
db: AsyncSession = Depends(get_db),
) -> User:
token = credentials.credentials
payload = decode_token(token)
if payload is None or payload.get("type") != "refresh":
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid refresh token",
)
user_id = payload.get("sub")
if user_id is None:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid token",
)
result = await db.execute(
text(
"SELECT id, username, email, password_hash, role, is_active FROM users WHERE id = :id"
),
{"id": int(user_id)},
)
row = result.fetchone()
if row is None or not row[5]:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User not found or inactive",
)
user = User()
user.id = row[0]
user.username = row[1]
user.email = row[2]
user.password_hash = row[3]
user.role = row[4]
user.is_active = row[5]
return user
def blacklist_token(token: str) -> None:
redis_client.sadd("blacklisted_tokens", token)
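`create_access_token` delegates to python-jose, but structurally an HS256 JWT is just two base64url-encoded JSON segments plus an HMAC-SHA256 signature over them. A stdlib-only sketch of that structure, for illustration rather than as a replacement for jose:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def encode_hs256(payload: dict, secret: str) -> str:
    """Minimal HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_hs256(token: str, secret: str) -> bool:
    """Recompute the signature and compare in constant time."""
    header, body, sig = token.split(".")
    expected = b64url(
        hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    )
    return hmac.compare_digest(sig, expected)

token = encode_hs256({"sub": "1", "type": "access"}, "dev-secret")
```

This is also why the `SECRET_KEY` default must be changed in production: anyone who knows the secret can mint tokens that `decode_token` will accept.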

"""__init__.py for websocket package"""
from app.core.websocket.manager import manager, ConnectionManager
from app.core.websocket.broadcaster import broadcaster, DataBroadcaster

"""Data broadcaster for WebSocket connections"""
import asyncio
from datetime import datetime
from typing import Dict, Any, Optional
from app.core.websocket.manager import manager
class DataBroadcaster:
"""Periodically broadcasts data to connected WebSocket clients"""
def __init__(self):
self.running = False
self.tasks: Dict[str, asyncio.Task] = {}
async def get_dashboard_stats(self) -> Dict[str, Any]:
"""Get dashboard statistics"""
return {
"total_datasources": 9,
"active_datasources": 8,
"tasks_today": 45,
"success_rate": 97.8,
"last_updated": datetime.utcnow().isoformat(),
"alerts": {"critical": 0, "warning": 2, "info": 5},
}
async def broadcast_stats(self, interval: int = 5):
"""Broadcast dashboard stats periodically"""
while self.running:
try:
stats = await self.get_dashboard_stats()
await manager.broadcast(
{
"type": "data_frame",
"channel": "dashboard",
"timestamp": datetime.utcnow().isoformat(),
"payload": {"stats": stats},
},
channel="dashboard",
)
except Exception:
pass  # swallow transient send failures so the broadcast loop keeps running
await asyncio.sleep(interval)
async def broadcast_alert(self, alert: Dict[str, Any]):
"""Broadcast an alert to all connected clients"""
await manager.broadcast(
{
"type": "alert_notification",
"timestamp": datetime.utcnow().isoformat(),
"data": {"alert": alert},
}
)
async def broadcast_gpu_update(self, data: Dict[str, Any]):
"""Broadcast GPU cluster update"""
await manager.broadcast(
{
"type": "data_frame",
"channel": "gpu_clusters",
"timestamp": datetime.utcnow().isoformat(),
"payload": data,
}
)
async def broadcast_custom(self, channel: str, data: Dict[str, Any]):
"""Broadcast custom data to a specific channel"""
await manager.broadcast(
{
"type": "data_frame",
"channel": channel,
"timestamp": datetime.utcnow().isoformat(),
"payload": data,
},
channel=channel if channel in manager.active_connections else "all",
)
def start(self):
"""Start all broadcasters"""
if not self.running:
self.running = True
self.tasks["dashboard"] = asyncio.create_task(self.broadcast_stats(5))
def stop(self):
"""Stop all broadcasters"""
self.running = False
for task in self.tasks.values():
task.cancel()
self.tasks.clear()
broadcaster = DataBroadcaster()

"""WebSocket Connection Manager"""
import json
import asyncio
from typing import Dict, Set, Optional
from datetime import datetime
from fastapi import WebSocket
import redis.asyncio as redis
from app.core.config import settings
class ConnectionManager:
"""Manages WebSocket connections"""
def __init__(self):
self.active_connections: Dict[str, Set[WebSocket]] = {} # user_id -> connections
self.redis_client: Optional[redis.Redis] = None
async def connect(self, websocket: WebSocket, user_id: str):
await websocket.accept()
if user_id not in self.active_connections:
self.active_connections[user_id] = set()
self.active_connections[user_id].add(websocket)
if self.redis_client is None:
redis_url = settings.REDIS_URL
if redis_url.startswith("redis://"):
self.redis_client = redis.from_url(redis_url, decode_responses=True)
else:
self.redis_client = redis.Redis(
host=settings.REDIS_SERVER,
port=settings.REDIS_PORT,
db=settings.REDIS_DB,
decode_responses=True,
)
def disconnect(self, websocket: WebSocket, user_id: str):
if user_id in self.active_connections:
self.active_connections[user_id].discard(websocket)
if not self.active_connections[user_id]:
del self.active_connections[user_id]
async def send_personal_message(self, message: dict, user_id: str):
if user_id in self.active_connections:
for connection in self.active_connections[user_id]:
try:
await connection.send_json(message)
except Exception:
pass
async def broadcast(self, message: dict, channel: str = "all"):
if channel == "all":
for user_id in self.active_connections:
await self.send_personal_message(message, user_id)
else:
await self.send_personal_message(message, channel)
async def close_all(self):
for user_id in self.active_connections:
for connection in self.active_connections[user_id]:
await connection.close()
self.active_connections.clear()
manager = ConnectionManager()
async def get_websocket_manager() -> ConnectionManager:
return manager
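`ConnectionManager` keys connections by `user_id`, with a set of sockets per user so one account can hold several tabs open. The bookkeeping in isolation, with sockets replaced by plain handles:

```python
class Registry:
    """user_id -> set of connection handles, as in ConnectionManager."""

    def __init__(self):
        self.active = {}

    def connect(self, conn, user_id):
        self.active.setdefault(user_id, set()).add(conn)

    def disconnect(self, conn, user_id):
        conns = self.active.get(user_id)
        if conns is not None:
            conns.discard(conn)
            if not conns:  # drop empty buckets so broadcast skips gone users
                del self.active[user_id]

reg = Registry()
reg.connect("ws1", "u1")
reg.connect("ws2", "u1")
reg.disconnect("ws1", "u1")
```

Using `discard` plus bucket deletion keeps `disconnect` idempotent, which matters because the endpoint's `finally` block may run after an already-failed socket.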

backend/app/db/session.py (new file, 35 lines)
from typing import AsyncGenerator
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine, async_sessionmaker
from sqlalchemy.orm import declarative_base
from app.core.config import settings
engine = create_async_engine(
settings.DATABASE_URL,
echo=getattr(settings, "DEBUG", False),
)
async_session_factory = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
Base = declarative_base()
async def get_db() -> AsyncGenerator[AsyncSession, None]:
async with async_session_factory() as session:
try:
yield session
await session.commit()
except Exception:
await session.rollback()
raise
async def init_db():
import app.models.user # noqa: F401
import app.models.gpu_cluster # noqa: F401
import app.models.task # noqa: F401
import app.models.datasource # noqa: F401
async with engine.begin() as conn:
await conn.run_sync(Base.metadata.create_all)

backend/app/main.py (new file, 86 lines)
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from starlette.middleware.base import BaseHTTPMiddleware
from app.core.config import settings
from app.core.websocket.broadcaster import broadcaster
from app.db.session import init_db, async_session_factory
from app.api.main import api_router
from app.api.v1 import websocket
from app.services.scheduler import start_scheduler, stop_scheduler
class WebSocketCORSMiddleware(BaseHTTPMiddleware):
async def dispatch(self, request, call_next):
if request.url.path.startswith("/ws") and request.method == "GET":
response = await call_next(request)
response.headers["Access-Control-Allow-Origin"] = "*"
response.headers["Access-Control-Allow-Methods"] = "GET, OPTIONS"
response.headers["Access-Control-Allow-Headers"] = "*"
return response
return await call_next(request)
@asynccontextmanager
async def lifespan(app: FastAPI):
await init_db()
start_scheduler()
broadcaster.start()
yield
broadcaster.stop()
stop_scheduler()
app = FastAPI(
title=settings.PROJECT_NAME,
version=settings.VERSION,
description="智能星球计划 - 态势感知系统\n\n## 功能模块\n\n- **用户认证**: JWT-based authentication\n- **数据源管理**: 多源数据采集器管理\n- **任务调度**: 定时任务调度与监控\n- **实时更新**: WebSocket实时数据推送\n- **告警系统**: 多级告警管理\n\n## 数据层级\n\n- **L1**: 核心数据 (TOP500, Epoch AI GPU)\n- **L2**: 扩展数据 (HuggingFace, PeeringDB, 海缆)\n- **L3**: 分析数据\n- **L4**: 决策支持",
lifespan=lifespan,
docs_url=None,
redoc_url="/docs",
openapi_url="/openapi.json",
)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
app.add_middleware(WebSocketCORSMiddleware)
app.include_router(api_router, prefix="/api/v1")
app.include_router(websocket.router)
@app.get("/health")
async def health_check():
"""Health check endpoint"""
return {
"status": "healthy",
"version": settings.VERSION,
}
@app.get("/")
async def root():
"""API root"""
return {
"name": settings.PROJECT_NAME,
"version": settings.VERSION,
"docs": "/docs",  # ReDoc is mounted at /docs; Swagger UI is disabled (docs_url=None)
}
@app.get("/api/v1/scheduler/jobs")
async def get_scheduler_jobs():
"""List scheduled jobs"""
from app.services.scheduler import get_scheduler_jobs
return {"jobs": get_scheduler_jobs()}

from app.models.user import User
from app.models.gpu_cluster import GPUCluster
from app.models.task import CollectionTask
from app.models.datasource import DataSource
from app.models.alert import Alert, AlertSeverity, AlertStatus
__all__ = [
"User",
"GPUCluster",
"CollectionTask",
"DataSource",
"Alert",
"AlertSeverity",
"AlertStatus",
]

from datetime import datetime
from enum import Enum
from sqlalchemy import Column, Integer, String, DateTime, Text, Enum as SQLEnum
from app.db.session import Base
class AlertSeverity(str, Enum):
CRITICAL = "critical"
WARNING = "warning"
INFO = "info"
class AlertStatus(str, Enum):
ACTIVE = "active"
ACKNOWLEDGED = "acknowledged"
RESOLVED = "resolved"
class Alert(Base):
__tablename__ = "alerts"
id = Column(Integer, primary_key=True, index=True)
severity = Column(SQLEnum(AlertSeverity), default=AlertSeverity.WARNING)
status = Column(SQLEnum(AlertStatus), default=AlertStatus.ACTIVE)
datasource_id = Column(Integer, nullable=True, index=True)
datasource_name = Column(String(255), nullable=True)
message = Column(Text)
alert_metadata = Column(Text, nullable=True)
acknowledged_by = Column(Integer, nullable=True)
resolved_by = Column(Integer, nullable=True)
resolution_notes = Column(Text, nullable=True)
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
acknowledged_at = Column(DateTime, nullable=True)
resolved_at = Column(DateTime, nullable=True)
def to_dict(self):
return {
"id": self.id,
"severity": self.severity.value if self.severity else None,
"status": self.status.value if self.status else None,
"datasource_id": self.datasource_id,
"datasource_name": self.datasource_name,
"message": self.message,
"alert_metadata": self.alert_metadata,
"acknowledged_by": self.acknowledged_by,
"resolved_by": self.resolved_by,
"resolution_notes": self.resolution_notes,
"created_at": self.created_at.isoformat() if self.created_at else None,
"updated_at": self.updated_at.isoformat() if self.updated_at else None,
"acknowledged_at": self.acknowledged_at.isoformat() if self.acknowledged_at else None,
"resolved_at": self.resolved_at.isoformat() if self.resolved_at else None,
}
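The status and timestamp columns above imply an acknowledge → resolve workflow. A minimal in-memory sketch of those transitions (names here are illustrative; the app's actual service layer is not shown in this commit):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AlertStatus(str, Enum):
    ACTIVE = "active"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


@dataclass
class AlertState:
    """In-memory stand-in for the Alert row's workflow columns."""
    status: AlertStatus = AlertStatus.ACTIVE
    acknowledged_by: Optional[int] = None
    acknowledged_at: Optional[datetime] = None
    resolved_by: Optional[int] = None
    resolved_at: Optional[datetime] = None


def acknowledge(alert: AlertState, user_id: int) -> AlertState:
    # Only an active alert can be acknowledged.
    if alert.status is not AlertStatus.ACTIVE:
        raise ValueError(f"cannot acknowledge alert in state {alert.status.value}")
    alert.status = AlertStatus.ACKNOWLEDGED
    alert.acknowledged_by = user_id
    alert.acknowledged_at = datetime.now(timezone.utc)
    return alert


def resolve(alert: AlertState, user_id: int) -> AlertState:
    # Resolution is allowed from either active or acknowledged.
    if alert.status is AlertStatus.RESOLVED:
        raise ValueError("alert already resolved")
    alert.status = AlertStatus.RESOLVED
    alert.resolved_by = user_id
    alert.resolved_at = datetime.now(timezone.utc)
    return alert


a = resolve(acknowledge(AlertState(), user_id=7), user_id=7)
```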


@@ -0,0 +1,80 @@
"""Collected Data model for storing data from all collectors"""
from sqlalchemy import Column, DateTime, Integer, String, Text, JSON, Index
from sqlalchemy.sql import func
from app.db.session import Base
class CollectedData(Base):
"""Generic model for storing collected data from all sources"""
__tablename__ = "collected_data"
id = Column(Integer, primary_key=True, autoincrement=True)
source = Column(String(100), nullable=False, index=True) # e.g., "top500", "huggingface_models"
source_id = Column(String(100), index=True) # Original ID from source, e.g., "rank_1"
data_type = Column(
String(50), nullable=False, index=True
) # e.g., "supercomputer", "model", "dataset"
# Core data fields
name = Column(String(500))
title = Column(String(500))
description = Column(Text)
# Location data (for geo visualization)
country = Column(String(100))
city = Column(String(100))
latitude = Column(String(50))
longitude = Column(String(50))
# Performance metrics
value = Column(String(100)) # Generic value field (Rmax, Rpeak, etc.)
unit = Column(String(20))
# Additional metadata as JSON
extra_data = Column(
"metadata", JSON, default={}
) # Using 'extra_data' as attribute name but 'metadata' as column name
# Timestamps
collected_at = Column(DateTime(timezone=True), server_default=func.now(), index=True)
reference_date = Column(DateTime(timezone=True)) # Data reference date (e.g., TOP500 list date)
# Status
is_valid = Column(Integer, default=1) # 1=valid, 0=invalid
# Indexes for common queries
__table_args__ = (
Index("idx_collected_data_source_collected", "source", "collected_at"),
Index("idx_collected_data_source_type", "source", "data_type"),
)
def __repr__(self):
return f"<CollectedData {self.id}: {self.source}/{self.data_type}>"
def to_dict(self) -> dict:
"""Convert to dictionary"""
return {
"id": self.id,
"source": self.source,
"source_id": self.source_id,
"data_type": self.data_type,
"name": self.name,
"title": self.title,
"description": self.description,
"country": self.country,
"city": self.city,
"latitude": self.latitude,
"longitude": self.longitude,
"value": self.value,
"unit": self.unit,
"metadata": self.extra_data,
"collected_at": self.collected_at.isoformat()
if self.collected_at is not None
else None,
"reference_date": self.reference_date.isoformat()
if self.reference_date is not None
else None,
}
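Note that `latitude`/`longitude` are `String(50)` columns, so collectors that emit floats must stringify them before insert. A standalone sketch of the conversion that `BaseCollector._save_data` performs (the function name here is illustrative):

```python
from typing import Any, Dict, Optional


def normalize_coordinates(item: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Mirror how coordinates are stringified for the String(50) columns."""
    def to_col(value: Any) -> Optional[str]:
        # None stays NULL; anything else is stored in its string form.
        return str(value) if value is not None else None
    return {
        "latitude": to_col(item.get("latitude")),
        "longitude": to_col(item.get("longitude")),
    }


print(normalize_coordinates({"latitude": 52.37, "longitude": 4.9}))
# {'latitude': '52.37', 'longitude': '4.9'}
```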


@@ -0,0 +1,28 @@
"""Data Source model"""
from sqlalchemy import Boolean, Column, DateTime, Integer, String, Text
from sqlalchemy.sql import func
from app.db.session import Base
class DataSource(Base):
__tablename__ = "data_sources"
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(100), nullable=False)
source = Column(String(100), nullable=False)
module = Column(String(10), nullable=False, index=True) # L1, L2, L3, L4
priority = Column(String(10), default="P1") # P0, P1, P2
frequency_minutes = Column(Integer, default=60)
collector_class = Column(String(100), nullable=False)
config = Column(Text, default="{}") # JSON config
is_active = Column(Boolean, default=True, index=True)
last_run_at = Column(DateTime(timezone=True))
last_status = Column(String(20))
next_run_at = Column(DateTime(timezone=True))
created_at = Column(DateTime(timezone=True), server_default=func.now())
updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())
def __repr__(self):
return f"<DataSource {self.id}: {self.name}>"
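A quick sketch of how `frequency_minutes` and `last_run_at` combine into `next_run_at`. Treat the fixed-interval policy as an assumption; the actual scheduler may align runs differently:

```python
from datetime import datetime, timedelta, timezone


def compute_next_run(last_run_at: datetime, frequency_minutes: int) -> datetime:
    """Next scheduled run, assuming runs are spaced a fixed interval apart."""
    return last_run_at + timedelta(minutes=frequency_minutes)


def is_due(next_run_at: datetime, now: datetime) -> bool:
    """True once the scheduled time has passed."""
    return now >= next_run_at


last = datetime(2026, 3, 5, 12, 0, tzinfo=timezone.utc)
nxt = compute_next_run(last, 60)  # -> 2026-03-05 13:00 UTC
```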


@@ -0,0 +1,26 @@
"""User-defined Data Source Configuration model"""
from sqlalchemy import Boolean, Column, DateTime, Integer, String, Text, JSON
from sqlalchemy.sql import func
from app.db.session import Base
class DataSourceConfig(Base):
__tablename__ = "datasource_configs"
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(100), nullable=False)
description = Column(Text)
source_type = Column(String(50), nullable=False) # http, api, database, etc.
endpoint = Column(String(500))
auth_type = Column(String(20), default="none") # none, bearer, api_key, basic
auth_config = Column(JSON, default={}) # Encrypted credentials
headers = Column(JSON, default={})
config = Column(JSON, default={}) # Additional config like timeout, retry, etc.
is_active = Column(Boolean, default=True)
created_at = Column(DateTime(timezone=True), server_default=func.now())
updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())
def __repr__(self):
return f"<DataSourceConfig {self.id}: {self.name}>"


@@ -0,0 +1,29 @@
"""GPU Cluster model for L1 data"""
from sqlalchemy import Column, DateTime, Float, Integer, String
from sqlalchemy.sql import func
from app.db.session import Base
class GPUCluster(Base):
__tablename__ = "gpu_clusters"
id = Column(Integer, primary_key=True, autoincrement=True)
time = Column(DateTime(timezone=True), nullable=False)
cluster_id = Column(String(100), nullable=False, index=True)
name = Column(String(200), nullable=False)
country = Column(String(100))
city = Column(String(100))
latitude = Column(Float)
longitude = Column(Float)
organization = Column(String(200))
gpu_count = Column(Integer)
gpu_type = Column(String(100))
total_flops = Column(Float)
rank = Column(Integer)
source = Column(String(50), nullable=False)
created_at = Column(DateTime(timezone=True), server_default=func.now())
def __repr__(self):
return f"<GPUCluster {self.cluster_id}: {self.name}>"


@@ -0,0 +1,22 @@
"""Collection Task model"""
from sqlalchemy import Column, DateTime, Integer, String, Text
from sqlalchemy.sql import func
from app.db.session import Base
class CollectionTask(Base):
__tablename__ = "collection_tasks"
id = Column(Integer, primary_key=True, autoincrement=True)
datasource_id = Column(Integer, nullable=False, index=True)
status = Column(String(20), nullable=False) # pending, running, success, failed, cancelled
started_at = Column(DateTime(timezone=True))
completed_at = Column(DateTime(timezone=True))
records_processed = Column(Integer, default=0)
error_message = Column(Text)
created_at = Column(DateTime(timezone=True), server_default=func.now())
def __repr__(self):
return f"<CollectionTask {self.id}: {self.status}>"


@@ -0,0 +1,25 @@
from sqlalchemy import Boolean, Column, Integer, String, DateTime
from sqlalchemy.sql import func
from app.db.session import Base
class User(Base):
__tablename__ = "users"
id = Column(Integer, primary_key=True, index=True)
username = Column(String(50), unique=True, index=True, nullable=False)
email = Column(String(255), unique=True, index=True, nullable=False)
password_hash = Column(String(255), nullable=False)
role = Column(String(20), default="viewer")
is_active = Column(Boolean, default=True)
last_login_at = Column(DateTime(timezone=True))
created_at = Column(DateTime(timezone=True), server_default=func.now())
updated_at = Column(
DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
)
def set_password(self, password: str):
from app.core.security import get_password_hash
self.password_hash = get_password_hash(password)
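`get_password_hash` lives in `app.core.security`, which is not shown in this excerpt. A stdlib-only sketch of the same idea, assuming PBKDF2 (the real implementation may well use bcrypt via passlib):

```python
import hashlib
import hmac
import os


def hash_password(password: str, *, iterations: int = 200_000) -> str:
    """PBKDF2-HMAC-SHA256 sketch: store salt, iteration count, and digest together."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, iter_s, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iter_s)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


h = hash_password("s3cret-pass")
```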


@@ -0,0 +1,22 @@
from datetime import datetime

from pydantic import BaseModel
class Token(BaseModel):
access_token: str
token_type: str = "bearer"
expires_in: int
user: dict
class TokenPayload(BaseModel):
sub: int
exp: datetime
type: str
class TokenRefresh(BaseModel):
access_token: str
expires_in: int
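The `TokenPayload` fields (`sub`, `exp`, `type`) map directly onto HS256 JWT claims. A stdlib-only sketch of minting such a token; the app presumably uses a JWT library, so this only illustrates the wire format:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_access_token(sub: int, secret: str, expire_minutes: int = 15) -> str:
    """HS256 JWT sketch carrying the TokenPayload claims (sub, exp, type)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(
        {"sub": sub, "exp": int(time.time()) + expire_minutes * 60, "type": "access"}
    ).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"


token = make_access_token(42, secret="dev-only-secret")
```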


@@ -0,0 +1,41 @@
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, EmailStr, Field
class UserBase(BaseModel):
username: str
email: EmailStr
class UserCreate(UserBase):
password: str = Field(..., min_length=8)
role: str = "viewer"
class UserUpdate(BaseModel):
email: Optional[EmailStr] = None
role: Optional[str] = None
is_active: Optional[bool] = None
class UserInDB(UserBase):
id: int
role: str
is_active: bool
last_login_at: Optional[datetime]
created_at: datetime
class Config:
from_attributes = True
class UserResponse(UserBase):
id: int
role: str
is_active: bool
created_at: datetime
class Config:
from_attributes = True


@@ -0,0 +1,41 @@
"""__init__.py for collectors package"""
from app.services.collectors.base import BaseCollector, HTTPCollector, IntervalCollector
from app.services.collectors.registry import collector_registry, CollectorRegistry
from app.services.collectors.top500 import TOP500Collector
from app.services.collectors.epoch_ai import EpochAIGPUCollector
from app.services.collectors.huggingface import (
HuggingFaceModelCollector,
HuggingFaceDatasetCollector,
HuggingFaceSpacesCollector,
)
from app.services.collectors.peeringdb import (
PeeringDBIXPCollector,
PeeringDBNetworkCollector,
PeeringDBFacilityCollector,
)
from app.services.collectors.telegeography import (
TeleGeographyCableCollector,
TeleGeographyLandingPointCollector,
TeleGeographyCableSystemCollector,
)
from app.services.collectors.cloudflare import (
CloudflareRadarDeviceCollector,
CloudflareRadarTrafficCollector,
CloudflareRadarTopASCollector,
)
collector_registry.register(TOP500Collector())
collector_registry.register(EpochAIGPUCollector())
collector_registry.register(HuggingFaceModelCollector())
collector_registry.register(HuggingFaceDatasetCollector())
collector_registry.register(HuggingFaceSpacesCollector())
collector_registry.register(PeeringDBIXPCollector())
collector_registry.register(PeeringDBNetworkCollector())
collector_registry.register(PeeringDBFacilityCollector())
collector_registry.register(TeleGeographyCableCollector())
collector_registry.register(TeleGeographyLandingPointCollector())
collector_registry.register(TeleGeographyCableSystemCollector())
collector_registry.register(CloudflareRadarDeviceCollector())
collector_registry.register(CloudflareRadarTrafficCollector())
collector_registry.register(CloudflareRadarTopASCollector())


@@ -0,0 +1,179 @@
"""Base collector class for all data sources"""
from abc import ABC, abstractmethod
from typing import Dict, List, Any, Optional
from datetime import datetime

import httpx
from sqlalchemy.ext.asyncio import AsyncSession
class BaseCollector(ABC):
"""Abstract base class for data collectors"""
name: str = "base_collector"
priority: str = "P1"
module: str = "L1"
frequency_hours: int = 4
data_type: str = "generic" # Override in subclass: "supercomputer", "model", "dataset", etc.
@abstractmethod
async def fetch(self) -> List[Dict[str, Any]]:
"""Fetch raw data from source"""
pass
def transform(self, raw_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Transform raw data to internal format (default: pass through)"""
return raw_data
async def run(self, db: AsyncSession) -> Dict[str, Any]:
"""Full pipeline: fetch -> transform -> save"""
from app.services.collectors.registry import collector_registry
from app.models.task import CollectionTask
from app.models.collected_data import CollectedData
start_time = datetime.utcnow()
datasource_id = getattr(self, "_datasource_id", 1) # Default to 1 for built-in collectors
# Check if collector is active
if not collector_registry.is_active(self.name):
return {"status": "skipped", "reason": "Collector is disabled"}
# Log task start
task = CollectionTask(
datasource_id=datasource_id,
status="running",
started_at=start_time,
)
db.add(task)
await db.commit()
task_id = task.id
try:
raw_data = await self.fetch()
data = self.transform(raw_data)
# Save data to database
records_count = await self._save_data(db, data)
# Log task success
task.status = "success"
task.records_processed = records_count
task.completed_at = datetime.utcnow()
await db.commit()
return {
"status": "success",
"task_id": task_id,
"records_processed": records_count,
"execution_time_seconds": (datetime.utcnow() - start_time).total_seconds(),
}
except Exception as e:
# Log task failure
task.status = "failed"
task.error_message = str(e)
task.completed_at = datetime.utcnow()
await db.commit()
return {
"status": "failed",
"task_id": task_id,
"error": str(e),
"execution_time_seconds": (datetime.utcnow() - start_time).total_seconds(),
}
async def _save_data(self, db: AsyncSession, data: List[Dict[str, Any]]) -> int:
"""Save transformed data to database"""
from app.models.collected_data import CollectedData
if not data:
return 0
collected_at = datetime.utcnow()
records_added = 0
for item in data:
# Create CollectedData entry
record = CollectedData(
source=self.name,
source_id=item.get("source_id") or item.get("id"),
data_type=self.data_type,
name=item.get("name"),
title=item.get("title"),
description=item.get("description"),
country=item.get("country"),
city=item.get("city"),
latitude=str(item.get("latitude", ""))
if item.get("latitude") is not None
else None,
longitude=str(item.get("longitude", ""))
if item.get("longitude") is not None
else None,
value=item.get("value"),
unit=item.get("unit"),
extra_data=item.get("metadata", {}),
collected_at=collected_at,
reference_date=datetime.fromisoformat(
item.get("reference_date").replace("Z", "+00:00")
)
if item.get("reference_date")
else None,
is_valid=1,
)
db.add(record)
records_added += 1
await db.commit()
return records_added
async def save(self, db: AsyncSession, data: List[Dict[str, Any]]) -> int:
"""Save data to database (legacy method, use _save_data instead)"""
return await self._save_data(db, data)
class HTTPCollector(BaseCollector):
"""Base class for HTTP API collectors"""
base_url: str = ""
headers: Dict[str, str] = {}
async def fetch(self) -> List[Dict[str, Any]]:
async with httpx.AsyncClient(timeout=60.0) as client:
response = await client.get(self.base_url, headers=self.headers)
response.raise_for_status()
return self.parse_response(response.json())
@abstractmethod
def parse_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
pass
class IntervalCollector(BaseCollector):
    """Base class for collectors that run on fixed intervals.

    Behaves identically to BaseCollector for now; kept as an extension point
    for interval-specific logic.
    """
async def log_task(
db: AsyncSession,
datasource_id: int,
status: str,
records_processed: int = 0,
error_message: Optional[str] = None,
):
"""Log collection task to database"""
from app.models.task import CollectionTask
task = CollectionTask(
datasource_id=datasource_id,
status=status,
records_processed=records_processed,
error_message=error_message,
started_at=datetime.utcnow(),
completed_at=datetime.utcnow(),
)
db.add(task)
await db.commit()
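The `run()` pipeline above reduces to fetch → transform → save with structured success/failure results. A database-free toy version of that control flow (class names here are illustrative):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class MiniCollector(ABC):
    """Toy version of BaseCollector's fetch -> transform -> save pipeline;
    'save' is reduced to counting records."""

    @abstractmethod
    def fetch(self) -> List[Dict[str, Any]]:
        ...

    def transform(self, raw: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Default is pass-through, as in BaseCollector.transform.
        return raw

    def run(self) -> Dict[str, Any]:
        try:
            records = self.transform(self.fetch())
            return {"status": "success", "records_processed": len(records)}
        except Exception as exc:
            # Failures become a structured result, mirroring the task log above.
            return {"status": "failed", "error": str(exc)}


class TwoRecordCollector(MiniCollector):
    def fetch(self) -> List[Dict[str, Any]]:
        return [{"name": "a"}, {"name": "b"}]


result = TwoRecordCollector().run()
```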


@@ -0,0 +1,163 @@
"""Cloudflare Radar Traffic Collector
Collects Internet traffic data from Cloudflare Radar API.
https://developers.cloudflare.com/radar/
Note: Radar API provides free access to global Internet traffic data.
Some endpoints require authentication for higher rate limits.
"""
import os
from typing import Dict, Any, List
from datetime import datetime

from app.services.collectors.base import HTTPCollector
# Cloudflare API token (optional - for higher rate limits)
CLOUDFLARE_API_TOKEN = os.environ.get("CLOUDFLARE_API_TOKEN", "")
class CloudflareRadarDeviceCollector(HTTPCollector):
"""Collects device type distribution data (mobile vs desktop)"""
name = "cloudflare_radar_device"
priority = "P2"
module = "L3"
frequency_hours = 24
data_type = "device_stats"
base_url = "https://api.cloudflare.com/client/v4/radar/http/summary/device_type"
def __init__(self):
super().__init__()
self.headers = {
"User-Agent": "Planet-Intelligence-System/1.0 (Python/collector)",
"Accept": "application/json",
}
if CLOUDFLARE_API_TOKEN:
self.headers["Authorization"] = f"Bearer {CLOUDFLARE_API_TOKEN}"
def parse_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Parse Cloudflare Radar device type response"""
data = []
result = response.get("result", {})
summary = result.get("summary_0", {})
try:
entry = {
"source_id": "cloudflare_radar_device_global",
"name": "Global Device Distribution",
"country": "GLOBAL",
"city": "",
"latitude": 0.0,
"longitude": 0.0,
"metadata": {
"desktop_percent": float(summary.get("desktop", 0)),
"mobile_percent": float(summary.get("mobile", 0)),
"other_percent": float(summary.get("other", 0)),
"date_range": result.get("meta", {}).get("dateRange", {}),
},
"reference_date": datetime.utcnow().isoformat(),
}
data.append(entry)
except (ValueError, TypeError, KeyError):
pass
return data
class CloudflareRadarTrafficCollector(HTTPCollector):
"""Collects traffic volume trends"""
name = "cloudflare_radar_traffic"
priority = "P2"
module = "L3"
frequency_hours = 24
data_type = "traffic_stats"
base_url = "https://api.cloudflare.com/client/v4/radar/http/timeseries/requests"
def __init__(self):
super().__init__()
self.headers = {
"User-Agent": "Planet-Intelligence-System/1.0 (Python/collector)",
"Accept": "application/json",
}
if CLOUDFLARE_API_TOKEN:
self.headers["Authorization"] = f"Bearer {CLOUDFLARE_API_TOKEN}"
def parse_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Parse Cloudflare Radar traffic timeseries response"""
data = []
result = response.get("result", {})
timeseries = result.get("requests_0", {}).get("timeseries", [])
for item in timeseries:
try:
entry = {
"source_id": f"cloudflare_traffic_{item.get('datetime', '')}",
"name": f"Traffic {item.get('datetime', '')[:10]}",
"country": "GLOBAL",
"city": "",
"latitude": 0.0,
"longitude": 0.0,
"metadata": {
"datetime": item.get("datetime"),
"requests": item.get("requests"),
"visit_duration": item.get("visitDuration"),
},
"reference_date": item.get("datetime", datetime.utcnow().isoformat()),
}
data.append(entry)
except (ValueError, TypeError, KeyError):
continue
return data
class CloudflareRadarTopASCollector(HTTPCollector):
"""Collects top autonomous systems by traffic"""
name = "cloudflare_radar_top_as"
priority = "P2"
module = "L2"
frequency_hours = 24
data_type = "as_stats"
base_url = "https://api.cloudflare.com/client/v4/radar/http/top/locations"
def __init__(self):
super().__init__()
self.headers = {
"User-Agent": "Planet-Intelligence-System/1.0 (Python/collector)",
"Accept": "application/json",
}
if CLOUDFLARE_API_TOKEN:
self.headers["Authorization"] = f"Bearer {CLOUDFLARE_API_TOKEN}"
def parse_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Parse Cloudflare Radar top locations response"""
data = []
result = response.get("result", {})
top_locations = result.get("top_locations_0", [])
for idx, item in enumerate(top_locations):
try:
entry = {
"source_id": f"cloudflare_as_{item.get('rank', idx)}",
"name": item.get("location", {}).get("countryName", "Unknown"),
"country": item.get("location", {}).get("countryCode", "XX"),
"city": item.get("location", {}).get("cityName", ""),
"latitude": float(item.get("location", {}).get("latitude", 0)),
"longitude": float(item.get("location", {}).get("longitude", 0)),
"metadata": {
"rank": item.get("rank"),
"traffic_share": item.get("trafficShare"),
"country_code": item.get("location", {}).get("countryCode"),
},
"reference_date": datetime.utcnow().isoformat(),
}
data.append(entry)
except (ValueError, TypeError, KeyError):
continue
return data
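The three Radar collectors share the same parse shape. The device-summary parsing can be exercised standalone; the `result.summary_0` payload layout is copied from the collector above and should be treated as an assumption about the Radar API:

```python
from typing import Any, Dict


def parse_device_summary(response: Dict[str, Any]) -> Dict[str, float]:
    """Standalone version of CloudflareRadarDeviceCollector's summary parsing:
    percentages arrive as strings and are coerced to floats."""
    summary = response.get("result", {}).get("summary_0", {})
    return {
        "desktop_percent": float(summary.get("desktop", 0)),
        "mobile_percent": float(summary.get("mobile", 0)),
        "other_percent": float(summary.get("other", 0)),
    }


sample = {"result": {"summary_0": {"desktop": "55.3", "mobile": "44.1", "other": "0.6"}}}
stats = parse_device_summary(sample)
```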


@@ -0,0 +1,118 @@
"""Epoch AI GPU Clusters Collector
Collects data from Epoch AI GPU clusters tracking.
https://epoch.ai/data/gpu-clusters
"""
import re
from typing import Dict, Any, List
from datetime import datetime
from bs4 import BeautifulSoup
import httpx
from app.services.collectors.base import BaseCollector
class EpochAIGPUCollector(BaseCollector):
name = "epoch_ai_gpu"
priority = "P0"
module = "L1"
frequency_hours = 6
data_type = "gpu_cluster"
async def fetch(self) -> List[Dict[str, Any]]:
"""Fetch Epoch AI GPU clusters data from webpage"""
url = "https://epoch.ai/data/gpu-clusters"
async with httpx.AsyncClient(timeout=60.0) as client:
response = await client.get(url)
response.raise_for_status()
return self.parse_response(response.text)
def parse_response(self, html: str) -> List[Dict[str, Any]]:
"""Parse Epoch AI webpage to extract GPU cluster data"""
data = []
soup = BeautifulSoup(html, "html.parser")
# Try to find data table on the page
tables = soup.find_all("table")
for table in tables:
rows = table.find_all("tr")
for row in rows[1:]: # Skip header
cells = row.find_all(["td", "th"])
if len(cells) >= 5:
try:
cluster_name = cells[0].get_text(strip=True)
if not cluster_name or cluster_name in ["Cluster", "System", "Name"]:
continue
location_cell = cells[1].get_text(strip=True) if len(cells) > 1 else ""
country, city = self._parse_location(location_cell)
perf_cell = cells[2].get_text(strip=True) if len(cells) > 2 else ""
entry = {
"source_id": f"epoch_{re.sub(r'[^a-zA-Z0-9]', '_', cluster_name.lower())}",
"name": cluster_name,
"country": country,
"city": city,
"latitude": "",
"longitude": "",
"value": self._parse_performance(perf_cell),
"unit": "TFlop/s",
"metadata": {
"raw_data": perf_cell,
},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
}
data.append(entry)
except (ValueError, IndexError, AttributeError):
continue
# If no table found, return sample data
if not data:
data = self._get_sample_data()
return data
def _parse_location(self, location: str) -> tuple:
"""Parse location string into country and city"""
if not location:
return "", ""
if "," in location:
parts = location.rsplit(",", 1)
city = parts[0].strip()
country = parts[1].strip() if len(parts) > 1 else ""
return country, city
return location, ""
def _parse_performance(self, perf: str) -> str:
"""Parse performance string to extract value"""
if not perf:
return "0"
match = re.search(r"([\d,.]+)\s*(TFlop/s|PFlop/s|GFlop/s)?", perf, re.I)
if match:
return match.group(1).replace(",", "")
match = re.search(r"([\d,.]+)", perf)
if match:
return match.group(1).replace(",", "")
return "0"
def _get_sample_data(self) -> List[Dict[str, Any]]:
"""Return sample data for testing when scraping fails"""
return [
{
"source_id": "epoch_sample_1",
"name": "Sample GPU Cluster",
"country": "United States",
"city": "San Francisco, CA",
"latitude": "",
"longitude": "",
"value": "1000",
"unit": "TFlop/s",
"metadata": {
"note": "Sample data - Epoch AI page structure may vary",
},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
},
]
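`_parse_performance` is easy to verify in isolation; the same regex, extracted as a standalone function:

```python
import re


def parse_performance(perf: str) -> str:
    """Pull the leading number out of strings like '1,234.5 TFlop/s',
    dropping thousands separators (mirrors EpochAIGPUCollector)."""
    if not perf:
        return "0"
    match = re.search(r"([\d,.]+)\s*(TFlop/s|PFlop/s|GFlop/s)?", perf, re.I)
    if match:
        return match.group(1).replace(",", "")
    return "0"


print(parse_performance("1,234.5 TFlop/s"))  # -> 1234.5
```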


@@ -0,0 +1,136 @@
"""Hugging Face Model Ecosystem Collector
Collects data from Hugging Face model hub.
https://huggingface.co/models
https://huggingface.co/datasets
https://huggingface.co/spaces
"""
from typing import Dict, Any, List
from datetime import datetime
from app.services.collectors.base import HTTPCollector
class HuggingFaceModelCollector(HTTPCollector):
name = "huggingface_models"
priority = "P1"
module = "L2"
frequency_hours = 12
data_type = "model"
base_url = "https://huggingface.co/api/models"
def parse_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Parse Hugging Face models API response"""
data = []
models = (
response
if isinstance(response, list)
else response.get("models", response.get("items", []))
)
for item in models[:100]:
try:
entry = {
"source_id": f"hf_model_{item.get('id', '')}",
"name": item.get("id", "Unknown"),
"description": (item.get("description", "") or "")[:500],
"metadata": {
"author": item.get("author"),
"likes": item.get("likes"),
"downloads": item.get("downloads"),
"language": item.get("language"),
"tags": (item.get("tags", []) or [])[:10],
"pipeline_tag": item.get("pipeline_tag"),
"library_name": item.get("library_name"),
"created_at": item.get("createdAt"),
},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
}
data.append(entry)
except (ValueError, TypeError, KeyError):
continue
return data
class HuggingFaceDatasetCollector(HTTPCollector):
name = "huggingface_datasets"
priority = "P1"
module = "L2"
frequency_hours = 12
data_type = "dataset"
base_url = "https://huggingface.co/api/datasets"
def parse_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Parse Hugging Face datasets API response"""
data = []
datasets = (
response
if isinstance(response, list)
else response.get("datasets", response.get("items", []))
)
for item in datasets[:100]:
try:
entry = {
"source_id": f"hf_dataset_{item.get('id', '')}",
"name": item.get("id", "Unknown"),
"description": (item.get("description", "") or "")[:500],
"metadata": {
"author": item.get("author"),
"likes": item.get("likes"),
"downloads": item.get("downloads"),
"size": item.get("size"),
"language": item.get("language"),
"tags": (item.get("tags", []) or [])[:10],
"created_at": item.get("createdAt"),
},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
}
data.append(entry)
except (ValueError, TypeError, KeyError):
continue
return data
class HuggingFaceSpacesCollector(HTTPCollector):
name = "huggingface_spaces"
priority = "P2"
module = "L2"
frequency_hours = 24
data_type = "space"
base_url = "https://huggingface.co/api/spaces"
def parse_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Parse Hugging Face Spaces API response"""
data = []
spaces = (
response
if isinstance(response, list)
else response.get("spaces", response.get("items", []))
)
for item in spaces[:100]:
try:
entry = {
"source_id": f"hf_space_{item.get('id', '')}",
"name": item.get("id", "Unknown"),
"description": (item.get("description", "") or "")[:500],
"metadata": {
"author": item.get("author"),
"likes": item.get("likes"),
"views": item.get("views"),
"sdk": item.get("sdk"),
"hardware": item.get("hardware"),
"tags": (item.get("tags", []) or [])[:10],
"created_at": item.get("createdAt"),
},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
}
data.append(entry)
except (ValueError, TypeError, KeyError):
continue
return data
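All three HF collectors lean on the `(x or "")[:n]` pattern because the Hub API can return `null` for fields like `description` and `tags`; guarding before slicing avoids `TypeError: 'NoneType' object is not subscriptable`. Standalone:

```python
from typing import Any, Dict


def safe_fields(item: Dict[str, Any]) -> Dict[str, Any]:
    """Truncate description and tags while tolerating missing or null values."""
    return {
        "description": (item.get("description", "") or "")[:500],
        "tags": (item.get("tags", []) or [])[:10],
    }


ok = safe_fields({"description": None, "tags": None})
```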


@@ -0,0 +1,331 @@
"""PeeringDB IXP Nodes Collector
Collects data from PeeringDB IXP directory.
https://www.peeringdb.com
Note: PeeringDB API has rate limits:
- Anonymous: 20 requests/minute
- Authenticated: 40 requests/minute (with API key)
To get higher limits, set PEERINGDB_API_KEY environment variable.
"""
import asyncio
import os
from typing import Dict, Any, List
from datetime import datetime
import httpx
from app.services.collectors.base import HTTPCollector
# PeeringDB API key - read from environment variable
PEERINGDB_API_KEY = os.environ.get("PEERINGDB_API_KEY", "")
class PeeringDBIXPCollector(HTTPCollector):
name = "peeringdb_ixp"
priority = "P1"
module = "L2"
frequency_hours = 24
data_type = "ixp"
base_url = "https://www.peeringdb.com/api/ix"
def __init__(self):
super().__init__()
# Set headers with User-Agent
self.headers = {
"User-Agent": "Planet-Intelligence-System/1.0 (Python/collector)",
"Accept": "application/json",
}
# API key is added to URL as query parameter
if PEERINGDB_API_KEY:
self.base_url = f"{self.base_url}?key={PEERINGDB_API_KEY}"
async def fetch_with_retry(
self, max_retries: int = 3, base_delay: float = 2.0
) -> Dict[str, Any]:
"""Fetch data with exponential backoff for rate limiting"""
last_error = None
for attempt in range(max_retries):
try:
async with httpx.AsyncClient(timeout=60.0) as client:
response = await client.get(self.base_url, headers=self.headers)
if response.status_code == 429:
# Rate limited - wait and retry with exponential backoff
delay = base_delay * (2**attempt)
print(f"PeeringDB rate limited, waiting {delay}s before retry...")
await asyncio.sleep(delay)
last_error = "Rate limited"
continue
response.raise_for_status()
return response.json()
except httpx.HTTPStatusError as e:
if e.response.status_code == 429:
delay = base_delay * (2**attempt)
print(f"PeeringDB rate limited, waiting {delay}s before retry...")
await asyncio.sleep(delay)
last_error = "Rate limited"
continue
raise
print(f"Warning: PeeringDB collection failed after {max_retries} retries: {last_error}")
return {}
    async def fetch(self) -> List[Dict[str, Any]]:
        """Fetch IXP data from PeeringDB with rate limit handling.

        Named fetch (not collect) so it overrides HTTPCollector.fetch and
        the BaseCollector.run() pipeline actually uses the retry logic.
        """
        response_data = await self.fetch_with_retry()
        if not response_data:
            return []
        return self.parse_response(response_data)
def parse_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Parse PeeringDB IXP API response"""
data = []
ixps = response.get("data", response.get("ixps", []))
for item in ixps:
try:
entry = {
"source_id": f"peeringdb_ixp_{item.get('id', '')}",
"name": item.get("name", "Unknown"),
"country": item.get("country", "Unknown"),
"city": item.get("city", ""),
"latitude": self._parse_coordinate(item.get("latitude")),
"longitude": self._parse_coordinate(item.get("longitude")),
"metadata": {
"org_name": item.get("org_name"),
"url": item.get("url"),
"tech_email": item.get("tech_email"),
"tech_phone": item.get("tech_phone"),
"network_count": len(item.get("net_set", [])),
"created": item.get("created"),
"updated": item.get("updated"),
},
"reference_date": datetime.utcnow().isoformat(),
}
data.append(entry)
except (ValueError, TypeError, KeyError):
continue
return data
def _parse_coordinate(self, value: Any) -> float:
if value is None:
return 0.0
if isinstance(value, (int, float)):
return float(value)
if isinstance(value, str):
try:
return float(value)
except ValueError:
return 0.0
return 0.0
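`fetch_with_retry` backs off as `base_delay * 2**attempt` on each 429. The resulting delay schedule, computed standalone:

```python
def backoff_delays(max_retries: int = 3, base_delay: float = 2.0) -> list:
    """Sleep schedule used on repeated 429s: base_delay * 2**attempt."""
    return [base_delay * (2 ** attempt) for attempt in range(max_retries)]


print(backoff_delays())  # [2.0, 4.0, 8.0]
```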
class PeeringDBNetworkCollector(HTTPCollector):
name = "peeringdb_network"
priority = "P2"
module = "L2"
frequency_hours = 48
data_type = "network"
base_url = "https://www.peeringdb.com/api/net"
def __init__(self):
super().__init__()
self.headers = {
"User-Agent": "Planet-Intelligence-System/1.0 (Python/collector)",
"Accept": "application/json",
}
if PEERINGDB_API_KEY:
self.base_url = f"{self.base_url}?key={PEERINGDB_API_KEY}"
async def fetch_with_retry(
self, max_retries: int = 3, base_delay: float = 2.0
) -> Dict[str, Any]:
"""Fetch data with exponential backoff for rate limiting"""
last_error = None
for attempt in range(max_retries):
try:
async with httpx.AsyncClient(timeout=60.0) as client:
response = await client.get(self.base_url, headers=self.headers)
if response.status_code == 429:
delay = base_delay * (2**attempt)
print(f"PeeringDB rate limited, waiting {delay}s before retry...")
await asyncio.sleep(delay)
last_error = "Rate limited"
continue
response.raise_for_status()
return response.json()
except httpx.HTTPStatusError as e:
if e.response.status_code == 429:
delay = base_delay * (2**attempt)
print(f"PeeringDB rate limited, waiting {delay}s before retry...")
await asyncio.sleep(delay)
last_error = "Rate limited"
continue
raise
print(f"Warning: PeeringDB collection failed after {max_retries} retries: {last_error}")
return {}
    async def fetch(self) -> List[Dict[str, Any]]:
        """Fetch Network data from PeeringDB with rate limit handling.

        Named fetch (not collect) so it overrides HTTPCollector.fetch and
        the BaseCollector.run() pipeline actually uses the retry logic.
        """
        response_data = await self.fetch_with_retry()
        if not response_data:
            return []
        return self.parse_response(response_data)
def parse_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Parse PeeringDB Network API response"""
data = []
networks = response.get("data", response.get("networks", []))
for item in networks:
try:
entry = {
"source_id": f"peeringdb_net_{item.get('id', '')}",
"name": item.get("name", "Unknown"),
"country": item.get("country", "Unknown"),
"city": item.get("city", ""),
"latitude": self._parse_coordinate(item.get("latitude")),
"longitude": self._parse_coordinate(item.get("longitude")),
"metadata": {
"asn": item.get("asn"),
"irr_as_set": item.get("irr_as_set"),
"url": item.get("url"),
"info_type": item.get("info_type"),
"info_traffic": item.get("info_traffic"),
"info_ratio": item.get("info_ratio"),
"ix_count": len(item.get("ix_set", [])),
"created": item.get("created"),
"updated": item.get("updated"),
},
"reference_date": datetime.utcnow().isoformat(),
}
data.append(entry)
except (ValueError, TypeError, KeyError):
continue
return data
def _parse_coordinate(self, value: Any) -> float:
if value is None:
return 0.0
if isinstance(value, (int, float)):
return float(value)
if isinstance(value, str):
try:
return float(value)
except ValueError:
return 0.0
return 0.0
class PeeringDBFacilityCollector(HTTPCollector):
name = "peeringdb_facility"
priority = "P2"
module = "L2"
frequency_hours = 48
data_type = "facility"
base_url = "https://www.peeringdb.com/api/fac"
def __init__(self):
super().__init__()
self.headers = {
"User-Agent": "Planet-Intelligence-System/1.0 (Python/collector)",
"Accept": "application/json",
}
if PEERINGDB_API_KEY:
self.base_url = f"{self.base_url}?key={PEERINGDB_API_KEY}"
async def fetch_with_retry(
self, max_retries: int = 3, base_delay: float = 2.0
) -> Dict[str, Any]:
"""Fetch data with exponential backoff for rate limiting"""
last_error = None
for attempt in range(max_retries):
try:
async with httpx.AsyncClient(timeout=60.0) as client:
response = await client.get(self.base_url, headers=self.headers)
if response.status_code == 429:
delay = base_delay * (2**attempt)
print(f"PeeringDB rate limited, waiting {delay}s before retry...")
await asyncio.sleep(delay)
last_error = "Rate limited"
continue
response.raise_for_status()
return response.json()
except httpx.HTTPStatusError as e:
if e.response.status_code == 429:
delay = base_delay * (2**attempt)
print(f"PeeringDB rate limited, waiting {delay}s before retry...")
await asyncio.sleep(delay)
last_error = "Rate limited"
continue
raise
print(f"Warning: PeeringDB collection failed after {max_retries} retries: {last_error}")
return {}
async def collect(self) -> List[Dict[str, Any]]:
"""Collect Facility data from PeeringDB with rate limit handling"""
response_data = await self.fetch_with_retry()
if not response_data:
return []
return self.parse_response(response_data)
def parse_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Parse PeeringDB Facility API response"""
data = []
facilities = response.get("data", response.get("facilities", []))
for item in facilities:
try:
entry = {
"source_id": f"peeringdb_fac_{item.get('id', '')}",
"name": item.get("name", "Unknown"),
"country": item.get("country", "Unknown"),
"city": item.get("city", ""),
"latitude": self._parse_coordinate(item.get("latitude")),
"longitude": self._parse_coordinate(item.get("longitude")),
"metadata": {
"org_name": item.get("org_name"),
"address": item.get("address"),
"url": item.get("url"),
"rack_count": item.get("rack_count"),
"power": item.get("power"),
"network_count": len(item.get("net_set", [])),
"created": item.get("created"),
"updated": item.get("updated"),
},
"reference_date": datetime.utcnow().isoformat(),
}
data.append(entry)
except (ValueError, TypeError, KeyError):
continue
return data
def _parse_coordinate(self, value: Any) -> float:
if value is None:
return 0.0
if isinstance(value, (int, float)):
return float(value)
if isinstance(value, str):
try:
return float(value)
except ValueError:
return 0.0
return 0.0
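The `fetch_with_retry` loops above back off exponentially on HTTP 429 responses. A minimal sketch of the delay schedule that `base_delay * (2 ** attempt)` produces (pure arithmetic, no network):

```python
def backoff_delays(max_retries: int = 3, base_delay: float = 2.0) -> list[float]:
    """Delay before each retry attempt: base_delay * 2**attempt."""
    return [base_delay * (2 ** attempt) for attempt in range(max_retries)]

print(backoff_delays())  # [2.0, 4.0, 8.0]
```

With the defaults used by the collectors, a fully rate-limited run waits 2s, 4s, then 8s before giving up and returning `{}`.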


@@ -0,0 +1,43 @@
"""Collector registry for managing all data collectors"""
from typing import Dict, Optional
from app.services.collectors.base import BaseCollector
class CollectorRegistry:
"""Registry for all data collectors"""
_collectors: Dict[str, BaseCollector] = {}
_active_collectors: set = set()
@classmethod
def register(cls, collector: BaseCollector):
"""Register a collector"""
cls._collectors[collector.name] = collector
cls._active_collectors.add(collector.name)
@classmethod
def get(cls, name: str) -> Optional[BaseCollector]:
"""Get a collector by name"""
return cls._collectors.get(name)
@classmethod
def all(cls) -> Dict[str, BaseCollector]:
"""Get all collectors"""
return cls._collectors.copy()
@classmethod
def is_active(cls, name: str) -> bool:
"""Check if a collector is active"""
return name in cls._active_collectors
@classmethod
def set_active(cls, name: str, active: bool = True):
"""Set collector active status"""
if active:
cls._active_collectors.add(name)
else:
cls._active_collectors.discard(name)
collector_registry = CollectorRegistry()
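Callers use the classmethod API above to register and toggle collectors. A self-contained sketch with a stub collector in place of a real `BaseCollector` subclass (the stub and its `name` are invented for illustration):

```python
class StubCollector:
    name = "top500"

class CollectorRegistry:
    _collectors: dict = {}
    _active: set = set()

    @classmethod
    def register(cls, collector):
        cls._collectors[collector.name] = collector
        cls._active.add(collector.name)

    @classmethod
    def get(cls, name):
        return cls._collectors.get(name)

    @classmethod
    def set_active(cls, name, active=True):
        # add() when enabling, discard() when disabling
        (cls._active.add if active else cls._active.discard)(name)

    @classmethod
    def is_active(cls, name):
        return name in cls._active

CollectorRegistry.register(StubCollector())
CollectorRegistry.set_active("top500", False)
print(CollectorRegistry.is_active("top500"))  # False
```

Registration activates a collector by default; the scheduler later skips any name that `is_active` reports as disabled.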


@@ -0,0 +1,286 @@
"""TeleGeography Submarine Cables Collector
Collects data from TeleGeography submarine cable database.
Uses Wayback Machine as backup data source since live data requires JavaScript rendering.
"""
import json
import re
from typing import Dict, Any, List
from datetime import datetime
from bs4 import BeautifulSoup
import httpx
from app.services.collectors.base import BaseCollector
class TeleGeographyCableCollector(BaseCollector):
name = "telegeography_cables"
priority = "P1"
module = "L2"
frequency_hours = 168 # 7 days
data_type = "submarine_cable"
async def fetch(self) -> List[Dict[str, Any]]:
"""Fetch submarine cable data from Wayback Machine"""
# Try multiple data sources
sources = [
# Wayback Machine archive of TeleGeography
"https://web.archive.org/web/2024/https://www.submarinecablemap.com/api/v3/cable",
# Alternative: Try scraping the page
"https://www.submarinecablemap.com",
]
for url in sources:
try:
async with httpx.AsyncClient(timeout=60.0, follow_redirects=True) as client:
response = await client.get(url)
response.raise_for_status()
# Check if response is JSON
content_type = response.headers.get("content-type", "")
if "application/json" in content_type or url.endswith(".json"):
return self.parse_response(response.json())
else:
# It's HTML, try to scrape
data = self.scrape_cables_from_html(response.text)
if data:
return data
except Exception:
continue
# Fallback to sample data
return self._get_sample_data()
def scrape_cables_from_html(self, html: str) -> List[Dict[str, Any]]:
"""Try to extract cable data from HTML page"""
data = []
soup = BeautifulSoup(html, "html.parser")
# Look for embedded JSON data in scripts
scripts = soup.find_all("script")
for script in scripts:
text = script.string or ""
if "cable" in text.lower() and ("{" in text or "[" in text):
# Try to find JSON data
match = re.search(r"\[.+\]", text, re.DOTALL)
if match:
try:
potential_data = json.loads(match.group())
if isinstance(potential_data, list):
return potential_data
except json.JSONDecodeError:
pass
return data
def parse_response(self, data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Parse submarine cable data"""
result = []
if not isinstance(data, list):
data = [data]
for item in data:
try:
entry = {
"source_id": f"telegeo_cable_{item.get('id', item.get('cable_id', ''))}",
"name": item.get("name", item.get("cable_name", "Unknown")),
"country": "",
"city": "",
"latitude": "",
"longitude": "",
"value": str(item.get("length", item.get("length_km", 0))),
"unit": "km",
"metadata": {
"owner": item.get("owner"),
"operator": item.get("operator"),
"length_km": item.get("length", item.get("length_km")),
"rfs": item.get("rfs"),
"status": item.get("status", "active"),
"cable_type": item.get("type", "fiber optic"),
"capacity_tbps": item.get("capacity"),
"url": item.get("url"),
},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
}
result.append(entry)
except (ValueError, TypeError, KeyError):
continue
if not result:
result = self._get_sample_data()
return result
def _get_sample_data(self) -> List[Dict[str, Any]]:
"""Return sample submarine cable data"""
return [
{
"source_id": "telegeo_sample_1",
"name": "2Africa",
"country": "",
"city": "",
"latitude": "",
"longitude": "",
"value": "45000",
"unit": "km",
"metadata": {
"note": "Sample data - TeleGeography requires browser/scraper for live data",
"owner": "Meta, Orange, Vodafone, etc.",
"status": "active",
},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
},
{
"source_id": "telegeo_sample_2",
"name": "Asia Connect Cable 1",
"country": "",
"city": "",
"latitude": "",
"longitude": "",
"value": "12000",
"unit": "km",
"metadata": {
"note": "Sample data",
"owner": "Alibaba, NEC",
"status": "planned",
},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
},
]
class TeleGeographyLandingPointCollector(BaseCollector):
name = "telegeography_landing"
priority = "P2"
module = "L2"
frequency_hours = 168
data_type = "landing_point"
async def fetch(self) -> List[Dict[str, Any]]:
"""Fetch landing point data from GitHub mirror"""
url = "https://raw.githubusercontent.com/lintaojlu/submarine_cable_information/main/landing_point.json"
async with httpx.AsyncClient(timeout=60.0) as client:
response = await client.get(url)
response.raise_for_status()
return self.parse_response(response.json())
def parse_response(self, data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Parse landing point data"""
result = []
for item in data:
try:
entry = {
"source_id": f"telegeo_lp_{item.get('id', '')}",
"name": item.get("name", "Unknown"),
"country": item.get("country", "Unknown"),
"city": item.get("city", item.get("name", "")),
"latitude": str(item.get("latitude", "")),
"longitude": str(item.get("longitude", "")),
"value": "",
"unit": "",
"metadata": {
"cable_count": len(item.get("cables", [])),
"url": item.get("url"),
},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
}
result.append(entry)
except (ValueError, TypeError, KeyError):
continue
if not result:
result = self._get_sample_data()
return result
def _get_sample_data(self) -> List[Dict[str, Any]]:
"""Return sample landing point data"""
return [
{
"source_id": "telegeo_lp_sample_1",
"name": "Sample Landing Point",
"country": "United States",
"city": "Los Angeles, CA",
"latitude": "34.0522",
"longitude": "-118.2437",
"value": "",
"unit": "",
"metadata": {"note": "Sample data"},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
},
]
class TeleGeographyCableSystemCollector(BaseCollector):
name = "telegeography_systems"
priority = "P2"
module = "L2"
frequency_hours = 168
data_type = "cable_system"
async def fetch(self) -> List[Dict[str, Any]]:
"""Fetch cable system data"""
url = "https://raw.githubusercontent.com/lintaojlu/submarine_cable_information/main/cable.json"
async with httpx.AsyncClient(timeout=60.0) as client:
response = await client.get(url)
response.raise_for_status()
return self.parse_response(response.json())
def parse_response(self, data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Parse cable system data"""
result = []
for item in data:
try:
entry = {
"source_id": f"telegeo_sys_{item.get('id', item.get('cable_id', ''))}",
"name": item.get("name", item.get("cable_name", "Unknown")),
"country": "",
"city": "",
"latitude": "",
"longitude": "",
"value": str(item.get("length", 0)),
"unit": "km",
"metadata": {
"owner": item.get("owner"),
"operator": item.get("operator"),
"route": item.get("route"),
"countries": item.get("countries", []),
"length_km": item.get("length"),
"rfs": item.get("rfs"),
"status": item.get("status", "active"),
"investment": item.get("investment"),
"url": item.get("url"),
},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
}
result.append(entry)
except (ValueError, TypeError, KeyError):
continue
if not result:
result = self._get_sample_data()
return result
def _get_sample_data(self) -> List[Dict[str, Any]]:
"""Return sample cable system data"""
return [
{
"source_id": "telegeo_sys_sample_1",
"name": "Sample Cable System",
"country": "",
"city": "",
"latitude": "",
"longitude": "",
"value": "5000",
"unit": "km",
"metadata": {"note": "Sample data"},
"reference_date": datetime.utcnow().strftime("%Y-%m-%d"),
},
]
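The HTML fallback in `scrape_cables_from_html` looks for a JSON array embedded in a `<script>` tag and decodes it. A stdlib-only sketch of that extraction step (the sample script body is invented; the live page's structure may differ):

```python
import json
import re

# Stand-in for the .string contents of one <script> tag
html_script_body = """
var cables = [{"id": 1, "name": "2Africa", "length": 45000}];
"""

# Greedy match from the first '[' to the last ']', as in the collector
match = re.search(r"\[.+\]", html_script_body, re.DOTALL)
cables = json.loads(match.group()) if match else []
print(cables[0]["name"])  # 2Africa
```

Because the regex is greedy across newlines, a script containing multiple bracketed spans can capture too much and fail `json.loads`, which is why the collector wraps the decode in a try/except and falls through to sample data.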


@@ -0,0 +1,230 @@
"""TOP500 Supercomputer Collector
Collects data from TOP500 supercomputer rankings.
https://top500.org/lists/top500/
"""
import re
from typing import Dict, Any, List
from datetime import datetime
from bs4 import BeautifulSoup
import httpx
from app.services.collectors.base import BaseCollector
class TOP500Collector(BaseCollector):
name = "top500"
priority = "P0"
module = "L1"
frequency_hours = 4
data_type = "supercomputer"
async def fetch(self) -> List[Dict[str, Any]]:
"""Fetch TOP500 data from website (scraping)"""
# Get the latest list page
url = "https://top500.org/lists/top500/list/2025/11/"
async with httpx.AsyncClient(timeout=60.0) as client:
response = await client.get(url)
response.raise_for_status()
return self.parse_response(response.text)
def parse_response(self, html: str) -> List[Dict[str, Any]]:
"""Parse TOP500 HTML response"""
data = []
soup = BeautifulSoup(html, "html.parser")
# Find the table with TOP500 data
table = soup.find("table", {"class": "top500-table"})
if not table:
# Try alternative table selector
table = soup.find("table", {"id": "top500"})
if not table:
# Try to find any table with rank data
tables = soup.find_all("table")
for t in tables:
if t.find(string=re.compile(r"Rank.*System.*Cores.*Rmax", re.I)):
table = t
break
if not table:
# Fallback: try to extract data from any table
tables = soup.find_all("table")
if tables:
table = tables[0]
if table:
rows = table.find_all("tr")
for row in rows[1:]: # Skip header row
cells = row.find_all(["td", "th"])
if len(cells) >= 6:
try:
# Parse the row data
rank_text = cells[0].get_text(strip=True)
if not rank_text or not rank_text.isdigit():
continue
rank = int(rank_text)
# System name (may contain link)
system_cell = cells[1]
system_name = system_cell.get_text(strip=True)
# Try to get full name from link title or data attribute
link = system_cell.find("a")
if link and link.get("title"):
system_name = link.get("title")
# Country
country_cell = cells[2]
country = country_cell.get_text(strip=True)
# Try to get country from data attribute or image alt
img = country_cell.find("img")
if img and img.get("alt"):
country = img.get("alt")
# Extract location (city)
city = ""
location_text = country_cell.get_text(strip=True)
if "(" in location_text and ")" in location_text:
city = location_text.split("(")[0].strip()
# Cores
cores = cells[3].get_text(strip=True).replace(",", "")
# Rmax
rmax_text = cells[4].get_text(strip=True)
rmax = self._parse_performance(rmax_text)
# Rpeak
rpeak_text = cells[5].get_text(strip=True)
rpeak = self._parse_performance(rpeak_text)
# Power (optional)
power = ""
if len(cells) >= 7:
power = cells[6].get_text(strip=True)
entry = {
"source_id": f"top500_{rank}",
"name": system_name,
"country": country,
"city": city,
"latitude": 0.0,
"longitude": 0.0,
"value": str(rmax),
"unit": "PFlop/s",
"metadata": {
"rank": rank,
"r_peak": rpeak,
"power": power,
"cores": cores,
},
"reference_date": "2025-11-01",
}
data.append(entry)
except (ValueError, IndexError, AttributeError):
continue
# If scraping failed, return sample data for testing
if not data:
data = self._get_sample_data()
return data
def _parse_coordinate(self, value: Any) -> float:
"""Parse coordinate value"""
if isinstance(value, (int, float)):
return float(value)
if isinstance(value, str):
try:
return float(value)
except ValueError:
return 0.0
return 0.0
def _parse_performance(self, text: str) -> float:
"""Parse performance value from text (handles E, P, T suffixes)"""
text = text.strip().upper().replace(",", "")  # drop thousands separators so "1,742.00" parses fully
multipliers = {
"E": 1e18,
"P": 1e15,
"T": 1e12,
"G": 1e9,
"M": 1e6,
"K": 1e3,
}
match = re.match(r"([\d.]+)\s*([EPTGMK])?F?LOP/?S?", text)
if match:
value = float(match.group(1))
suffix = match.group(2)
if suffix:
value *= multipliers.get(suffix, 1)
return value
# Try simple float parsing
try:
return float(text.replace(",", ""))
except ValueError:
return 0.0
def _get_sample_data(self) -> List[Dict[str, Any]]:
"""Return sample data for testing when scraping fails"""
return [
{
"source_id": "top500_1",
"name": "El Capitan - HPE Cray EX255a, AMD 4th Gen EPYC 24C 1.8GHz, AMD Instinct MI300A",
"country": "United States",
"city": "Livermore, CA",
"latitude": 37.6819,
"longitude": -121.7681,
"value": "1742.00",
"unit": "PFlop/s",
"metadata": {
"rank": 1,
"r_peak": 2746.38,
"power": 29581,
"cores": 11039616,
"manufacturer": "HPE",
},
"reference_date": "2025-11-01",
},
{
"source_id": "top500_2",
"name": "Frontier - HPE Cray EX235a, AMD Optimized 3rd Generation EPYC 64C 2GHz, AMD Instinct MI250X",
"country": "United States",
"city": "Oak Ridge, TN",
"latitude": 36.0107,
"longitude": -84.2663,
"value": "1353.00",
"unit": "PFlop/s",
"metadata": {
"rank": 2,
"r_peak": 2055.72,
"power": 24607,
"cores": 9066176,
"manufacturer": "HPE",
},
"reference_date": "2025-11-01",
},
{
"source_id": "top500_3",
"name": "Aurora - HPE Cray EX - Intel Exascale Compute Blade, Xeon CPU Max 9470 52C 2.4GHz, Intel Data Center GPU Max",
"country": "United States",
"city": "Argonne, IL",
"latitude": 41.3784,
"longitude": -87.8600,
"value": "1012.00",
"unit": "PFlop/s",
"metadata": {
"rank": 3,
"r_peak": 1980.01,
"power": 38698,
"cores": 9264128,
"manufacturer": "Intel",
},
"reference_date": "2025-11-01",
},
]
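The suffix handling in `_parse_performance` can be exercised standalone. A sketch mirroring the helper above (the comma stripping is an assumption for values like `1,742.00`, which the suffix regex would otherwise truncate at the first comma):

```python
import re

MULTIPLIERS = {"E": 1e18, "P": 1e15, "T": 1e12, "G": 1e9, "M": 1e6, "K": 1e3}

def parse_performance(text: str) -> float:
    """Parse '1.742 EFlop/s'-style strings into a plain float."""
    text = text.strip().upper().replace(",", "")  # drop thousands separators
    match = re.match(r"([\d.]+)\s*([EPTGMK])?F?LOP/?S?", text)
    if match:
        value = float(match.group(1))
        if match.group(2):
            value *= MULTIPLIERS[match.group(2)]
        return value
    try:
        return float(text)  # bare number, no suffix
    except ValueError:
        return 0.0

print(parse_performance("1.742 EFlop/s"))  # 1.742e+18
print(parse_performance("442,010.0"))      # 442010.0
```

Note the fallback path: a bare number like `442,010.0` has no `FLOP` literal for the regex to match, so it is parsed as a plain float instead.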


@@ -0,0 +1,146 @@
"""Task Scheduler for running collection jobs"""
import asyncio
import logging
from datetime import datetime
from typing import Dict, Any
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.interval import IntervalTrigger
from sqlalchemy.ext.asyncio import AsyncSession
from app.db.session import async_session_factory
from app.services.collectors.registry import collector_registry
logger = logging.getLogger(__name__)
scheduler = AsyncIOScheduler()
COLLECTOR_TO_ID = {
"top500": 1,
"epoch_ai_gpu": 2,
"huggingface_models": 3,
"huggingface_datasets": 4,
"huggingface_spaces": 5,
"peeringdb_ixp": 6,
"peeringdb_network": 7,
"peeringdb_facility": 8,
"telegeography_cables": 9,
"telegeography_landing": 10,
"telegeography_systems": 11,
}
async def run_collector_task(collector_name: str):
"""Run a single collector task"""
collector = collector_registry.get(collector_name)
if not collector:
logger.error(f"Collector not found: {collector_name}")
return
# Get the correct datasource_id
datasource_id = COLLECTOR_TO_ID.get(collector_name, 1)
async with async_session_factory() as db:
try:
# Set the datasource_id on the collector instance
collector._datasource_id = datasource_id
logger.info(f"Running collector: {collector_name} (datasource_id={datasource_id})")
result = await collector.run(db)
logger.info(f"Collector {collector_name} completed: {result}")
except Exception as e:
logger.error(f"Collector {collector_name} failed: {e}")
def start_scheduler():
"""Start the scheduler with all registered collectors"""
collectors = collector_registry.all()
for name, collector in collectors.items():
if collector_registry.is_active(name):
scheduler.add_job(
run_collector_task,
trigger=IntervalTrigger(hours=collector.frequency_hours),
id=name,
name=name,
replace_existing=True,
kwargs={"collector_name": name},
)
logger.info(f"Scheduled collector: {name} (every {collector.frequency_hours}h)")
scheduler.start()
logger.info("Scheduler started")
def stop_scheduler():
"""Stop the scheduler"""
scheduler.shutdown()
logger.info("Scheduler stopped")
def get_scheduler_jobs() -> list[Dict[str, Any]]:
"""Get all scheduled jobs"""
jobs = []
for job in scheduler.get_jobs():
jobs.append(
{
"id": job.id,
"name": job.name,
"next_run_time": job.next_run_time.isoformat() if job.next_run_time else None,
"trigger": str(job.trigger),
}
)
return jobs
def add_job(collector_name: str, hours: int = 4):
"""Add a new scheduled job"""
collector = collector_registry.get(collector_name)
if not collector:
raise ValueError(f"Collector not found: {collector_name}")
scheduler.add_job(
run_collector_task,
trigger=IntervalTrigger(hours=hours),
id=collector_name,
name=collector_name,
replace_existing=True,
kwargs={"collector_name": collector_name},
)
logger.info(f"Added scheduled job: {collector_name} (every {hours}h)")
def remove_job(collector_name: str):
"""Remove a scheduled job"""
scheduler.remove_job(collector_name)
logger.info(f"Removed scheduled job: {collector_name}")
def pause_job(collector_name: str):
"""Pause a scheduled job"""
scheduler.pause_job(collector_name)
logger.info(f"Paused job: {collector_name}")
def resume_job(collector_name: str):
"""Resume a scheduled job"""
scheduler.resume_job(collector_name)
logger.info(f"Resumed job: {collector_name}")
def run_collector_now(collector_name: str) -> bool:
"""Run a collector immediately (not scheduled)"""
collector = collector_registry.get(collector_name)
if not collector:
logger.error(f"Collector not found: {collector_name}")
return False
try:
asyncio.create_task(run_collector_task(collector_name))
logger.info(f"Triggered collector: {collector_name}")
return True
except Exception as e:
logger.error(f"Failed to trigger collector {collector_name}: {e}")
return False
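`run_collector_now` above fires a collector without blocking the caller via `asyncio.create_task`. A minimal sketch of that fire-and-forget pattern, with a stub coroutine standing in for the real collector run:

```python
import asyncio

async def run_collector_task(name: str) -> dict:
    await asyncio.sleep(0)  # stand-in for the real fetch/parse/store work
    return {"collector": name, "status": "success"}

async def main() -> list[dict]:
    # Schedule both collectors concurrently, then gather their results.
    tasks = [asyncio.create_task(run_collector_task(n))
             for n in ("top500", "peeringdb_ixp")]
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results[0]["status"])  # success
```

One caveat the scheduler inherits from this pattern: `asyncio.create_task` requires a running event loop, so calling `run_collector_now` from a purely synchronous context raises `RuntimeError`, which its try/except converts into a `False` return.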


@@ -0,0 +1,3 @@
"""Tasks package"""
from app.tasks.scheduler import run_collector_task


@@ -0,0 +1,52 @@
"""Celery tasks for data collection"""
import asyncio
from datetime import datetime
from typing import Dict, Any
from app.db.session import async_session_factory
from app.services.collectors.registry import collector_registry
async def run_collector_task(collector_name: str) -> Dict[str, Any]:
"""Run a single collector task"""
collector = collector_registry.get(collector_name)
if not collector:
return {"status": "failed", "error": f"Collector {collector_name} not found"}
if not collector_registry.is_active(collector_name):
return {"status": "skipped", "reason": "Collector is disabled"}
async with async_session_factory() as db:
from app.models.task import CollectionTask
from app.models.datasource import DataSource
# Find datasource
from sqlalchemy import text  # SQLAlchemy 2.x requires raw SQL to be wrapped in text()
result = await db.execute(
    text("SELECT id FROM data_sources WHERE collector_class = :class_name"),
    {"class_name": collector.__class__.__name__},
)
datasource = result.fetchone()
task = CollectionTask(
datasource_id=datasource[0] if datasource else 0,
status="running",
started_at=datetime.utcnow(),
)
db.add(task)
await db.commit()
try:
    result = await collector.run(db)
except Exception as e:
    # Don't leave the CollectionTask row stuck in "running" if the collector raises
    result = {"status": "failed", "error": str(e)}
task.status = result["status"]
task.completed_at = datetime.utcnow()
task.records_processed = result.get("records_processed", 0)
task.error_message = result.get("error")
await db.commit()
return result
def run_collector_sync(collector_name: str) -> Dict[str, Any]:
"""Synchronous wrapper for running collectors"""
return asyncio.run(run_collector_task(collector_name))

backend/pytest.ini Normal file

@@ -0,0 +1,10 @@
[pytest]
asyncio_mode = auto
testpaths = tests
python_files = test_*.py
python_functions = test_*
python_classes = Test*
addopts = -v --tb=short
filterwarnings =
ignore::DeprecationWarning
ignore::PendingDeprecationWarning

backend/requirements.txt Normal file

@@ -0,0 +1,18 @@
fastapi>=0.109.0
uvicorn[standard]>=0.27.0
sqlalchemy[asyncio]>=2.0.25
asyncpg>=0.29.0
redis>=5.0.1
pydantic>=2.5.0
pydantic-settings>=2.1.0
python-jose[cryptography]>=3.3.0
passlib[bcrypt]>=1.7.4
python-multipart>=0.0.6
httpx>=0.26.0
beautifulsoup4>=4.12.0
aiofiles>=23.2.1
python-dotenv>=1.0.0
email-validator
apscheduler>=3.10.4
pytest>=7.4.0
pytest-asyncio>=0.23.0


@@ -0,0 +1,35 @@
"""Create default admin user"""
import asyncio
import sys
sys.path.insert(0, ".")
from app.core.security import get_password_hash
from app.db.session import engine, async_session_factory
from app.models.user import User
async def create_admin():
from sqlalchemy import text
async with async_session_factory() as session:
result = await session.execute(text("SELECT id FROM users WHERE username = 'admin'"))
if result.fetchone():
print("Admin user already exists")
return
admin = User(
username="admin",
email="admin@planet.local",
password_hash=get_password_hash("admin123"),
role="super_admin",
is_active=True,
)
session.add(admin)
await session.commit()
print("Admin user created: admin / admin123")
if __name__ == "__main__":
asyncio.run(create_admin())


@@ -0,0 +1,45 @@
#!/usr/bin/env python3
"""Create initial admin user with pre-generated hash"""
import asyncio
import sys
sys.path.insert(0, "/app")
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import sessionmaker
import bcrypt
# Generate proper bcrypt hash
ADMIN_PASSWORD_HASH = bcrypt.hashpw("admin123".encode(), bcrypt.gensalt()).decode()
async def create_admin():
DATABASE_URL = "postgresql+asyncpg://postgres:postgres@postgres:5432/planet_db"
engine = create_async_engine(DATABASE_URL, echo=False)
async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
async with async_session() as session:
result = await session.execute(
text("SELECT id FROM users WHERE username = 'admin'")
)
if result.fetchone():
print("Admin user already exists")
return
await session.execute(
text("""
INSERT INTO users (username, email, password_hash, role, is_active, created_at, updated_at)
VALUES ('admin', 'admin@planet.local', :password, 'super_admin', true, NOW(), NOW())
"""),
{"password": ADMIN_PASSWORD_HASH},
)
await session.commit()
print("Admin user created: admin / admin123")
print(f"Hash: {ADMIN_PASSWORD_HASH}")
if __name__ == "__main__":
asyncio.run(create_admin())


@@ -0,0 +1 @@
"""Test configuration"""

Binary file not shown.

backend/tests/conftest.py Normal file

@@ -0,0 +1,103 @@
"""Pytest configuration and fixtures"""
import pytest
import asyncio
from typing import AsyncGenerator
from unittest.mock import AsyncMock, MagicMock, patch
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine, async_sessionmaker
@pytest.fixture(scope="session")
def event_loop():
"""Create event loop for async tests"""
loop = asyncio.get_event_loop_policy().new_event_loop()
yield loop
loop.close()
@pytest.fixture
def mock_db_session():
"""Mock database session"""
session = AsyncMock(spec=AsyncSession)
session.add = MagicMock()
session.commit = AsyncMock()
session.execute = AsyncMock()
session.refresh = AsyncMock()
session.close = AsyncMock()
return session
@pytest.fixture
def sample_top500_response():
"""Sample TOP500 API response"""
return {
"items": [
{
"rank": 1,
"system_name": "Frontier",
"country": "USA",
"city": "Oak Ridge",
"latitude": 35.9322,
"longitude": -84.3108,
"manufacturer": "HPE",
"r_max": 1102000.0,
"r_peak": 1685000.0,
"power": 21510.0,
"cores": 8730112,
"interconnect": "Slingshot 11",
"os": "CentOS",
},
{
"rank": 2,
"system_name": "Fugaku",
"country": "Japan",
"city": "Kobe",
"latitude": 34.6913,
"longitude": 135.1830,
"manufacturer": "Fujitsu",
"r_max": 442010.0,
"r_peak": 537212.0,
"power": 29899.0,
"cores": 7630848,
"interconnect": "Tofu interconnect D",
"os": "RHEL",
},
]
}
@pytest.fixture
def sample_huggingface_response():
"""Sample Hugging Face API response"""
return {
"models": [
{
"id": "bert-base-uncased",
"author": "google",
"description": "BERT base model",
"likes": 25000,
"downloads": 5000000,
"language": "en",
"tags": ["transformer", "bert"],
"pipeline_tag": "feature-extraction",
"library_name": "transformers",
"createdAt": "2024-01-15T10:00:00Z",
}
]
}
@pytest.fixture
def sample_alert_data():
"""Sample alert data"""
return {
"id": 1,
"severity": "warning",
"status": "active",
"datasource_id": 2,
"datasource_name": "Epoch AI",
"message": "API response time > 30s",
"created_at": "2024-01-20T09:30:00Z",
"acknowledged_by": None,
}
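The `mock_db_session` fixture above combines `AsyncMock` for awaited methods with a plain `MagicMock` for the synchronous `add`. A stdlib-only sketch of how a test would exercise such a session (the `save` coroutine is a hypothetical caller, not code from the repo):

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

session = AsyncMock()
session.add = MagicMock()  # Session.add() is synchronous in SQLAlchemy

async def save(db, record):
    db.add(record)
    await db.commit()

asyncio.run(save(session, {"rank": 1}))

# Verify both the sync and the awaited call paths
session.add.assert_called_once_with({"rank": 1})
session.commit.assert_awaited_once()
print("mock assertions passed")
```

Leaving `add` as an `AsyncMock` would still "work" (the unawaited coroutine is silently discarded), but the `MagicMock` override keeps the mock faithful to the real session's sync/async split.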

Some files were not shown because too many files have changed in this diff