# SA4CPS FTP Data Ingestion Service

This service monitors the SA4CPS FTP server at `ftp.sa4cps.pt` and processes `.slg_v2` files for real-time energy monitoring data ingestion.

## Overview

The Data Ingestion Service provides comprehensive FTP monitoring and data processing capabilities specifically designed for the SA4CPS project. It automatically detects, downloads, and processes `.slg_v2` files from the FTP server, converting them into standardized sensor readings for the energy monitoring dashboard.

## Architecture

```
ftp.sa4cps.pt (.slg_v2 files)
        ↓
FTP Monitor (polls every 5 minutes)
        ↓
Data Processor (supports multiple formats)
        ↓
Redis Publisher (3 topic channels)
        ↓
Real-time Dashboard & Analytics
```

## Features

### FTP Monitoring

- ✅ **Automatic Discovery**: Monitors `ftp.sa4cps.pt` for new `.slg_v2` files
- ✅ **Duplicate Prevention**: Tracks processed files to avoid reprocessing
- ✅ **Connection Management**: Maintains persistent FTP connections with automatic retry
- ✅ **File Pattern Matching**: Supports `*.slg_v2` and custom file patterns
- ✅ **Configurable Polling**: Default 5-minute interval, fully configurable

### Data Processing

- ✅ **Multi-Format Support**: CSV-style, space-delimited, and tab-delimited `.slg_v2` files
- ✅ **Smart Header Detection**: Automatically detects and parses header information
- ✅ **Metadata Extraction**: Processes comment lines for file-level metadata
- ✅ **Unit Inference**: Intelligent unit detection based on column names and value ranges
- ✅ **Timestamp Handling**: Supports multiple timestamp formats with automatic parsing
- ✅ **Multi-Value Support**: Handles files with multiple sensor readings per line

### Data Output

- ✅ **Redis Publishing**: Real-time data streaming via Redis pub/sub
- ✅ **Multiple Topics**: Publishes to 3 specialized channels:
  - `sa4cps_energy_data`: Energy consumption and power readings
  - `sa4cps_sensor_metrics`: Sensor telemetry and status data
  - `sa4cps_raw_data`: Raw unprocessed data for debugging
- ✅ **Standardized Format**: Consistent sensor reading format across all outputs
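As a rough illustration of the discovery behaviour listed above, the sketch below polls the server with Python's standard `ftplib` and reports `.slg_v2` files it has not seen before. It is a minimal example under assumed defaults (anonymous login, root remote path, an in-memory processed-file set), not the service's actual implementation.

```python
# Minimal polling sketch (illustrative only): anonymous FTP, root path,
# and an in-memory "already processed" set are assumptions, not the
# service's real configuration or persistence.
import time
from ftplib import FTP

PROCESSED = set()  # the real service persists this state between polls


def poll_once(host="ftp.sa4cps.pt", remote_path="/"):
    """Return .slg_v2 files on the server that have not been seen before."""
    new_files = []
    with FTP(host) as ftp:
        ftp.login()              # anonymous login
        ftp.cwd(remote_path)
        for name in ftp.nlst():
            if name.endswith(".slg_v2") and name not in PROCESSED:
                PROCESSED.add(name)
                new_files.append(name)
    return new_files


if __name__ == "__main__":
    while True:
        print("new files:", poll_once())
        time.sleep(300)          # default 5-minute polling interval
```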
## Quick Start

### 1. Deploy with Docker Compose

```bash
cd microservices
docker-compose up -d data-ingestion-service
```

### 2. Auto-Configure SA4CPS Source

```bash
# Run the automatic configuration script
docker-compose exec data-ingestion-service python startup_sa4cps.py
```

### 3. Verify Setup

```bash
# Check service health
curl http://localhost:8008/health

# View configured data sources
curl http://localhost:8008/sources

# Monitor processing statistics
curl http://localhost:8008/stats
```

## Configuration

### Environment Variables

Set these in `docker-compose.yml`:

```yaml
environment:
  - FTP_SA4CPS_HOST=ftp.sa4cps.pt    # FTP server hostname
  - FTP_SA4CPS_PORT=21               # FTP port (default: 21)
  - FTP_SA4CPS_USERNAME=anonymous    # FTP username
  - FTP_SA4CPS_PASSWORD=             # FTP password (empty for anonymous)
  - FTP_SA4CPS_REMOTE_PATH=/         # Remote directory path
```

### Manual Configuration

You can also configure the SA4CPS data source programmatically:

```python
from sa4cps_config import SA4CPSConfigurator

configurator = SA4CPSConfigurator()

# Create data source (run inside an async function / event loop)
result = await configurator.create_sa4cps_data_source(
    username="your_username",
    password="your_password",
    remote_path="/data/energy"
)

# Test connection
test_result = await configurator.test_sa4cps_connection()

# Check status
status = await configurator.get_sa4cps_status()
```

## API Endpoints

### Health & Status

- `GET /health` - Service health check
- `GET /stats` - Processing statistics
- `GET /sources` - List all data sources

### Data Source Management

- `POST /sources` - Create new data source
- `PUT /sources/{id}` - Update data source
- `DELETE /sources/{id}` - Delete data source
- `POST /sources/{id}/test` - Test FTP connection
- `POST /sources/{id}/trigger` - Manual processing trigger

### Monitoring

- `GET /processing/status` - Current processing status
- `GET /data-quality` - Data quality metrics
- `GET /redis/topics` - Active Redis topics

## .slg_v2 File Format Support

The service supports several `.slg_v2` file layouts:

### CSV-Style Format

```
# SA4CPS Energy Data
# Location: Building A
timestamp,sensor_id,energy_kwh,power_w,voltage_v
2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1
2024-01-15T10:01:00Z,SENSOR_001,1235.1,865.3,229.8
```

### Space-Delimited Format

```
# Energy consumption data
# System: Smart Grid Monitor
2024-01-15T10:00:00 LAB_A_001 1500.23 750.5
2024-01-15T10:01:00 LAB_A_001 1501.85 780.2
```

### Tab-Delimited Format

```
# Multi-sensor readings
timestamp    sensor_id    energy    power    temp
2024-01-15T10:00:00Z    BLDG_A_01    1234.5    850.2    22.5
```

## Data Output Format

All processed data is converted to a standardized sensor reading format:

```json
{
  "sensor_id": "SENSOR_001",
  "timestamp": 1705312800,
  "datetime": "2024-01-15T10:00:00",
  "value": 1234.5,
  "unit": "kWh",
  "value_type": "energy_kwh",
  "additional_values": {
    "power_w": {"value": 850.2, "unit": "W"},
    "voltage_v": {"value": 230.1, "unit": "V"}
  },
  "metadata": {
    "Location": "Building A",
    "line_number": 2,
    "raw_line": "2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1"
  },
  "processed_at": "2024-01-15T10:01:23.456789",
  "data_source": "slg_v2",
  "file_format": "SA4CPS_SLG_V2"
}
```

## Redis Topics

### sa4cps_energy_data

Primary energy consumption and power readings:

- Energy consumption (kWh, MWh)
- Power readings (W, kW, MW)
- Efficiency metrics

### sa4cps_sensor_metrics

Sensor telemetry and environmental data:

- Voltage/current readings
- Temperature measurements
- Sensor status/diagnostics
- System health metrics

### sa4cps_raw_data

Raw unprocessed data for debugging:

- Original file content
- Processing metadata
- Error information
- Quality metrics
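On the consuming side, a minimal subscriber to these channels might look like the sketch below. Only the channel names and the reading fields (`sensor_id`, `value`, `unit`) come from this service's documented output; the Redis host/port and the `redis-py` client are assumptions.

```python
# Minimal consumer sketch using redis-py (assumed local Redis on the
# default port); the real dashboard/analytics consumers will differ.
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
pubsub = r.pubsub()
pubsub.subscribe("sa4cps_energy_data", "sa4cps_sensor_metrics", "sa4cps_raw_data")

for message in pubsub.listen():
    if message["type"] != "message":
        continue  # skip subscribe confirmations
    reading = json.loads(message["data"])
    print(message["channel"], reading["sensor_id"], reading["value"], reading["unit"])
```

In the architecture above, the real-time dashboard and analytics layer sits on this consumer side of the pub/sub channels.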
## Monitoring & Troubleshooting

### Check Processing Status

```bash
# View recent processing activity
curl http://localhost:8008/processing/status | jq

# Check data quality metrics
curl http://localhost:8008/data-quality | jq

# Monitor Redis topic activity
curl http://localhost:8008/redis/topics | jq
```

### View Logs

```bash
# Service logs
docker-compose logs -f data-ingestion-service

# Follow specific log patterns
docker-compose logs data-ingestion-service | grep "SA4CPS\|SLG_V2"
```

### Common Issues

1. **FTP Connection Failed**
   - Verify `FTP_SA4CPS_HOST` is accessible
   - Check firewall/network settings
   - Validate the username/password if not using anonymous access
2. **No Files Found**
   - Confirm `.slg_v2` files exist in the remote path
   - Check the `FTP_SA4CPS_REMOTE_PATH` configuration
   - Verify file permissions
3. **Processing Errors**
   - Check that the data format matches the expected `.slg_v2` structure
   - Verify that the timestamp formats are supported
   - Review file content for parsing issues

## Development

### Testing

```bash
# Run .slg_v2 format tests
cd data-ingestion-service
python test_slg_v2.py

# Test SA4CPS configuration
python sa4cps_config.py
```

### Extending File Support

To add support for new file formats:

1. Add the format to the `DataFormat` enum in `models.py`
2. Implement `_process_your_format_data()` in `data_processor.py`
3. Add format handling to the `process_time_series_data()` method
4. Update the `supported_formats` list

### Custom Processing Logic

Override processing methods in `DataProcessor`:

```python
from data_processor import DataProcessor  # module per data_processor.py above


class CustomSA4CPSProcessor(DataProcessor):
    async def _process_slg_v2_line(self, line, header, metadata, line_idx):
        # Reuse the default line processing logic
        processed = await super()._process_slg_v2_line(line, header, metadata, line_idx)

        # Add custom fields
        processed['custom_field'] = 'custom_value'
        return processed
```

## Support

For issues or questions:

1. Check service logs: `docker-compose logs data-ingestion-service`
2. Verify configuration: `curl http://localhost:8008/sources`
3. Test FTP connection: `curl -X POST http://localhost:8008/sources/{id}/test`
4. Review processing status: `curl http://localhost:8008/processing/status`

## License

This implementation is part of the SA4CPS project energy monitoring dashboard.