Add data-ingestion-service for SA4CPS FTP integration

- Implement FTP monitoring and ingestion for SA4CPS .slg_v2 files
- Add robust data processor with multi-format and unit inference support
- Publish parsed data to Redis topics for real-time dashboard simulation
- Include validation, monitoring, and auto-configuration scripts
- Provide documentation and test scripts for SA4CPS integration
rafaeldpsilva
2025-09-10 14:43:30 +01:00
parent d4f280de93
commit 5fdce00e5d
16 changed files with 6353 additions and 0 deletions


@@ -0,0 +1,298 @@
# SA4CPS FTP Data Ingestion Service
This service monitors the SA4CPS FTP server at `ftp.sa4cps.pt` and processes `.slg_v2` files for real-time energy monitoring data ingestion.
## Overview
The Data Ingestion Service provides comprehensive FTP monitoring and data processing capabilities specifically designed for the SA4CPS project. It automatically detects, downloads, and processes `.slg_v2` files from the FTP server, converting them into standardized sensor readings for the energy monitoring dashboard.
## Architecture
```
ftp.sa4cps.pt (.slg_v2 files)
          │
          ▼
FTP Monitor (polls every 5 minutes)
          │
          ▼
Data Processor (supports multiple formats)
          │
          ▼
Redis Publisher (3 topic channels)
          │
          ▼
Real-time Dashboard & Analytics
```
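The flow above amounts to a single polling loop. The sketch below is illustrative only; `ftp_monitor`, `data_processor`, and `redis_publisher` stand in for the service's actual components, whose interfaces may differ.
```python
import asyncio

async def ingestion_loop(ftp_monitor, data_processor, redis_publisher, poll_seconds: int = 300):
    """Poll the FTP server, process new .slg_v2 files, and publish the readings."""
    while True:
        # fetch_new_slg_v2_files is a placeholder for the monitor's discovery step
        for filename, content in await ftp_monitor.fetch_new_slg_v2_files():
            readings = await data_processor.process_slg_v2(filename, content)
            await redis_publisher.publish(readings)
        await asyncio.sleep(poll_seconds)  # default 5-minute polling interval
```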
## Features
### FTP Monitoring
- **Automatic Discovery**: Monitors `ftp.sa4cps.pt` for new `.slg_v2` files
- **Duplicate Prevention**: Tracks processed files to avoid reprocessing
- **Connection Management**: Maintains persistent FTP connections with automatic retry
- **File Pattern Matching**: Supports `*.slg_v2` and custom file patterns
- **Configurable Polling**: Default 5-minute intervals, fully configurable
### Data Processing
- **Multi-Format Support**: CSV-style, space-delimited, and tab-delimited `.slg_v2` files
- **Smart Header Detection**: Automatically detects and parses header information
- **Metadata Extraction**: Processes comment lines for file-level metadata
- **Unit Inference**: Intelligent unit detection based on column names and value ranges (see the sketch after this list)
- **Timestamp Handling**: Supports multiple timestamp formats with automatic parsing
- **Multi-Value Support**: Handles files with multiple sensor readings per line
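A rough idea of how unit inference from column names could work (an assumed heuristic for illustration, not the processor's exact rules):
```python
UNIT_HINTS = {
    "kwh": "kWh",      # e.g. energy_kwh, total_kwh
    "energy": "kWh",
    "power": "W",
    "voltage": "V",
    "current": "A",
    "temp": "°C",
}

def infer_unit(column_name: str) -> str | None:
    """Return a unit guess based on keywords in the column name."""
    name = column_name.lower()
    for keyword, unit in UNIT_HINTS.items():
        if keyword in name:
            return unit
    return None  # unknown; value ranges could serve as a fallback signal
```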
### Data Output
- **Redis Publishing**: Real-time data streaming via Redis pub/sub
- **Multiple Topics**: Publishes to three specialized channels (see the publishing sketch after this list):
  - `sa4cps_energy_data`: Energy consumption and power readings
  - `sa4cps_sensor_metrics`: Sensor telemetry and status data
  - `sa4cps_raw_data`: Raw unprocessed data for debugging
- **Standardized Format**: Consistent sensor reading format across all outputs
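A minimal publishing sketch with redis-py, assuming readings are routed to a channel by their unit (the exact routing rules live in the service):
```python
import json
import redis

r = redis.Redis(host="redis", port=6379)

def publish_reading(reading: dict) -> None:
    message = json.dumps(reading)
    r.publish("sa4cps_raw_data", message)                # raw copy for debugging
    if reading.get("unit") in ("kWh", "MWh", "W", "kW", "MW"):
        r.publish("sa4cps_energy_data", message)         # energy and power readings
    else:
        r.publish("sa4cps_sensor_metrics", message)      # telemetry, status, environment
```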
## Quick Start
### 1. Deploy with Docker Compose
```bash
cd microservices
docker-compose up -d data-ingestion-service
```
### 2. Auto-Configure SA4CPS Source
```bash
# Run the automatic configuration script
docker-compose exec data-ingestion-service python startup_sa4cps.py
```
### 3. Verify Setup
```bash
# Check service health
curl http://localhost:8008/health
# View configured data sources
curl http://localhost:8008/sources
# Monitor processing statistics
curl http://localhost:8008/stats
```
## Configuration
### Environment Variables
Set these in the `docker-compose.yml`:
```yaml
environment:
- FTP_SA4CPS_HOST=ftp.sa4cps.pt # FTP server hostname
- FTP_SA4CPS_PORT=21 # FTP port (default: 21)
- FTP_SA4CPS_USERNAME=anonymous # FTP username
- FTP_SA4CPS_PASSWORD= # FTP password (empty for anonymous)
- FTP_SA4CPS_REMOTE_PATH=/ # Remote directory path
```
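For reference, these variables can be read inside the service roughly like this (a sketch; the actual configuration loading is part of the service code):
```python
import os

ftp_config = {
    "host": os.getenv("FTP_SA4CPS_HOST", "ftp.sa4cps.pt"),
    "port": int(os.getenv("FTP_SA4CPS_PORT", "21")),
    "username": os.getenv("FTP_SA4CPS_USERNAME", "anonymous"),
    "password": os.getenv("FTP_SA4CPS_PASSWORD", ""),      # empty for anonymous access
    "remote_path": os.getenv("FTP_SA4CPS_REMOTE_PATH", "/"),
}
```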
### Manual Configuration
You can also configure the SA4CPS data source programmatically:
```python
import asyncio

from sa4cps_config import SA4CPSConfigurator

async def main():
    configurator = SA4CPSConfigurator()

    # Create the data source
    result = await configurator.create_sa4cps_data_source(
        username="your_username",
        password="your_password",
        remote_path="/data/energy"
    )

    # Test the FTP connection
    test_result = await configurator.test_sa4cps_connection()

    # Check the current status
    status = await configurator.get_sa4cps_status()

asyncio.run(main())
```
## API Endpoints
### Health & Status
- `GET /health` - Service health check
- `GET /stats` - Processing statistics
- `GET /sources` - List all data sources
### Data Source Management
- `POST /sources` - Create new data source
- `PUT /sources/{id}` - Update data source
- `DELETE /sources/{id}` - Delete data source
- `POST /sources/{id}/test` - Test FTP connection
- `POST /sources/{id}/trigger` - Manual processing trigger
### Monitoring
- `GET /processing/status` - Current processing status
- `GET /data-quality` - Data quality metrics
- `GET /redis/topics` - Active Redis topics
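For example, a new data source can be registered through `POST /sources`. The payload fields below are illustrative assumptions; the accepted schema is defined by the service's models:
```python
import requests

payload = {
    # hypothetical field names, shown for illustration only
    "name": "sa4cps-ftp",
    "host": "ftp.sa4cps.pt",
    "port": 21,
    "username": "anonymous",
    "password": "",
    "remote_path": "/",
    "file_pattern": "*.slg_v2",
}

resp = requests.post("http://localhost:8008/sources", json=payload, timeout=10)
print(resp.status_code, resp.json())
```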
## .slg_v2 File Format Support
The service supports various `.slg_v2` file formats:
### CSV-Style Format
```
# SA4CPS Energy Data
# Location: Building A
timestamp,sensor_id,energy_kwh,power_w,voltage_v
2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1
2024-01-15T10:01:00Z,SENSOR_001,1235.1,865.3,229.8
```
### Space-Delimited Format
```
# Energy consumption data
# System: Smart Grid Monitor
2024-01-15T10:00:00 LAB_A_001 1500.23 750.5
2024-01-15T10:01:00 LAB_A_001 1501.85 780.2
```
### Tab-Delimited Format
```
# Multi-sensor readings
timestamp sensor_id energy power temp
2024-01-15T10:00:00Z BLDG_A_01 1234.5 850.2 22.5
```
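A simple way to tell these variants apart is to inspect the first data line (an assumed heuristic shown for illustration; the processor's own detection may differ):
```python
def detect_delimiter(line: str) -> str | None:
    """Guess the field delimiter of a .slg_v2 data line."""
    if "\t" in line:
        return "\t"   # tab-delimited
    if "," in line:
        return ","    # CSV-style
    return None       # fall back to whitespace (space-delimited)

def split_fields(line: str) -> list[str]:
    delimiter = detect_delimiter(line)
    return line.split(delimiter) if delimiter else line.split()
```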
## Data Output Format
All processed data is converted to a standardized sensor reading format:
```json
{
"sensor_id": "SENSOR_001",
"timestamp": 1705312800,
"datetime": "2024-01-15T10:00:00",
"value": 1234.5,
"unit": "kWh",
"value_type": "energy_kwh",
"additional_values": {
"power_w": {"value": 850.2, "unit": "W"},
"voltage_v": {"value": 230.1, "unit": "V"}
},
"metadata": {
"Location": "Building A",
"line_number": 2,
"raw_line": "2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1"
},
"processed_at": "2024-01-15T10:01:23.456789",
"data_source": "slg_v2",
"file_format": "SA4CPS_SLG_V2"
}
```
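As a rough illustration, the first CSV-style line from the example above could be mapped onto this structure as follows (simplified; the real mapping, including `additional_values` and metadata, lives in `data_processor.py`):
```python
from datetime import datetime

def line_to_reading(line: str, header: list[str]) -> dict:
    fields = dict(zip(header, line.split(",")))
    ts = datetime.fromisoformat(fields["timestamp"].replace("Z", "+00:00"))
    return {
        "sensor_id": fields["sensor_id"],
        "timestamp": int(ts.timestamp()),
        "datetime": ts.isoformat(),
        "value": float(fields["energy_kwh"]),
        "unit": "kWh",
        "value_type": "energy_kwh",
        "data_source": "slg_v2",
        "file_format": "SA4CPS_SLG_V2",
    }

reading = line_to_reading(
    "2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1",
    ["timestamp", "sensor_id", "energy_kwh", "power_w", "voltage_v"],
)
```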
## Redis Topics
### sa4cps_energy_data
Primary energy consumption and power readings:
- Energy consumption (kWh, MWh)
- Power readings (W, kW, MW)
- Efficiency metrics
### sa4cps_sensor_metrics
Sensor telemetry and environmental data:
- Voltage/Current readings
- Temperature measurements
- Sensor status/diagnostics
- System health metrics
### sa4cps_raw_data
Raw unprocessed data for debugging:
- Original file content
- Processing metadata
- Error information
- Quality metrics
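Consumers such as the dashboard can subscribe to any of these channels with a standard Redis pub/sub client, for example:
```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)
pubsub = r.pubsub()
pubsub.subscribe("sa4cps_energy_data")

for message in pubsub.listen():
    if message["type"] == "message":
        reading = json.loads(message["data"])
        print(reading["sensor_id"], reading["value"], reading["unit"])
```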
## Monitoring & Troubleshooting
### Check Processing Status
```bash
# View recent processing activity
curl http://localhost:8008/processing/status | jq
# Check data quality metrics
curl http://localhost:8008/data-quality | jq
# Monitor Redis topic activity
curl http://localhost:8008/redis/topics | jq
```
### View Logs
```bash
# Service logs
docker-compose logs -f data-ingestion-service
# Follow specific log patterns
docker-compose logs data-ingestion-service | grep "SA4CPS\|SLG_V2"
```
### Common Issues
1. **FTP Connection Failed**
   - Verify `FTP_SA4CPS_HOST` is accessible
   - Check firewall/network settings
   - Validate username/password if not using anonymous access
2. **No Files Found**
   - Confirm `.slg_v2` files exist in the remote path
   - Check `FTP_SA4CPS_REMOTE_PATH` configuration
   - Verify file permissions
3. **Processing Errors**
   - Check that the data format matches the expected `.slg_v2` structure
   - Verify timestamp formats are supported
   - Review file content for parsing issues
## Development
### Testing
```bash
# Run .slg_v2 format tests
cd data-ingestion-service
python test_slg_v2.py
# Test SA4CPS configuration
python sa4cps_config.py
```
### Extending File Support
To add support for new file formats:
1. Add format to `DataFormat` enum in `models.py`
2. Implement `_process_your_format_data()` in `data_processor.py`
3. Add format handling to `process_time_series_data()` method
4. Update `supported_formats` list
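A minimal sketch of steps 1 and 2 (the format name `MY_FORMAT` and the existing enum members shown here are placeholders; the real definitions live in `models.py` and `data_processor.py`):
```python
from enum import Enum

# models.py -- step 1: register the new format (existing members simplified)
class DataFormat(str, Enum):
    SLG_V2 = "slg_v2"
    MY_FORMAT = "my_format"

# data_processor.py -- step 2: implement the format-specific processor (simplified)
class DataProcessor:
    async def _process_my_format_data(self, content: str) -> list[dict]:
        readings = []
        for line in content.splitlines():
            if line and not line.startswith("#"):
                readings.append({"raw_line": line})  # parse into the standardized reading format
        return readings
```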
### Custom Processing Logic
Override processing methods in `DataProcessor`:
```python
class CustomSA4CPSProcessor(DataProcessor):
    async def _process_slg_v2_line(self, line, header, metadata, line_idx):
        # Custom line processing logic
        processed = await super()._process_slg_v2_line(line, header, metadata, line_idx)
        # Add custom fields
        processed['custom_field'] = 'custom_value'
        return processed
```
## Support
For issues or questions:
1. Check service logs: `docker-compose logs data-ingestion-service`
2. Verify configuration: `curl http://localhost:8008/sources`
3. Test FTP connection: `curl -X POST http://localhost:8008/sources/{id}/test`
4. Review processing status: `curl http://localhost:8008/processing/status`
## License
This implementation is part of the SA4CPS project energy monitoring dashboard.