Add data-ingestion-service for SA4CPS FTP integration
- Implement FTP monitoring and ingestion for SA4CPS .slg_v2 files
- Add robust data processor with multi-format and unit inference support
- Publish parsed data to Redis topics for real-time dashboard simulation
- Include validation, monitoring, and auto-configuration scripts
- Provide documentation and test scripts for SA4CPS integration
298
microservices/data-ingestion-service/README_SA4CPS.md
Normal file
@@ -0,0 +1,298 @@
# SA4CPS FTP Data Ingestion Service

This service monitors the SA4CPS FTP server at `ftp.sa4cps.pt` and ingests `.slg_v2` files as real-time energy monitoring data.

## Overview

The Data Ingestion Service provides FTP monitoring and data processing for the SA4CPS project. It detects, downloads, and processes `.slg_v2` files from the FTP server automatically, converting them into standardized sensor readings for the energy monitoring dashboard.

## Architecture

```
ftp.sa4cps.pt (.slg_v2 files)
        ↓
FTP Monitor (polls every 5 minutes)
        ↓
Data Processor (supports multiple formats)
        ↓
Redis Publisher (3 topic channels)
        ↓
Real-time Dashboard & Analytics
```
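
The flow above can be read as one polling cycle that is repeated on a timer. The following is a minimal sketch of such a cycle, not the service's actual code: `run_cycle` and its callable parameters are hypothetical names, and the FTP, parsing, and Redis steps are passed in as plain functions so the sketch runs without network access.

```python
from fnmatch import fnmatch
from typing import Callable, Iterable, List, Set


def run_cycle(
    list_files: Callable[[], Iterable[str]],   # stand-in for the FTP directory listing
    download: Callable[[str], str],            # stand-in for the FTP download
    process: Callable[[str], List[dict]],      # stand-in for the data processor
    publish: Callable[[dict], None],           # stand-in for the Redis publisher
    seen: Set[str],
    pattern: str = "*.slg_v2",
) -> int:
    """Run one poll: handle each new matching file once, return readings published."""
    published = 0
    for name in list_files():
        if not fnmatch(name, pattern) or name in seen:
            continue  # wrong file pattern, or already handled in an earlier cycle
        for reading in process(download(name)):
            publish(reading)
            published += 1
        seen.add(name)  # mark as processed only after the whole file went through
    return published
```

A scheduler would call `run_cycle` at the configured interval (5 minutes by default) and keep the `seen` set between calls.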

## Features

### FTP Monitoring
- ✅ **Automatic Discovery**: Monitors `ftp.sa4cps.pt` for new `.slg_v2` files
- ✅ **Duplicate Prevention**: Tracks processed files to avoid reprocessing
- ✅ **Connection Management**: Maintains persistent FTP connections with automatic retry
- ✅ **File Pattern Matching**: Supports `*.slg_v2` and custom file patterns
- ✅ **Configurable Polling**: Default 5-minute intervals, fully configurable
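
As an illustration of the automatic-retry idea, here is a small sketch of a retry helper with exponential backoff. The helper name `with_retry`, the attempt count, and the delays are assumptions made for this example; the service's real retry policy may differ. The only library fact relied on is `ftplib.all_errors`, the standard-library tuple of exceptions that FTP operations can raise.

```python
import ftplib
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    action: Callable[[], T],
    attempts: int = 3,            # assumed default, not taken from the service
    base_delay: float = 1.0,      # seconds; doubled after every failed attempt
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying on FTP/network errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except ftplib.all_errors:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.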

### Data Processing
- ✅ **Multi-Format Support**: CSV-style, space-delimited, and tab-delimited `.slg_v2` files
- ✅ **Smart Header Detection**: Automatically detects and parses header information
- ✅ **Metadata Extraction**: Processes comment lines for file-level metadata
- ✅ **Unit Inference**: Infers units from column names and value ranges
- ✅ **Timestamp Handling**: Supports multiple timestamp formats with automatic parsing
- ✅ **Multi-Value Support**: Handles files with multiple sensor readings per line
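
Unit inference from column names and value ranges could look roughly like the sketch below. The suffix table, the `infer_unit` name, and the magnitude threshold are all assumptions chosen for illustration; they are not the service's actual rules.

```python
from typing import Optional

# Illustrative suffix-to-unit table (an assumption, not the service's mapping).
SUFFIX_UNITS = {"_kwh": "kWh", "_mwh": "MWh", "_kw": "kW", "_w": "W", "_v": "V", "_a": "A"}


def infer_unit(column: str, value: Optional[float] = None) -> Optional[str]:
    """Guess a unit from the column name, falling back to the value's magnitude."""
    name = column.lower()
    # Check longer suffixes first so "_kwh" is tested before "_w".
    for suffix in sorted(SUFFIX_UNITS, key=len, reverse=True):
        if name.endswith(suffix):
            return SUFFIX_UNITS[suffix]
    if name.startswith("temp"):
        return "°C"
    # Value-range fallback: the threshold of 50 is arbitrary and only
    # demonstrates the idea of separating kW-scale from W-scale readings.
    if name == "power" and value is not None:
        return "kW" if abs(value) < 50 else "W"
    return None  # unknown column: leave the unit unset
```

With the column names from the sample files in this README, `energy_kwh`, `power_w`, and `voltage_v` resolve through the suffix table, while a bare `power` column needs the value-based fallback.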

### Data Output
- ✅ **Redis Publishing**: Real-time data streaming via Redis pub/sub
- ✅ **Multiple Topics**: Publishes to 3 specialized channels:
  - `sa4cps_energy_data`: Energy consumption and power readings
  - `sa4cps_sensor_metrics`: Sensor telemetry and status data
  - `sa4cps_raw_data`: Raw unprocessed data for debugging
- ✅ **Standardized Format**: Consistent sensor reading format across all outputs
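
To make the topic split concrete, the sketch below routes a reading to one of the channels by its `value_type`. The routing rule and the names `topic_for` and `publish_reading` are assumptions for illustration; only the channel names come from this README. The publisher is injected as a callable, so the example runs without a Redis server. With the redis-py client, a `Redis` instance's `publish(channel, message)` method could be passed in.

```python
import json
from typing import Callable

ENERGY_TOPIC = "sa4cps_energy_data"
METRICS_TOPIC = "sa4cps_sensor_metrics"


def topic_for(reading: dict) -> str:
    """Pick a channel: energy and power readings go to the energy topic (assumed rule)."""
    value_type = str(reading.get("value_type", "")).lower()
    if value_type.startswith(("energy", "power")):
        return ENERGY_TOPIC
    return METRICS_TOPIC


def publish_reading(reading: dict, publish: Callable[[str, str], object]) -> str:
    """Serialize the reading as JSON, publish it, and return the channel used."""
    topic = topic_for(reading)
    publish(topic, json.dumps(reading))
    return topic
```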

## Quick Start

### 1. Deploy with Docker Compose

```bash
cd microservices
docker-compose up -d data-ingestion-service
```

### 2. Auto-Configure SA4CPS Source

```bash
# Run the automatic configuration script
docker-compose exec data-ingestion-service python startup_sa4cps.py
```

### 3. Verify Setup

```bash
# Check service health
curl http://localhost:8008/health

# View configured data sources
curl http://localhost:8008/sources

# Monitor processing statistics
curl http://localhost:8008/stats
```

## Configuration

### Environment Variables

Set these in the `docker-compose.yml`:

```yaml
environment:
  - FTP_SA4CPS_HOST=ftp.sa4cps.pt    # FTP server hostname
  - FTP_SA4CPS_PORT=21               # FTP port (default: 21)
  - FTP_SA4CPS_USERNAME=anonymous    # FTP username
  - FTP_SA4CPS_PASSWORD=             # FTP password (empty for anonymous)
  - FTP_SA4CPS_REMOTE_PATH=/         # Remote directory path
```
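
Inside the container these variables are ordinary process environment entries. The following sketch shows one way to read them with the defaults listed above; `FTPSettings` and `load_ftp_settings` are hypothetical names, and the service's own loader may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FTPSettings:
    host: str
    port: int
    username: str
    password: str
    remote_path: str


def load_ftp_settings(env: Mapping[str, str] = os.environ) -> FTPSettings:
    """Read the FTP_SA4CPS_* variables, using the README defaults when unset."""
    return FTPSettings(
        host=env.get("FTP_SA4CPS_HOST", "ftp.sa4cps.pt"),
        port=int(env.get("FTP_SA4CPS_PORT", "21")),
        username=env.get("FTP_SA4CPS_USERNAME", "anonymous"),
        password=env.get("FTP_SA4CPS_PASSWORD", ""),  # empty for anonymous login
        remote_path=env.get("FTP_SA4CPS_REMOTE_PATH", "/"),
    )
```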

### Manual Configuration

You can also configure the SA4CPS data source programmatically:

```python
import asyncio

from sa4cps_config import SA4CPSConfigurator


async def main():
    configurator = SA4CPSConfigurator()

    # Create data source
    result = await configurator.create_sa4cps_data_source(
        username="your_username",
        password="your_password",
        remote_path="/data/energy"
    )

    # Test connection
    test_result = await configurator.test_sa4cps_connection()

    # Check status
    status = await configurator.get_sa4cps_status()


# The configurator methods are awaited, so they must run inside an event loop
asyncio.run(main())
```

## API Endpoints

### Health & Status
- `GET /health` - Service health check
- `GET /stats` - Processing statistics
- `GET /sources` - List all data sources

### Data Source Management
- `POST /sources` - Create new data source
- `PUT /sources/{id}` - Update data source
- `DELETE /sources/{id}` - Delete data source
- `POST /sources/{id}/test` - Test FTP connection
- `POST /sources/{id}/trigger` - Manual processing trigger

### Monitoring
- `GET /processing/status` - Current processing status
- `GET /data-quality` - Data quality metrics
- `GET /redis/topics` - Active Redis topics
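
For scripted use, the two per-source `POST` actions can be built with the standard library. This is a sketch only: `source_action` is a hypothetical helper, the base URL reuses the port from the `curl` examples in this README, and whether the endpoints expect a request body is not documented here, so none is sent.

```python
from urllib.request import Request

BASE_URL = "http://localhost:8008"  # port taken from the curl examples in this README


def source_action(source_id: str, action: str) -> Request:
    """Build (but do not send) a POST for /sources/{id}/test or /sources/{id}/trigger."""
    if action not in ("test", "trigger"):
        raise ValueError(f"unknown action: {action}")
    return Request(f"{BASE_URL}/sources/{source_id}/{action}", method="POST")
```

Passing the returned object to `urllib.request.urlopen` sends the request to a running service.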

## .slg_v2 File Format Support

The service supports various `.slg_v2` file formats:

### CSV-Style Format
```
# SA4CPS Energy Data
# Location: Building A
timestamp,sensor_id,energy_kwh,power_w,voltage_v
2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1
2024-01-15T10:01:00Z,SENSOR_001,1235.1,865.3,229.8
```

### Space-Delimited Format
```
# Energy consumption data
# System: Smart Grid Monitor
2024-01-15T10:00:00 LAB_A_001 1500.23 750.5
2024-01-15T10:01:00 LAB_A_001 1501.85 780.2
```

### Tab-Delimited Format
```
# Multi-sensor readings
timestamp sensor_id energy power temp
2024-01-15T10:00:00Z BLDG_A_01 1234.5 850.2 22.5
```
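
The three layouts above share the same shape: `#` comment lines, an optional header row, then data rows. The sketch below parses that shape under stated assumptions: `Key: value` comments become metadata, a first non-comment line whose first field does not start with a digit is treated as the header, and fields are split on commas if present and on whitespace otherwise. The function names are hypothetical and the service's real detection logic may be more elaborate.

```python
from typing import Dict, List, Tuple


def split_fields(line: str) -> List[str]:
    """Split on commas if present, otherwise on any run of spaces or tabs."""
    if "," in line:
        return [field.strip() for field in line.split(",")]
    return line.split()


def parse_slg_v2(text: str) -> Tuple[Dict[str, str], List[str], List[List[str]]]:
    """Return (metadata, header, rows) for a .slg_v2-style text."""
    metadata: Dict[str, str] = {}
    header: List[str] = []
    rows: List[List[str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only "Key: value" comments carry metadata; plain titles are skipped.
            key, sep, value = line.lstrip("#").partition(":")
            if sep:
                metadata[key.strip()] = value.strip()
            continue
        fields = split_fields(line)
        # Header heuristic (assumption): data rows start with a timestamp digit.
        if not header and not rows and not fields[0][:1].isdigit():
            header = fields
        else:
            rows.append(fields)
    return metadata, header, rows
```

For the space-delimited sample, which has no header row, `header` stays empty and column meaning has to come from configuration or inference.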

## Data Output Format

All processed data is converted to a standardized sensor reading format:

```json
{
  "sensor_id": "SENSOR_001",
  "timestamp": 1705312800,
  "datetime": "2024-01-15T10:00:00",
  "value": 1234.5,
  "unit": "kWh",
  "value_type": "energy_kwh",
  "additional_values": {
    "power_w": {"value": 850.2, "unit": "W"},
    "voltage_v": {"value": 230.1, "unit": "V"}
  },
  "metadata": {
    "Location": "Building A",
    "line_number": 2,
    "raw_line": "2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1"
  },
  "processed_at": "2024-01-15T10:01:23.456789",
  "data_source": "slg_v2",
  "file_format": "SA4CPS_SLG_V2"
}
```
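
The `timestamp` and `datetime` fields in the example are two views of the same instant: `1705312800` is the Unix epoch value of `2024-01-15T10:00:00` UTC. A short sketch of that conversion follows; the function name is hypothetical, and treating timestamps without a zone as UTC is an assumption made here, not something the service documents.

```python
from datetime import datetime, timezone
from typing import Tuple


def to_epoch_and_iso(text: str) -> Tuple[int, str]:
    """Convert an ISO 8601 timestamp to (epoch seconds, naive UTC ISO string)."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    utc = parsed.astimezone(timezone.utc)
    return int(utc.timestamp()), utc.replace(tzinfo=None).isoformat()


print(to_epoch_and_iso("2024-01-15T10:00:00Z"))  # (1705312800, '2024-01-15T10:00:00')
```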

## Redis Topics

### sa4cps_energy_data
Primary energy consumption and power readings:
- Energy consumption (kWh, MWh)
- Power readings (W, kW, MW)
- Efficiency metrics

### sa4cps_sensor_metrics
Sensor telemetry and environmental data:
- Voltage/Current readings
- Temperature measurements
- Sensor status/diagnostics
- System health metrics

### sa4cps_raw_data
Raw unprocessed data for debugging:
- Original file content
- Processing metadata
- Error information
- Quality metrics

## Monitoring & Troubleshooting

### Check Processing Status
```bash
# View recent processing activity
curl http://localhost:8008/processing/status | jq

# Check data quality metrics
curl http://localhost:8008/data-quality | jq

# Monitor Redis topic activity
curl http://localhost:8008/redis/topics | jq
```

### View Logs
```bash
# Follow service logs
docker-compose logs -f data-ingestion-service

# Filter logs for specific patterns
docker-compose logs data-ingestion-service | grep -E "SA4CPS|SLG_V2"
```

### Common Issues

1. **FTP Connection Failed**
   - Verify `FTP_SA4CPS_HOST` is accessible
   - Check firewall/network settings
   - Validate username/password if not using anonymous

2. **No Files Found**
   - Confirm `.slg_v2` files exist in the remote path
   - Check `FTP_SA4CPS_REMOTE_PATH` configuration
   - Verify file permissions

3. **Processing Errors**
   - Check data format matches expected `.slg_v2` structure
   - Verify timestamp formats are supported
   - Review file content for parsing issues

## Development

### Testing
```bash
# Run .slg_v2 format tests
cd data-ingestion-service
python test_slg_v2.py

# Test SA4CPS configuration
python sa4cps_config.py
```

### Extending File Support

To add support for new file formats:

1. Add format to `DataFormat` enum in `models.py`
2. Implement `_process_your_format_data()` in `data_processor.py`
3. Add format handling to `process_time_series_data()` method
4. Update `supported_formats` list
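
The four steps above can be sketched in a single standalone file. This is a simplified stand-in, not the service's code: the real `DataFormat` lives in `models.py`, the real `process_time_series_data()` is a method in `data_processor.py`, and the `MY_FORMAT` member, the handler body, and the `HANDLERS` table are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, List


class DataFormat(str, Enum):
    SLG_V2 = "slg_v2"
    MY_FORMAT = "my_format"  # step 1: hypothetical new enum member


def _process_my_format_data(content: str) -> List[dict]:
    """Step 2: turn each non-empty line into a (very simplified) record."""
    return [{"raw_line": line} for line in content.splitlines() if line.strip()]


# Step 3: dispatch table mapping formats to their handlers.
HANDLERS: Dict[DataFormat, Callable[[str], List[dict]]] = {
    DataFormat.MY_FORMAT: _process_my_format_data,
}

# Step 4: derive the supported list from the handlers so the two cannot drift apart.
supported_formats = [fmt.value for fmt in HANDLERS]


def process_time_series_data(content: str, fmt: DataFormat) -> List[dict]:
    handler = HANDLERS.get(fmt)
    if handler is None:
        raise ValueError(f"Unsupported format: {fmt.value}")
    return handler(content)
```

Deriving `supported_formats` from the dispatch table is a design choice of this sketch; the service may maintain the list by hand, as step 4 suggests.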

### Custom Processing Logic

Override processing methods in `DataProcessor`:

```python
from data_processor import DataProcessor  # defined in data_processor.py


class CustomSA4CPSProcessor(DataProcessor):
    async def _process_slg_v2_line(self, line, header, metadata, line_idx):
        # Run the default line processing first
        processed = await super()._process_slg_v2_line(line, header, metadata, line_idx)

        # Add custom fields
        processed['custom_field'] = 'custom_value'

        return processed
```

## Support

For issues or questions:
1. Check service logs: `docker-compose logs data-ingestion-service`
2. Verify configuration: `curl http://localhost:8008/sources`
3. Test FTP connection: `curl -X POST http://localhost:8008/sources/{id}/test`
4. Review processing status: `curl http://localhost:8008/processing/status`

## License

This implementation is part of the SA4CPS project energy monitoring dashboard.