# SA4CPS FTP Data Ingestion Service

This service monitors the SA4CPS FTP server at `ftp.sa4cps.pt` and processes `.slg_v2` files to ingest real-time energy monitoring data.

## Overview

The Data Ingestion Service provides FTP monitoring and data processing for the SA4CPS project. It automatically detects, downloads, and processes `.slg_v2` files from the FTP server, converting them into standardized sensor readings for the energy monitoring dashboard.

## Architecture

```
ftp.sa4cps.pt (.slg_v2 files)
        ↓
FTP Monitor (polls every 5 minutes)
        ↓
Data Processor (supports multiple formats)
        ↓
Redis Publisher (3 topic channels)
        ↓
Real-time Dashboard & Analytics
```

## Features

### FTP Monitoring
- ✅ **Automatic Discovery**: Monitors `ftp.sa4cps.pt` for new `.slg_v2` files
- ✅ **Duplicate Prevention**: Tracks processed files to avoid reprocessing
- ✅ **Connection Management**: Maintains persistent FTP connections with automatic retry
- ✅ **File Pattern Matching**: Supports `*.slg_v2` and custom file patterns
- ✅ **Configurable Polling**: Default 5-minute intervals, fully configurable
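
Duplicate prevention and pattern matching can be illustrated with a small in-memory tracker. This is only a sketch: the class name `ProcessedFileTracker` and the choice to key on `(name, size)` are assumptions made for illustration, not the service's actual implementation, which may persist its state elsewhere.

```python
from fnmatch import fnmatch


class ProcessedFileTracker:
    """Remembers which remote files were already handled (in memory only)."""

    def __init__(self, patterns=("*.slg_v2",)):
        self.patterns = tuple(patterns)
        self._seen = set()

    def select_new(self, listing):
        """Return matching file names from `listing` that were not processed yet.

        `listing` is an iterable of (name, size) pairs. Keying on both means a
        file re-uploaded with a different size is picked up again.
        """
        new_files = []
        for name, size in listing:
            if not any(fnmatch(name, pattern) for pattern in self.patterns):
                continue  # does not match any configured pattern
            if (name, size) not in self._seen:
                new_files.append(name)
        return new_files

    def mark_processed(self, name, size):
        """Record a file so later polls skip it."""
        self._seen.add((name, size))
```

Each poll would call `select_new()` on the current directory listing and `mark_processed()` once a file has been handled successfully.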

### Data Processing
- ✅ **Multi-Format Support**: CSV-style, space-delimited, tab-delimited `.slg_v2` files
- ✅ **Smart Header Detection**: Automatically detects and parses header information
- ✅ **Metadata Extraction**: Processes comment lines for file-level metadata
- ✅ **Unit Inference**: Intelligent unit detection based on column names and value ranges
- ✅ **Timestamp Handling**: Supports multiple timestamp formats with automatic parsing
- ✅ **Multi-Value Support**: Handles files with multiple sensor readings per line
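
To make the unit inference idea concrete, here is a rough sketch that checks column-name suffixes first and falls back to keywords and value ranges. The suffix table, the keywords, and the 100-unit threshold are all assumptions chosen for illustration; the service's real rules may differ.

```python
# Suffix -> unit table, modelled on column names such as energy_kwh or power_w.
SUFFIX_UNITS = {
    "_kwh": "kWh",
    "_mwh": "MWh",
    "_kw": "kW",
    "_mw": "MW",
    "_w": "W",
    "_v": "V",
    "_a": "A",
    "_c": "°C",
}


def infer_unit(column, values=()):
    """Guess a unit from the column name, then from the observed values."""
    name = column.strip().lower()
    for suffix, unit in SUFFIX_UNITS.items():
        if name.endswith(suffix):
            return unit
    # No explicit suffix: fall back to keywords and a crude value-range check.
    if "temp" in name:
        return "°C"
    if "volt" in name:
        return "V"
    if "power" in name:
        peak = max((abs(v) for v in values), default=0.0)
        # Illustrative threshold: small magnitudes are more likely kW than W.
        return "kW" if 0 < peak < 100 else "W"
    if "energy" in name:
        return "kWh"
    return "unknown"
```

For example, `infer_unit("power_w")` resolves from the suffix alone, while a bare `power` column is decided by the magnitude of its values.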

### Data Output
- ✅ **Redis Publishing**: Real-time data streaming via Redis pub/sub
- ✅ **Multiple Topics**: Publishes to 3 specialized channels:
  - `sa4cps_energy_data`: Energy consumption and power readings
  - `sa4cps_sensor_metrics`: Sensor telemetry and status data
  - `sa4cps_raw_data`: Raw unprocessed data for debugging
- ✅ **Standardized Format**: Consistent sensor reading format across all outputs
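
One way to picture how a reading ends up on a channel is the sketch below. The routing rule (energy and power units go to `sa4cps_energy_data`, everything else to `sa4cps_sensor_metrics`) and the JSON payload are assumptions; only the topic names come from this README. The `client` argument is anything with a `publish(channel, message)` method, which redis-py's `redis.Redis` provides.

```python
import json

ENERGY_TOPIC = "sa4cps_energy_data"
METRICS_TOPIC = "sa4cps_sensor_metrics"

# Units treated as energy or power readings (assumed routing rule).
ENERGY_UNITS = {"Wh", "kWh", "MWh", "W", "kW", "MW"}


def topic_for(reading):
    """Pick the typed channel for one standardized reading."""
    return ENERGY_TOPIC if reading.get("unit") in ENERGY_UNITS else METRICS_TOPIC


def publish_reading(client, reading):
    """Serialize the reading as JSON, publish it, and return the topic used."""
    topic = topic_for(reading)
    client.publish(topic, json.dumps(reading))
    return topic
```

Passing the client in keeps the routing logic testable without a running Redis instance.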

## Quick Start

### 1. Deploy with Docker Compose

```bash
cd microservices
docker-compose up -d data-ingestion-service
```

### 2. Auto-Configure SA4CPS Source

```bash
# Run the automatic configuration script
docker-compose exec data-ingestion-service python startup_sa4cps.py
```

### 3. Verify Setup

```bash
# Check service health
curl http://localhost:8008/health

# View configured data sources
curl http://localhost:8008/sources

# Monitor processing statistics
curl http://localhost:8008/stats
```

## Configuration

### Environment Variables

Set these in the `docker-compose.yml`:

```yaml
environment:
  - FTP_SA4CPS_HOST=ftp.sa4cps.pt      # FTP server hostname
  - FTP_SA4CPS_PORT=21                 # FTP port (default: 21)
  - FTP_SA4CPS_USERNAME=anonymous      # FTP username
  - FTP_SA4CPS_PASSWORD=               # FTP password (empty for anonymous)
  - FTP_SA4CPS_REMOTE_PATH=/           # Remote directory path
```
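
For reference, a service could read these variables along the following lines. This is a minimal sketch: `FTPSettings` and `load_ftp_settings` are hypothetical names, and the fallback defaults simply mirror the values shown above rather than anything confirmed in the service code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FTPSettings:
    host: str
    port: int
    username: str
    password: str
    remote_path: str


def load_ftp_settings(env=None):
    """Build settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    return FTPSettings(
        host=env.get("FTP_SA4CPS_HOST", "ftp.sa4cps.pt"),
        port=int(env.get("FTP_SA4CPS_PORT", "21")),  # env values are strings
        username=env.get("FTP_SA4CPS_USERNAME", "anonymous"),
        password=env.get("FTP_SA4CPS_PASSWORD", ""),
        remote_path=env.get("FTP_SA4CPS_REMOTE_PATH", "/"),
    )
```

Accepting an explicit mapping makes it easy to exercise the defaults in tests, for example `load_ftp_settings({"FTP_SA4CPS_PORT": "2121"})`.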

### Manual Configuration

You can also configure the SA4CPS data source programmatically:

```python
import asyncio

from sa4cps_config import SA4CPSConfigurator


async def main():
    configurator = SA4CPSConfigurator()

    # Create data source
    result = await configurator.create_sa4cps_data_source(
        username="your_username",
        password="your_password",
        remote_path="/data/energy"
    )

    # Test connection
    test_result = await configurator.test_sa4cps_connection()

    # Check status
    status = await configurator.get_sa4cps_status()
    print(result, test_result, status)


# The configurator methods are awaited, so run them inside an event loop
asyncio.run(main())
```

## API Endpoints

### Health & Status
- `GET /health` - Service health check
- `GET /stats` - Processing statistics
- `GET /sources` - List all data sources

### Data Source Management
- `POST /sources` - Create new data source
- `PUT /sources/{id}` - Update data source
- `DELETE /sources/{id}` - Delete data source
- `POST /sources/{id}/test` - Test FTP connection
- `POST /sources/{id}/trigger` - Manual processing trigger

### Monitoring
- `GET /processing/status` - Current processing status
- `GET /data-quality` - Data quality metrics
- `GET /redis/topics` - Active Redis topics

## .slg_v2 File Format Support

The service supports various `.slg_v2` file formats:

### CSV-Style Format
```
# SA4CPS Energy Data
# Location: Building A
timestamp,sensor_id,energy_kwh,power_w,voltage_v
2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1
2024-01-15T10:01:00Z,SENSOR_001,1235.1,865.3,229.8
```

### Space-Delimited Format
```
# Energy consumption data
# System: Smart Grid Monitor
2024-01-15T10:00:00 LAB_A_001 1500.23 750.5
2024-01-15T10:01:00 LAB_A_001 1501.85 780.2
```

### Tab-Delimited Format
```
# Multi-sensor readings
timestamp	sensor_id	energy	power	temp
2024-01-15T10:00:00Z	BLDG_A_01	1234.5	850.2	22.5
```
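
The three layouts above can be handled by a single pass that sniffs the delimiter per line. The following is a simplified sketch, not the service's parser: the function names are hypothetical, `# key: value` is assumed to be the metadata convention, and a first data line that does not start with an ISO-style date is assumed to be the header.

```python
import re

_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T")


def split_fields(line):
    """Split on commas if present, else tabs, else runs of whitespace."""
    if "," in line:
        return [field.strip() for field in line.split(",")]
    if "\t" in line:
        return [field.strip() for field in line.split("\t")]
    return line.split()


def parse_slg_v2(text):
    """Return (metadata, header, rows) for the text of one file."""
    metadata, header, rows = {}, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, sep, value = line.lstrip("#").strip().partition(":")
            if sep:  # only "# key: value" comments carry metadata
                metadata[key.strip()] = value.strip()
            continue
        fields = split_fields(line)
        if header is None and not rows and not _TIMESTAMP.match(fields[0]):
            header = fields  # first non-comment line without a leading timestamp
            continue
        rows.append(fields)
    return metadata, header, rows
```

For the space-delimited sample, which has no header line, `header` stays `None` and the caller would have to supply column names.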

## Data Output Format

All processed data is converted to a standardized sensor reading format:

```json
{
  "sensor_id": "SENSOR_001",
  "timestamp": 1705312800,
  "datetime": "2024-01-15T10:00:00",
  "value": 1234.5,
  "unit": "kWh",
  "value_type": "energy_kwh",
  "additional_values": {
    "power_w": {"value": 850.2, "unit": "W"},
    "voltage_v": {"value": 230.1, "unit": "V"}
  },
  "metadata": {
    "Location": "Building A",
    "line_number": 2,
    "raw_line": "2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1"
  },
  "processed_at": "2024-01-15T10:01:23.456789",
  "data_source": "slg_v2",
  "file_format": "SA4CPS_SLG_V2"
}
```
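
As an illustration of how one parsed row might map onto this structure, consider the sketch below. The function `to_reading` and its `units` argument (a column-to-unit mapping) are hypothetical, naive timestamps are assumed to be UTC, and the `raw_line` and `processed_at` fields are left out for brevity.

```python
from datetime import datetime, timezone


def to_reading(header, fields, metadata, line_number, units):
    """Combine one parsed row with file metadata into a reading dict."""
    row = dict(zip(header, fields))
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it.
    moment = datetime.fromisoformat(row["timestamp"].replace("Z", "+00:00"))
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    moment = moment.astimezone(timezone.utc)

    value_columns = [c for c in header if c not in ("timestamp", "sensor_id")]
    primary, extras = value_columns[0], value_columns[1:]
    return {
        "sensor_id": row["sensor_id"],
        "timestamp": int(moment.timestamp()),
        "datetime": moment.replace(tzinfo=None).isoformat(),
        "value": float(row[primary]),
        "unit": units.get(primary, "unknown"),
        "value_type": primary,
        "additional_values": {
            c: {"value": float(row[c]), "unit": units.get(c, "unknown")}
            for c in extras
        },
        "metadata": {**metadata, "line_number": line_number},
        "data_source": "slg_v2",
        "file_format": "SA4CPS_SLG_V2",
    }
```

The first value column becomes the primary `value`, and any remaining columns are collected under `additional_values`, matching the multi-value layout in the JSON above.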

## Redis Topics

### sa4cps_energy_data
Primary energy consumption and power readings:
- Energy consumption (kWh, MWh)
- Power readings (W, kW, MW)
- Efficiency metrics

### sa4cps_sensor_metrics
Sensor telemetry and environmental data:
- Voltage/Current readings
- Temperature measurements
- Sensor status/diagnostics
- System health metrics

### sa4cps_raw_data
Raw unprocessed data for debugging:
- Original file content
- Processing metadata
- Error information
- Quality metrics
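
A consumer of these topics could look roughly like the sketch below, which uses the third-party `redis` package (redis-py). It assumes that messages are JSON-encoded and that Redis is reachable on `localhost:6379`; neither is confirmed by this README. The decoding step is kept separate from the network loop so it can be exercised on its own.

```python
import json

TOPICS = ("sa4cps_energy_data", "sa4cps_sensor_metrics", "sa4cps_raw_data")


def _text(value):
    """redis-py returns bytes unless decode_responses=True is set."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def decode_message(message):
    """Return (channel, payload) for data messages, or None for anything else."""
    if message.get("type") != "message":
        return None  # e.g. the "subscribe" confirmations that listen() also yields
    channel, data = _text(message["channel"]), _text(message["data"])
    try:
        payload = json.loads(data)
    except json.JSONDecodeError:
        payload = None
    if not isinstance(payload, dict):
        payload = {"raw": data}  # keep non-JSON content instead of dropping it
    return channel, payload


def listen(host="localhost", port=6379):
    """Print sensor id and value for every message on the three topics."""
    import redis  # third-party: pip install redis

    pubsub = redis.Redis(host=host, port=port).pubsub()
    pubsub.subscribe(*TOPICS)
    for message in pubsub.listen():
        decoded = decode_message(message)
        if decoded is not None:
            channel, payload = decoded
            print(channel, payload.get("sensor_id"), payload.get("value"))
```

Calling `listen()` blocks and prints readings as they arrive, which can be handy when checking that the publisher is active.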

## Monitoring & Troubleshooting

### Check Processing Status
```bash
# View recent processing activity
curl http://localhost:8008/processing/status | jq

# Check data quality metrics
curl http://localhost:8008/data-quality | jq

# Monitor Redis topic activity
curl http://localhost:8008/redis/topics | jq
```

### View Logs
```bash
# Follow service logs
docker-compose logs -f data-ingestion-service

# Filter logs for specific patterns
docker-compose logs data-ingestion-service | grep -E "SA4CPS|SLG_V2"
```

### Common Issues

1. **FTP Connection Failed**
   - Verify `FTP_SA4CPS_HOST` is accessible
   - Check firewall/network settings
   - Validate username/password if not using anonymous

2. **No Files Found**
   - Confirm `.slg_v2` files exist in the remote path
   - Check `FTP_SA4CPS_REMOTE_PATH` configuration
   - Verify file permissions

3. **Processing Errors**
   - Check data format matches expected `.slg_v2` structure
   - Verify timestamp formats are supported
   - Review file content for parsing issues
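
The first two issues can be narrowed down outside the service with Python's standard `ftplib`. The helper below is a hypothetical diagnostic, not part of the service: it reports which stage failed (connect, login, or listing) or returns the matching file names. The `ftp_factory` parameter exists only so that the function can be tested without a real server.

```python
import ftplib
from fnmatch import fnmatch


def diagnose_ftp(host, port=21, username="anonymous", password="",
                 remote_path="/", pattern="*.slg_v2", ftp_factory=ftplib.FTP):
    """Return (stage, detail): the failing step, or ("ok", matching file names)."""
    ftp = ftp_factory()
    try:
        try:
            ftp.connect(host, port, timeout=10)
        except OSError as exc:  # DNS failure, refused connection, timeout
            return "connect", str(exc)
        try:
            ftp.login(username, password)
        except ftplib.error_perm as exc:  # e.g. 530 Login incorrect
            return "login", str(exc)
        try:
            ftp.cwd(remote_path)
            names = ftp.nlst()
        except ftplib.error_perm as exc:  # missing path or no permission
            return "list", str(exc)
        matches = [name for name in names if fnmatch(name, pattern)]
        return ("ok", matches) if matches else ("no_files", names)
    finally:
        try:
            ftp.close()
        except Exception:
            pass  # closing a half-open connection should not mask the result
```

A result of `("no_files", [...])` means the connection works but nothing in the directory matches the pattern, which points at `FTP_SA4CPS_REMOTE_PATH` rather than at credentials.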

## Development

### Testing
```bash
# Run .slg_v2 format tests
cd data-ingestion-service
python test_slg_v2.py

# Test SA4CPS configuration
python sa4cps_config.py
```

### Extending File Support

To add support for new file formats:

1. Add format to `DataFormat` enum in `models.py`
2. Implement `_process_your_format_data()` in `data_processor.py`
3. Add format handling to `process_time_series_data()` method
4. Update `supported_formats` list
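
The four steps can be pictured with a self-contained stand-in. Everything here is hypothetical: the `SEMICOLON_DELIMITED` format, the `SketchProcessor` class, and the method signatures are invented for illustration, and the real `DataFormat` enum and `DataProcessor` methods (which may well be `async`) will look different.

```python
from enum import Enum


class DataFormat(Enum):  # stand-in for the enum in models.py
    SLG_V2 = "slg_v2"
    SEMICOLON_DELIMITED = "semicolon_delimited"  # step 1: register the new format


class SketchProcessor:
    # Step 4: advertise the new format alongside the existing ones.
    supported_formats = [DataFormat.SLG_V2, DataFormat.SEMICOLON_DELIMITED]

    def process_time_series_data(self, content, data_format):
        if data_format not in self.supported_formats:
            raise ValueError(f"Unsupported format: {data_format}")
        # Step 3: route the new format to its handler.
        if data_format is DataFormat.SEMICOLON_DELIMITED:
            return self._process_semicolon_delimited_data(content)
        raise NotImplementedError("existing formats are omitted from this sketch")

    def _process_semicolon_delimited_data(self, content):
        # Step 2: turn "timestamp;sensor_id;value" lines into reading dicts.
        readings = []
        for line in content.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            timestamp, sensor_id, value = (part.strip() for part in line.split(";"))
            readings.append(
                {"timestamp": timestamp, "sensor_id": sensor_id, "value": float(value)}
            )
        return readings
```

Keeping the dispatch in one method means a new format touches exactly the four places listed above.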

### Custom Processing Logic

Override processing methods in `DataProcessor`:

```python
from data_processor import DataProcessor  # assumed import path, based on data_processor.py


class CustomSA4CPSProcessor(DataProcessor):
    async def _process_slg_v2_line(self, line, header, metadata, line_idx):
        # Reuse the default line processing, then extend its result
        processed = await super()._process_slg_v2_line(line, header, metadata, line_idx)

        # Add custom fields
        processed['custom_field'] = 'custom_value'

        return processed
```

## Support

For issues or questions:
1. Check service logs: `docker-compose logs data-ingestion-service`
2. Verify configuration: `curl http://localhost:8008/sources`
3. Test FTP connection: `curl -X POST http://localhost:8008/sources/{id}/test`
4. Review processing status: `curl http://localhost:8008/processing/status`

## License

This implementation is part of the SA4CPS project energy monitoring dashboard.