SA4CPS FTP Data Ingestion Service
This service monitors the SA4CPS FTP server at ftp.sa4cps.pt and processes .slg_v2 files for real-time energy monitoring data ingestion.
Overview
The Data Ingestion Service provides comprehensive FTP monitoring and data processing capabilities specifically designed for the SA4CPS project. It automatically detects, downloads, and processes .slg_v2 files from the FTP server, converting them into standardized sensor readings for the energy monitoring dashboard.
Architecture
ftp.sa4cps.pt (.slg_v2 files)
↓
FTP Monitor (polls every 5 minutes)
↓
Data Processor (supports multiple formats)
↓
Redis Publisher (3 topic channels)
↓
Real-time Dashboard & Analytics
Features
FTP Monitoring
- ✅ Automatic Discovery: Monitors ftp.sa4cps.pt for new .slg_v2 files
- ✅ Duplicate Prevention: Tracks processed files to avoid reprocessing
- ✅ Connection Management: Maintains persistent FTP connections with automatic retry
- ✅ File Pattern Matching: Supports *.slg_v2 and custom file patterns
- ✅ Configurable Polling: Default 5-minute intervals, fully configurable
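The discovery and duplicate-prevention steps above reduce to a small filtering routine. The sketch below shows one plausible shape for it; the function name and the `seen` set are illustrative, not the service's actual API. In the real monitor, the `listing` argument would come from an `ftplib.FTP.nlst()` call against ftp.sa4cps.pt.

```python
import fnmatch

def match_new_files(listing, seen, pattern="*.slg_v2"):
    """Return entries from an FTP directory listing that match the file
    pattern and have not been processed before (duplicate prevention).
    Matched names are recorded in `seen` so later polls skip them."""
    new = [name for name in listing
           if fnmatch.fnmatch(name, pattern) and name not in seen]
    seen.update(new)
    return new
```

On each polling cycle the monitor would fetch the listing, call this filter, and download only the returned files.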
Data Processing
- ✅ Multi-Format Support: CSV-style, space-delimited, and tab-delimited .slg_v2 files
- ✅ Smart Header Detection: Automatically detects and parses header information
- ✅ Metadata Extraction: Processes comment lines for file-level metadata
- ✅ Unit Inference: Intelligent unit detection based on column names and value ranges
- ✅ Timestamp Handling: Supports multiple timestamp formats with automatic parsing
- ✅ Multi-Value Support: Handles files with multiple sensor readings per line
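Unit inference of the kind described above typically combines name matching with value-range heuristics. The following is a minimal sketch, assuming hypothetical column names and thresholds; the service's actual rules are more extensive.

```python
def infer_unit(column_name, value):
    """Guess a unit from the column name, falling back to a value-range
    heuristic (illustrative thresholds, not the service's real rules)."""
    name = column_name.lower()
    by_name = {"energy_kwh": "kWh", "power_w": "W",
               "voltage_v": "V", "temp": "°C"}
    for key, unit in by_name.items():
        if key in name:
            return unit
    # Fallback heuristic: European mains voltage clusters near 230 V
    if 200 <= value <= 250:
        return "V"
    return "unknown"
```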
Data Output
- ✅ Redis Publishing: Real-time data streaming via Redis pub/sub
- ✅ Multiple Topics: Publishes to 3 specialized channels:
  - sa4cps_energy_data: Energy consumption and power readings
  - sa4cps_sensor_metrics: Sensor telemetry and status data
  - sa4cps_raw_data: Raw unprocessed data for debugging
- ✅ Standardized Format: Consistent sensor reading format across all outputs
Quick Start
1. Deploy with Docker Compose
cd microservices
docker-compose up -d data-ingestion-service
2. Auto-Configure SA4CPS Source
# Run the automatic configuration script
docker-compose exec data-ingestion-service python startup_sa4cps.py
3. Verify Setup
# Check service health
curl http://localhost:8008/health
# View configured data sources
curl http://localhost:8008/sources
# Monitor processing statistics
curl http://localhost:8008/stats
Configuration
Environment Variables
Set these in the docker-compose.yml:
environment:
- FTP_SA4CPS_HOST=ftp.sa4cps.pt # FTP server hostname
- FTP_SA4CPS_PORT=21 # FTP port (default: 21)
- FTP_SA4CPS_USERNAME= # FTP username
- FTP_SA4CPS_PASSWORD= # FTP password (empty for anonymous)
- FTP_SA4CPS_REMOTE_PATH=/ # Remote directory path
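Inside the service, these variables would be read roughly as follows. This is a sketch: the defaults shown (anonymous login, port 21, root path) mirror the comments above, but the actual loader in the codebase may differ.

```python
import os

def load_ftp_config():
    """Read the SA4CPS FTP settings from the environment, falling back
    to the defaults documented in docker-compose.yml."""
    return {
        "host": os.getenv("FTP_SA4CPS_HOST", "ftp.sa4cps.pt"),
        "port": int(os.getenv("FTP_SA4CPS_PORT", "21")),
        "username": os.getenv("FTP_SA4CPS_USERNAME", "anonymous"),
        "password": os.getenv("FTP_SA4CPS_PASSWORD", ""),  # empty = anonymous
        "remote_path": os.getenv("FTP_SA4CPS_REMOTE_PATH", "/"),
    }
```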
Manual Configuration
You can also configure the SA4CPS data source programmatically:
from sa4cps_config import SA4CPSConfigurator

configurator = SA4CPSConfigurator()

# Create data source
result = await configurator.create_sa4cps_data_source(
    username="your_username",
    password="your_password",
    remote_path="/data/energy"
)

# Test connection
test_result = await configurator.test_sa4cps_connection()

# Check status
status = await configurator.get_sa4cps_status()
API Endpoints
Health & Status
- GET /health - Service health check
- GET /stats - Processing statistics
- GET /sources - List all data sources
Data Source Management
- POST /sources - Create new data source
- PUT /sources/{id} - Update data source
- DELETE /sources/{id} - Delete data source
- POST /sources/{id}/test - Test FTP connection
- POST /sources/{id}/trigger - Manual processing trigger
Monitoring
- GET /processing/status - Current processing status
- GET /data-quality - Data quality metrics
- GET /redis/topics - Active Redis topics
.slg_v2 File Format Support
The service supports various .slg_v2 file formats:
CSV-Style Format
# SA4CPS Energy Data
# Location: Building A
timestamp,sensor_id,energy_kwh,power_w,voltage_v
2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1
2024-01-15T10:01:00Z,SENSOR_001,1235.1,865.3,229.8
Space-Delimited Format
# Energy consumption data
# System: Smart Grid Monitor
2024-01-15T10:00:00 LAB_A_001 1500.23 750.5
2024-01-15T10:01:00 LAB_A_001 1501.85 780.2
Tab-Delimited Format
# Multi-sensor readings
timestamp sensor_id energy power temp
2024-01-15T10:00:00Z BLDG_A_01 1234.5 850.2 22.5
Data Output Format
All processed data is converted to a standardized sensor reading format:
{
  "sensor_id": "SENSOR_001",
  "timestamp": 1705312800,
  "datetime": "2024-01-15T10:00:00",
  "value": 1234.5,
  "unit": "kWh",
  "value_type": "energy_kwh",
  "additional_values": {
    "power_w": {"value": 850.2, "unit": "W"},
    "voltage_v": {"value": 230.1, "unit": "V"}
  },
  "metadata": {
    "Location": "Building A",
    "line_number": 2,
    "raw_line": "2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1"
  },
  "processed_at": "2024-01-15T10:01:23.456789",
  "data_source": "slg_v2",
  "file_format": "SA4CPS_SLG_V2"
}
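To illustrate how a CSV-style line maps onto this structure, here is a simplified converter. It is a sketch only: it takes the first value column as the primary reading, omits unit inference and metadata, and does not reflect the real `DataProcessor` implementation.

```python
from datetime import datetime

def line_to_reading(line, header):
    """Convert one CSV-style .slg_v2 line into a simplified version of
    the standardized reading shown above. The first non-key column
    becomes the primary value; the rest go into additional_values."""
    fields = dict(zip(header, line.split(",")))
    # fromisoformat on older Pythons does not accept a trailing 'Z'
    dt = datetime.fromisoformat(fields["timestamp"].replace("Z", "+00:00"))
    value_cols = [c for c in header if c not in ("timestamp", "sensor_id")]
    primary = value_cols[0]
    return {
        "sensor_id": fields["sensor_id"],
        "timestamp": int(dt.timestamp()),
        "value": float(fields[primary]),
        "value_type": primary,
        "additional_values": {c: float(fields[c]) for c in value_cols[1:]},
        "raw_line": line,
    }
```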
Redis Topics
sa4cps_energy_data
Primary energy consumption and power readings:
- Energy consumption (kWh, MWh)
- Power readings (W, kW, MW)
- Efficiency metrics
sa4cps_sensor_metrics
Sensor telemetry and environmental data:
- Voltage/Current readings
- Temperature measurements
- Sensor status/diagnostics
- System health metrics
sa4cps_raw_data
Raw unprocessed data for debugging:
- Original file content
- Processing metadata
- Error information
- Quality metrics
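A plausible routing rule for these three channels is sketched below. The `value_type` sets are illustrative guesses from the channel descriptions above, not the service's actual selection logic; publishing would then be a single redis-py `publish(pick_topic(r), json.dumps(r))` call.

```python
# Illustrative groupings inferred from the channel descriptions
ENERGY_TYPES = {"energy_kwh", "energy_mwh", "power_w", "power_kw", "power_mw"}
METRIC_TYPES = {"voltage_v", "current_a", "temp_c", "status"}

def pick_topic(reading):
    """Route a standardized reading to one of the three Redis channels
    by its value_type; anything unrecognized goes to the raw channel."""
    vt = reading.get("value_type", "")
    if vt in ENERGY_TYPES:
        return "sa4cps_energy_data"
    if vt in METRIC_TYPES:
        return "sa4cps_sensor_metrics"
    return "sa4cps_raw_data"
```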
Monitoring & Troubleshooting
Check Processing Status
# View recent processing activity
curl http://localhost:8008/processing/status | jq
# Check data quality metrics
curl http://localhost:8008/data-quality | jq
# Monitor Redis topic activity
curl http://localhost:8008/redis/topics | jq
View Logs
# Service logs
docker-compose logs -f data-ingestion-service
# Follow specific log patterns
docker-compose logs data-ingestion-service | grep "SA4CPS\|SLG_V2"
Common Issues
- FTP Connection Failed
  - Verify FTP_SA4CPS_HOST is accessible
  - Check firewall/network settings
  - Validate username/password if not using anonymous login
- No Files Found
  - Confirm .slg_v2 files exist in the remote path
  - Check the FTP_SA4CPS_REMOTE_PATH configuration
  - Verify file permissions
- Processing Errors
  - Check that the data format matches the expected .slg_v2 structure
  - Verify timestamp formats are supported
  - Review file content for parsing issues
Development
Testing
# Run .slg_v2 format tests
cd data-ingestion-service
python test_slg_v2.py
# Test SA4CPS configuration
python sa4cps_config.py
Extending File Support
To add support for new file formats:
1. Add the format to the DataFormat enum in models.py
2. Implement _process_your_format_data() in data_processor.py
3. Add format handling to the process_time_series_data() method
4. Update the supported_formats list
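As a sketch of step 2, a handler for a hypothetical semicolon-delimited format might look like the following. Everything here is illustrative: the real handlers are methods on the processor and return full reading objects rather than raw rows.

```python
def process_semicolon_data(raw_text):
    """Hypothetical handler for a semicolon-delimited format: skip
    blank and comment lines, split each data line into fields."""
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment/metadata lines are handled separately
        rows.append(line.split(";"))
    return rows
```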
Custom Processing Logic
Override processing methods in DataProcessor:
class CustomSA4CPSProcessor(DataProcessor):
    async def _process_slg_v2_line(self, line, header, metadata, line_idx):
        # Custom line processing logic
        processed = await super()._process_slg_v2_line(line, header, metadata, line_idx)
        # Add custom fields
        processed['custom_field'] = 'custom_value'
        return processed
Support
For issues or questions:
- Check service logs: docker-compose logs data-ingestion-service
- Verify configuration: curl http://localhost:8008/sources
- Test FTP connection: curl -X POST http://localhost:8008/sources/{id}/test
- Review processing status: curl http://localhost:8008/processing/status
License
This implementation is part of the SA4CPS project energy monitoring dashboard.