sac4cps-backend/microservices/data-ingestion-service/README_SA4CPS.md
rafaeldpsilva 5fdce00e5d Add data-ingestion-service for SA4CPS FTP integration
- Implement FTP monitoring and ingestion for SA4CPS .slg_v2 files
- Add robust data processor with multi-format and unit inference support
- Publish parsed data to Redis topics for real-time dashboard simulation
- Include validation, monitoring, and auto-configuration scripts
- Provide documentation and test scripts for SA4CPS integration
2025-09-10 14:43:30 +01:00

SA4CPS FTP Data Ingestion Service

This service monitors the SA4CPS FTP server at ftp.sa4cps.pt and ingests .slg_v2 files as real-time energy monitoring data.

Overview

The Data Ingestion Service handles FTP monitoring and data processing for the SA4CPS project. It detects new .slg_v2 files on the FTP server, downloads and parses them, and converts each data line into a standardized sensor reading for the energy monitoring dashboard.

Architecture

ftp.sa4cps.pt (.slg_v2 files) 
    ↓
FTP Monitor (polls every 5 minutes)
    ↓  
Data Processor (supports multiple formats)
    ↓
Redis Publisher (3 topic channels)
    ↓
Real-time Dashboard & Analytics
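The pipeline above runs on a fixed polling interval. As a minimal asyncio sketch of that outer loop, assume each poll is wrapped in a hypothetical `run_cycle` coroutine (list, download, process, publish) that returns how many files it handled; `poll_forever` and `max_cycles` are illustrative names, not the service's API.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def poll_forever(
    run_cycle: Callable[[], Awaitable[int]],
    interval_s: float = 300.0,         # 5 minutes, the documented default
    max_cycles: Optional[int] = None,  # None = keep polling until cancelled
) -> int:
    """Run one ingestion cycle, sleep, repeat. Returns total files handled."""
    total = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            total += await run_cycle()
        except Exception as exc:
            # One failed cycle (e.g. FTP unreachable) must not stop the monitor
            print(f"ingestion cycle failed: {exc!r}")
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            await asyncio.sleep(interval_s)
    return total
```

Catching errors per cycle keeps a transient FTP outage from killing the monitor; the next poll simply tries again.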

Features

FTP Monitoring

  • Automatic Discovery: Monitors ftp.sa4cps.pt for new .slg_v2 files
  • Duplicate Prevention: Tracks processed files to avoid reprocessing
  • Connection Management: Maintains persistent FTP connections with automatic retry
  • File Pattern Matching: Supports *.slg_v2 and custom file patterns
  • Configurable Polling: Default 5-minute intervals, fully configurable
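Duplicate prevention and pattern matching can be combined into one filtering step. This sketch assumes the remote listing is a plain list of names (for example from Python's `ftplib.FTP.nlst()`) and that processed files are tracked in an in-memory set; the real service may persist this state elsewhere. `find_new_files` is a hypothetical helper, and matching here is case-sensitive by choice.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Set


def find_new_files(listing: Iterable[str], processed: Set[str],
                   pattern: str = "*.slg_v2") -> List[str]:
    """Return names matching `pattern` that are not yet processed, sorted."""
    return sorted(
        name for name in listing
        if fnmatchcase(name, pattern) and name not in processed
    )
```

After a successful run, adding the returned names to `processed` makes the next poll skip them.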

Data Processing

  • Multi-Format Support: CSV-style, space-delimited, tab-delimited .slg_v2 files
  • Smart Header Detection: Automatically detects and parses header information
  • Metadata Extraction: Processes comment lines for file-level metadata
  • Unit Inference: Infers units from column names (e.g. energy_kwh → kWh, voltage_v → V) and from value ranges
  • Timestamp Handling: Supports multiple timestamp formats with automatic parsing
  • Multi-Value Support: Handles files with multiple sensor readings per line
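Timestamp handling usually means trying several formats in turn. A minimal sketch, assuming the formats seen in the sample files below (ISO 8601 with and without a trailing `Z`) plus a space-separated variant and raw epoch seconds, and assuming naive timestamps are UTC (which matches the `timestamp`/`datetime` pair in the output example). The format list is an assumption, not the service's actual list.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed formats; extend as new files require
TIMESTAMP_FORMATS = (
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
)


def parse_timestamp(text: str) -> Optional[int]:
    """Return Unix epoch seconds (naive values treated as UTC), or None."""
    text = text.strip()
    for fmt in TIMESTAMP_FORMATS:
        try:
            dt = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return int(dt.replace(tzinfo=timezone.utc).timestamp())
    # Fall back to a numeric epoch value such as "1705312800"
    try:
        return int(float(text))
    except (ValueError, OverflowError):
        return None
```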

Data Output

  • Redis Publishing: Real-time data streaming via Redis pub/sub
  • Multiple Topics: Publishes to 3 specialized channels:
    • sa4cps_energy_data: Energy consumption and power readings
    • sa4cps_sensor_metrics: Sensor telemetry and status data
    • sa4cps_raw_data: Raw unprocessed data for debugging
  • Standardized Format: Consistent sensor reading format across all outputs

Quick Start

1. Deploy with Docker Compose

cd microservices
docker-compose up -d data-ingestion-service

2. Auto-Configure SA4CPS Source

# Run the automatic configuration script
docker-compose exec data-ingestion-service python startup_sa4cps.py

3. Verify Setup

# Check service health
curl http://localhost:8008/health

# View configured data sources
curl http://localhost:8008/sources

# Monitor processing statistics
curl http://localhost:8008/stats

Configuration

Environment Variables

Set these in the docker-compose.yml:

environment:
  - FTP_SA4CPS_HOST=ftp.sa4cps.pt        # FTP server hostname
  - FTP_SA4CPS_PORT=21                   # FTP port (default: 21)
  - FTP_SA4CPS_USERNAME=anonymous        # FTP username
  - FTP_SA4CPS_PASSWORD=                 # FTP password (empty for anonymous)
  - FTP_SA4CPS_REMOTE_PATH=/            # Remote directory path
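A sketch of how these variables might be read inside the service, using the defaults listed above. `FTPSettings` and `load_ftp_settings` are hypothetical names; passing a plain dict instead of `os.environ` makes the loader easy to test.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FTPSettings:
    host: str
    port: int
    username: str
    password: str
    remote_path: str


def load_ftp_settings(env: Optional[Mapping[str, str]] = None) -> FTPSettings:
    """Read FTP_SA4CPS_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return FTPSettings(
        host=env.get("FTP_SA4CPS_HOST", "ftp.sa4cps.pt"),
        port=int(env.get("FTP_SA4CPS_PORT", "21")),
        username=env.get("FTP_SA4CPS_USERNAME", "anonymous"),
        password=env.get("FTP_SA4CPS_PASSWORD", ""),  # empty for anonymous
        remote_path=env.get("FTP_SA4CPS_REMOTE_PATH", "/"),
    )
```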

Manual Configuration

You can also configure the SA4CPS data source programmatically. The configurator methods are coroutines, so they must run inside an event loop:

import asyncio

from sa4cps_config import SA4CPSConfigurator


async def main():
    configurator = SA4CPSConfigurator()

    # Create data source
    result = await configurator.create_sa4cps_data_source(
        username="your_username",
        password="your_password",
        remote_path="/data/energy"
    )

    # Test connection
    test_result = await configurator.test_sa4cps_connection()

    # Check status
    status = await configurator.get_sa4cps_status()
    print(result, test_result, status)


asyncio.run(main())

API Endpoints

Health & Status

  • GET /health - Service health check
  • GET /stats - Processing statistics
  • GET /sources - List all data sources

Data Source Management

  • POST /sources - Create new data source
  • PUT /sources/{id} - Update data source
  • DELETE /sources/{id} - Delete data source
  • POST /sources/{id}/test - Test FTP connection
  • POST /sources/{id}/trigger - Manual processing trigger

Monitoring

  • GET /processing/status - Current processing status
  • GET /data-quality - Data quality metrics
  • GET /redis/topics - Active Redis topics
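For scripting against these endpoints without extra dependencies, a standard-library sketch follows. It assumes the service listens on `localhost:8008` (as in the Quick Start) and returns JSON; the request body fields for `POST /sources` and `PUT /sources/{id}` are not documented here, so any body you pass is your own assumption. `build_request` and `call` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8008"  # port taken from the Quick Start examples


def build_request(method: str, path: str, body: Optional[Any] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a request for one of the endpoints above; body is sent as JSON."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


def call(req: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response (assumed format)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `call(build_request("GET", "/stats"))` fetches processing statistics, and `call(build_request("POST", "/sources/<id>/test"))` tests a source's FTP connection.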

.slg_v2 File Format Support

The service accepts .slg_v2 files in several delimiter layouts:

CSV-Style Format

# SA4CPS Energy Data
# Location: Building A
timestamp,sensor_id,energy_kwh,power_w,voltage_v
2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1
2024-01-15T10:01:00Z,SENSOR_001,1235.1,865.3,229.8

Space-Delimited Format

# Energy consumption data
# System: Smart Grid Monitor
2024-01-15T10:00:00 LAB_A_001 1500.23 750.5
2024-01-15T10:01:00 LAB_A_001 1501.85 780.2

Tab-Delimited Format

# Multi-sensor readings
timestamp	sensor_id	energy	power	temp
2024-01-15T10:00:00Z	BLDG_A_01	1234.5	850.2	22.5
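One way to read all three layouts above with a single routine is sketched below. The heuristics are assumptions inferred from the samples, not the service's documented rules: the delimiter is a comma if present, else a tab, else runs of whitespace; `# Key: Value` comments become metadata; and a first non-comment row that does not start with a digit is treated as the header. `parse_slg_v2` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple


def detect_delimiter(line: str) -> Optional[str]:
    """Comma or tab if present; None means split on whitespace."""
    if "," in line:
        return ","
    if "\t" in line:
        return "\t"
    return None


def split_fields(line: str, delim: Optional[str]) -> List[str]:
    return [f.strip() for f in line.split(delim)] if delim else line.split()


def parse_slg_v2(text: str) -> Tuple[Dict[str, str], Optional[List[str]], List[List[str]]]:
    """Split a file into (metadata, header or None, data rows)."""
    metadata: Dict[str, str] = {}
    header: Optional[List[str]] = None
    rows: List[List[str]] = []
    delim: Optional[str] = None
    seen_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, sep, value = line.lstrip("#").strip().partition(":")
            if sep:  # only "Key: Value" comments carry metadata
                metadata[key.strip()] = value.strip()
            continue
        if not seen_data:
            seen_data = True
            delim = detect_delimiter(line)
            fields = split_fields(line, delim)
            if not fields[0][:1].isdigit():  # heuristic header detection
                header = fields
                continue
        else:
            fields = split_fields(line, delim)
        rows.append(fields)
    return metadata, header, rows
```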

Data Output Format

All processed data is converted to a standardized sensor reading format:

{
  "sensor_id": "SENSOR_001",
  "timestamp": 1705312800,
  "datetime": "2024-01-15T10:00:00",
  "value": 1234.5,
  "unit": "kWh",
  "value_type": "energy_kwh",
  "additional_values": {
    "power_w": {"value": 850.2, "unit": "W"},
    "voltage_v": {"value": 230.1, "unit": "V"}
  },
  "metadata": {
    "Location": "Building A",
    "line_number": 2,
    "raw_line": "2024-01-15T10:00:00Z,SENSOR_001,1234.5,850.2,230.1"
  },
  "processed_at": "2024-01-15T10:01:23.456789",
  "data_source": "slg_v2",
  "file_format": "SA4CPS_SLG_V2"
}
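The mapping from a CSV-style line to this record can be sketched as follows. Assumptions: the first value column after `timestamp` and `sensor_id` is the primary value, units come only from a column-name suffix (the service also uses value ranges, which this sketch omits), and only the ISO timestamp form from the sample is handled. `UNIT_SUFFIXES` and `to_reading` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

# Hypothetical suffix table: "energy_kwh" -> "kWh", "power_w" -> "W", ...
UNIT_SUFFIXES = {"wh": "Wh", "kwh": "kWh", "mwh": "MWh", "w": "W",
                 "kw": "kW", "mw": "MW", "v": "V", "a": "A"}


def infer_unit(column: str) -> str:
    """Infer a unit from the text after the last underscore, else ''."""
    if "_" not in column:
        return ""
    return UNIT_SUFFIXES.get(column.rsplit("_", 1)[1].lower(), "")


def to_reading(header: List[str], raw_line: str, metadata: Dict[str, str],
               line_number: int, delimiter: str = ",") -> Dict[str, Any]:
    row = dict(zip(header, (f.strip() for f in raw_line.split(delimiter))))
    ts = datetime.strptime(row["timestamp"].rstrip("Z"), "%Y-%m-%dT%H:%M:%S")
    ts = ts.replace(tzinfo=timezone.utc)
    value_cols = [c for c in header if c not in ("timestamp", "sensor_id")]
    primary, extras = value_cols[0], value_cols[1:]
    return {
        "sensor_id": row["sensor_id"],
        "timestamp": int(ts.timestamp()),
        "datetime": ts.strftime("%Y-%m-%dT%H:%M:%S"),
        "value": float(row[primary]),
        "unit": infer_unit(primary),
        "value_type": primary,
        "additional_values": {c: {"value": float(row[c]), "unit": infer_unit(c)}
                              for c in extras},
        "metadata": {**metadata, "line_number": line_number, "raw_line": raw_line},
        # Naive UTC time, matching the format of the example above
        "processed_at": datetime.now(timezone.utc).replace(tzinfo=None).isoformat(),
        "data_source": "slg_v2",
        "file_format": "SA4CPS_SLG_V2",
    }
```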

Redis Topics

sa4cps_energy_data

Primary energy consumption and power readings:

  • Energy consumption (kWh, MWh)
  • Power readings (W, kW, MW)
  • Efficiency metrics

sa4cps_sensor_metrics

Sensor telemetry and environmental data:

  • Voltage/Current readings
  • Temperature measurements
  • Sensor status/diagnostics
  • System health metrics

sa4cps_raw_data

Raw unprocessed data for debugging:

  • Original file content
  • Processing metadata
  • Error information
  • Quality metrics
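A minimal sketch of routing standardized readings to the first two channels, assuming the choice is made by unit (energy and power units to `sa4cps_energy_data`, everything else to `sa4cps_sensor_metrics`). The routing rule is an assumption, and `sa4cps_raw_data` is left out because it carries file content and diagnostics rather than readings. The `publish` argument can be any callable taking `(channel, message)`, for example the `publish` method of a redis-py `redis.Redis` client.

```python
import json
from typing import Any, Callable, Dict

# Assumed rule: energy and power units go to the energy channel
ENERGY_UNITS = {"Wh", "kWh", "MWh", "W", "kW", "MW"}


def channel_for(reading: Dict[str, Any]) -> str:
    if reading.get("unit") in ENERGY_UNITS:
        return "sa4cps_energy_data"
    return "sa4cps_sensor_metrics"  # voltage, current, temperature, status


def publish_reading(reading: Dict[str, Any],
                    publish: Callable[[str, str], Any]) -> str:
    """Serialize the reading as JSON, publish it, and return the channel used."""
    channel = channel_for(reading)
    publish(channel, json.dumps(reading))
    return channel
```

With Redis reachable at `localhost:6379` (an assumption about your deployment), usage would be `publish_reading(reading, redis.Redis(host="localhost", port=6379).publish)`.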

Monitoring & Troubleshooting

Check Processing Status

# View recent processing activity
curl http://localhost:8008/processing/status | jq

# Check data quality metrics
curl http://localhost:8008/data-quality | jq

# Monitor Redis topic activity
curl http://localhost:8008/redis/topics | jq

View Logs

# Service logs
docker-compose logs -f data-ingestion-service

# Follow specific log patterns
docker-compose logs data-ingestion-service | grep -E "SA4CPS|SLG_V2"

Common Issues

  1. FTP Connection Failed

    • Verify FTP_SA4CPS_HOST is accessible
    • Check firewall/network settings
    • Validate username/password if not using anonymous
  2. No Files Found

    • Confirm .slg_v2 files exist in the remote path
    • Check FTP_SA4CPS_REMOTE_PATH configuration
    • Verify file permissions
  3. Processing Errors

    • Check that the data format matches the expected .slg_v2 structure
    • Verify timestamp formats are supported
    • Review file content for parsing issues
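The first two issues can be checked outside the service with Python's standard `ftplib`. This sketch connects, logs in (an empty password with `anonymous` is accepted by `ftplib`), changes to the remote path, and counts `.slg_v2` files. `check_ftp` is a hypothetical helper, and treating a `550` reply to `NLST` as an empty directory is an assumption about server behavior.

```python
import ftplib
from typing import Tuple


def check_ftp(host: str, port: int = 21, user: str = "anonymous",
              password: str = "", remote_path: str = "/",
              timeout: float = 10.0) -> Tuple[bool, str]:
    """Return (ok, message) describing connectivity and visible .slg_v2 files."""
    ftp = ftplib.FTP()
    try:
        ftp.connect(host, port, timeout=timeout)
        ftp.login(user, password)
        ftp.cwd(remote_path)
        try:
            names = ftp.nlst()
        except ftplib.error_perm as exc:
            # Some servers answer "550 No files found" for an empty directory
            if not str(exc).startswith("550"):
                raise
            names = []
    except ftplib.all_errors as exc:
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        ftp.close()  # close the socket whether or not the check succeeded
    matches = [n for n in names if n.endswith(".slg_v2")]
    return True, f"connected; {len(matches)} .slg_v2 file(s) in {remote_path}"
```

A `False` result with a connection error points at issue 1; `True` with zero files points at issue 2.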

Development

Testing

# Run .slg_v2 format tests
cd data-ingestion-service
python test_slg_v2.py

# Test SA4CPS configuration
python sa4cps_config.py

Extending File Support

To add support for new file formats:

  1. Add format to DataFormat enum in models.py
  2. Implement _process_your_format_data() in data_processor.py
  3. Add format handling to process_time_series_data() method
  4. Update supported_formats list
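The four steps follow an enum-plus-dispatch pattern. The standalone sketch below mirrors them with a stand-in `DataFormat` enum and module-level functions; the real enum in models.py and the methods on DataProcessor will differ (they are async methods there), and the `sensor_id=value` line format is invented purely for illustration.

```python
from enum import Enum
from typing import Callable, Dict, List


class DataFormat(str, Enum):
    """Stand-in for the enum in models.py (step 1: add a member)."""
    SLG_V2 = "slg_v2"
    MY_FORMAT = "my_format"


def _process_my_format_data(text: str) -> List[dict]:
    """Step 2: parse the hypothetical 'sensor_id=value' format."""
    readings = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sensor_id, _, value = line.partition("=")
        readings.append({"sensor_id": sensor_id.strip(), "value": float(value)})
    return readings


# Step 3: dispatch table; existing formats' handlers would also be registered here
HANDLERS: Dict[DataFormat, Callable[[str], List[dict]]] = {
    DataFormat.MY_FORMAT: _process_my_format_data,
}


def process_time_series_data(text: str, fmt: DataFormat) -> List[dict]:
    try:
        handler = HANDLERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt.value}") from None
    return handler(text)


# Step 4: derive the advertised list from the dispatch table so they stay in sync
supported_formats = [fmt.value for fmt in HANDLERS]
```

Deriving `supported_formats` from the dispatch table avoids forgetting step 4 when a handler is added.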

Custom Processing Logic

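Subclass DataProcessor (from data_processor.py) and override its processing methods:

from data_processor import DataProcessor


class CustomSA4CPSProcessor(DataProcessor):
    async def _process_slg_v2_line(self, line, header, metadata, line_idx):
        # Let the base class parse the line first
        processed = await super()._process_slg_v2_line(line, header, metadata, line_idx)

        # Add custom fields only when a reading was produced
        # (assumption: the base method may return None for skipped lines)
        if processed:
            processed['custom_field'] = 'custom_value'

        return processed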

Support

For issues or questions:

  1. Check service logs: docker-compose logs data-ingestion-service
  2. Verify configuration: curl http://localhost:8008/sources
  3. Test FTP connection: curl -X POST http://localhost:8008/sources/{id}/test
  4. Review processing status: curl http://localhost:8008/processing/status

License

This implementation is part of the SA4CPS project energy monitoring dashboard.