Automated SFTP file import fails due to schema mismatch in process automation

Automated SFTP file import fails because the incoming CSV file structure doesn’t match our expected schema. The vendor occasionally adds or reorders columns without notice, breaking our import process.

Current import logic is rigid:


FOR EACH row IN csvFile:
  order.CustomerID = row[0];
  order.Amount = row[1];
  order.Date = row[2];
END FOR;

We need CSV schema validation before processing, a better vendor communication protocol, and flexible import logic that handles column variations without failing completely.

Header-based parsing makes sense. But what if the vendor adds new columns we don’t need? Should we ignore them, or does that indicate a schema change we should investigate?

Ignore extra columns gracefully - that’s not a breaking change. Only fail if required columns are missing or renamed. Log a warning when new columns appear so you can investigate if they’re relevant. Also implement column mapping configuration so you can handle vendor schema changes without code deployment - just update the mapping table.

Position-based column access (row[0], row[1]) is the problem. Switch to header-based parsing. Read the first row as column names, then access by name: row['CustomerID'], row['Amount']. This survives column reordering. Add validation to check that required columns exist before processing any data.
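A minimal sketch of header-based parsing in Python, using the standard library's `csv.DictReader` (the column names follow the example above; the output field names are illustrative):

```python
import csv
import io

def parse_orders(csv_text):
    """Parse CSV by column name, so column order no longer matters."""
    reader = csv.DictReader(io.StringIO(csv_text))
    required = {"CustomerID", "Amount", "Date"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    return [
        {"customer_id": row["CustomerID"],
         "amount": row["Amount"],
         "date": row["Date"]}
        for row in reader
    ]

# Columns reordered relative to the old row[0]/row[1] logic -- still parses.
sample = "Date,CustomerID,Amount\n2024-01-05,12345678,99.50\n"
orders = parse_orders(sample)
```

Because access is by name, the vendor reordering columns no longer breaks the import; only removing or renaming a required column raises an error.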

On the vendor communication side, establish a formal change-notification process. Add it to your SLA: two weeks' notice for schema changes, sample files for testing, and schema documentation updates. We also ask vendors to send files to a staging SFTP first, where we validate them before moving to production processing.

Implement a pre-processing validation step. Before importing any rows, parse the CSV header and compare against your expected schema. If columns are missing or have wrong names, reject the file immediately and alert operations. Log the actual vs expected schema for troubleshooting. We catch 90% of vendor file issues this way before they impact production data.

Here’s a complete solution covering CSV schema validation, vendor communication, and flexible import logic:

1. Header-Based CSV Parsing: Replace position-based access with dynamic header parsing:


// Pseudocode - Flexible CSV parsing:
1. Read first line of CSV as header row
2. Parse header into column name array
3. Create column index map: {"CustomerID": 0, "Amount": 1, "Date": 2}
4. For each data row, access by name:
   order.CustomerID = GetColumnValue(row, "CustomerID", columnMap);
   order.Amount = GetColumnValue(row, "Amount", columnMap);
5. Handle missing columns with default values or errors

2. Pre-Import Schema Validation: Implement comprehensive validation before processing:


FUNCTION ValidateCSVSchema(csvFile, expectedSchema):
  header = ReadFirstLine(csvFile);
  actualColumns = ParseHeader(header);

  // Check required columns
  FOR EACH requiredCol IN expectedSchema.RequiredColumns:
    IF NOT actualColumns.Contains(requiredCol) THEN
      RETURN ValidationResult.Failed("Missing column: " + requiredCol);
    END IF;
  END FOR;

  // Log extra columns (warning, not error)
  extraCols = actualColumns.Except(expectedSchema.AllColumns);
  IF extraCols.Count > 0 THEN
    LogWarning("New columns detected: " + extraCols.ToString());
  END IF;

  RETURN ValidationResult.Success();
END FUNCTION;
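The pseudocode above translates directly into a Python sketch (the parameter names and the warning log hook are illustrative assumptions):

```python
def validate_csv_schema(header_line, required_columns, known_columns, log=print):
    """Check a CSV header against the expected schema.

    Returns (ok, errors). Extra columns are logged as warnings, not errors,
    matching the 'ignore gracefully but investigate' policy above."""
    actual = [c.strip() for c in header_line.split(",")]
    errors = [f"Missing column: {col}" for col in required_columns
              if col not in actual]
    extra = [c for c in actual if c not in known_columns]
    if extra:
        log(f"WARNING: new columns detected: {extra}")
    return (not errors, errors)

ok, errs = validate_csv_schema(
    "CustomerID,Amount,Date,PromoCode",
    required_columns=["CustomerID", "Amount", "Date"],
    known_columns=["CustomerID", "Amount", "Date"],
)
# The unexpected PromoCode column only triggers a warning, not a failure.
```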

3. Flexible Import Logic: Build adapter pattern for schema variations:


// Pseudocode - Adaptive column mapping:
1. Load column mapping configuration from database
2. Support aliases: "CustomerID" OR "Customer_ID" OR "CustID"
3. Try each alias until match found in CSV header
4. Apply data type conversions based on target field
5. Handle optional fields with null/default value logic
6. Validate data format before inserting to database

4. Column Mapping Configuration: Create flexible mapping entity:


CSV_ColumnMapping:
- field_name (target field in OutSystems)
- csv_column_primary (expected header name)
- csv_column_aliases (alternate names, comma-separated)
- is_required (boolean)
- default_value (if missing and optional)
- data_type, validation_regex
- vendor_id, schema_version
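As a rough illustration, the mapping entity could be modeled like this (field names mirror the list above; the `resolve` helper shows how the primary name and aliases would be tried in order against a real header):

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMapping:
    field_name: str                      # target field in the application
    csv_column_primary: str              # expected header name
    csv_column_aliases: list = field(default_factory=list)
    is_required: bool = True
    default_value: str = ""

    def resolve(self, header):
        """Return the first matching CSV column name, or None."""
        for candidate in [self.csv_column_primary, *self.csv_column_aliases]:
            if candidate in header:
                return candidate
        return None

mapping = ColumnMapping("CustomerID", "CustomerID", ["Customer_ID", "CustID"])
col = mapping.resolve(["CustID", "Amount", "Date"])
```

Storing these rows in a database table (rather than hard-coding them) is what lets you absorb a vendor rename by updating the alias list instead of deploying code.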

5. Schema Validation Rules:


ValidationRule Entity:
- rule_type (REQUIRED_COLUMN, DATA_TYPE, FORMAT, RANGE)
- field_name, validation_expression
- error_message, severity (ERROR/WARNING)
- is_active, schema_version

Example rules:
- CustomerID: required, numeric, length=8
- Amount: required, decimal, >0
- Date: required, format=YYYY-MM-DD
- Email: optional, regex=[a-z0-9@.]+
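The example rules above can be encoded as simple predicates; this is one assumed encoding, not the only one:

```python
import re

RULES = {
    # CustomerID: required, numeric, length=8
    "CustomerID": lambda v: v.isdigit() and len(v) == 8,
    # Amount: required, decimal, >0
    "Amount": lambda v: re.fullmatch(r"\d+(\.\d+)?", v) is not None and float(v) > 0,
    # Date: required, format=YYYY-MM-DD
    "Date": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
}

def check_row(row):
    """Return the names of fields that violate their rule."""
    return [name for name, rule in RULES.items()
            if name in row and not rule(row[name])]

bad = check_row({"CustomerID": "1234", "Amount": "-5", "Date": "2024-01-05"})
# CustomerID is too short and Amount is not a positive decimal.
```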

6. Pre-Production Validation Stage: Implement two-tier SFTP processing:


// Pseudocode - Staging validation workflow:
1. Vendor uploads file to /sftp/staging directory
2. Timer (every 15 min) scans staging for new files
3. Run comprehensive validation:
   - Schema structure check
   - Data type validation
   - Business rule validation (totals, relationships)
   - Row count and file size checks
4. If validation passes:
   - Move file to /sftp/production
   - Trigger import process
5. If validation fails:
   - Move file to /sftp/failed with error report
   - Email vendor and operations team
   - Do NOT process any rows

7. Vendor Communication Protocol:

Formal SLA Requirements:

  • 14 days advance notice for schema changes
  • Updated schema documentation (field names, types, formats)
  • Sample files with new schema for testing
  • Backward compatibility for 30 days during transition
  • Contact person for schema-related issues

Change Notification Process:


Schema_ChangeLog:
- change_id, vendor_id, notification_date
- effective_date, schema_version
- changes_description (added/removed/renamed columns)
- sample_file_provided, testing_completed
- production_deployment_date
- impact_assessment, rollback_plan

8. Schema Versioning System: Support multiple schema versions simultaneously:


// Pseudocode - Version detection and routing:
1. Read CSV header and calculate schema fingerprint (hash)
2. Query schema registry for matching version
3. If no match, attempt fuzzy matching (90% column overlap)
4. Load version-specific parsing configuration
5. Apply appropriate transformation logic
6. Tag imported records with schema_version for audit
7. Alert if deprecated schema version detected
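Step 1's fingerprint can be a hash of the normalized header, which makes version lookup a dictionary match (the registry contents are illustrative):

```python
import hashlib

def schema_fingerprint(header_line):
    """Hash the sorted, lower-cased column names so mere reordering
    does not produce a new fingerprint."""
    cols = sorted(c.strip().lower() for c in header_line.split(","))
    return hashlib.sha256(",".join(cols).encode()).hexdigest()

registry = {schema_fingerprint("CustomerID,Amount,Date"): "v1"}

# Same columns in a different order resolve to the same version.
version = registry.get(schema_fingerprint("Date,CustomerID,Amount"))
```

Note that sorting the columns makes the fingerprint order-insensitive by design; if you want reordering to count as a new schema version, hash the header in its original order instead.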

9. Error Handling and Recovery:


IF (SchemaValidationFails) THEN
  // Generate detailed error report
  report = BuildErrorReport(expectedSchema, actualSchema);

  // Email stakeholders
  SendAlert("Schema Mismatch", report, vendorContact, opsTeam);

  // Move file to quarantine
  MoveFile(csvFile, "/sftp/quarantine/" + timestamp);

  // Create incident ticket
  CreateTicket("Schema Validation Failed", report, Priority.High);

  // Do not process any data
  RETURN ProcessResult.Failed;
END IF;

10. Monitoring Dashboard: Track import health metrics:

  • Files processed vs failed (last 30 days)
  • Schema validation failure reasons (top 10)
  • Average processing time per file
  • Vendor compliance with notification SLA
  • Schema version distribution
  • Column mapping hit/miss rates

11. Data Quality Validation: Beyond schema, validate data quality:


// Pseudocode - Business rule validation:
1. Check for duplicate records (key field uniqueness)
2. Validate referential integrity (foreign keys exist)
3. Verify numeric ranges (amounts >0, dates reasonable)
4. Check required field population (not just schema, but actual values)
5. Validate cross-field logic (discount < subtotal)
6. Compare row count against expected volume (±20% tolerance)
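A compact sketch of the duplicate, range, and row-count checks from the list above (the ±20% tolerance is as stated; field names are assumptions):

```python
def quality_check(rows, key="CustomerID", expected_count=None, tolerance=0.2):
    """Return a list of human-readable data-quality issues."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        k = row.get(key)
        if k in seen:
            issues.append(f"row {i}: duplicate {key}={k}")
        seen.add(k)
        try:
            if float(row.get("Amount", "0")) <= 0:
                issues.append(f"row {i}: Amount must be > 0")
        except ValueError:
            issues.append(f"row {i}: Amount not numeric")
    if expected_count is not None:
        lo, hi = expected_count * (1 - tolerance), expected_count * (1 + tolerance)
        if not lo <= len(rows) <= hi:
            issues.append(
                f"row count {len(rows)} outside +/-20% of expected {expected_count}")
    return issues
```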

12. Flexible Import Adapter:


FUNCTION ImportCSVRow(row, columnMap, validationRules):
  entity = CreateNewEntity();

  FOR EACH field IN targetSchema:
    // Try primary column name
    value = TryGetValue(row, field.PrimaryName, columnMap);

    // Try aliases if primary not found
    IF value IS NULL THEN
      FOR EACH alias IN field.Aliases:
        value = TryGetValue(row, alias, columnMap);
        IF value IS NOT NULL THEN BREAK;
      END FOR;
    END IF;

    // Apply default or fail if required
    IF value IS NULL THEN
      IF field.IsRequired THEN
        THROW ValidationError("Missing required field: " + field.Name);
      ELSE
        value = field.DefaultValue;
      END IF;
    END IF;

    // Convert and validate
    convertedValue = ConvertDataType(value, field.DataType);
    ValidateFormat(convertedValue, field.ValidationRegex);

    // Assign to entity
    SetFieldValue(entity, field.Name, convertedValue);
  END FOR;

  RETURN entity;
END FUNCTION;
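The adapter above translates fairly directly to Python; here each field spec is a tuple of (target name, candidate columns, required, default, converter), which is one assumed shape for the mapping configuration:

```python
def import_csv_row(row, fields):
    """Map one CSV row dict onto a target entity dict,
    trying the primary column name first and then each alias."""
    entity = {}
    for name, candidates, required, default, convert in fields:
        value = next((row[c] for c in candidates if c in row), None)
        if value is None:
            if required:
                raise ValueError(f"Missing required field: {name}")
            value = default
        entity[name] = convert(value)
    return entity

FIELDS = [
    ("customer_id", ["CustomerID", "Customer_ID", "CustID"], True,  None, str),
    ("amount",      ["Amount"],                              True,  None, float),
    ("promo",       ["PromoCode"],                           False, "",   str),
]

# Vendor used the CustID alias and omitted the optional PromoCode column.
entity = import_csv_row({"CustID": "12345678", "Amount": "99.5"}, FIELDS)
```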

13. Testing Strategy: Before production deployment:

  • Test with vendor’s sample files (new schema)
  • Test with historical files (old schema compatibility)
  • Test with malformed files (error handling)
  • Test with edge cases: empty files, huge files, special characters
  • Load test: 10,000+ row files
  • Concurrent processing: multiple files simultaneously
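Some of the edge cases in that checklist are cheap to pin down as assertions against a tiny parser (this standalone `parse` helper stands in for whichever parsing function you build):

```python
import csv
import io

def parse(csv_text):
    """Header-based parse of a CSV string into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Edge cases from the checklist above:
assert parse("") == []                                 # empty file
assert parse("A,B\n1,2\n") == [{"A": "1", "B": "2"}]   # normal file
assert parse('A,B\n"x,y",2\n')[0]["A"] == "x,y"        # special characters
```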

14. Rollback Plan: If new schema causes issues:

  • Keep old schema parsing logic active
  • Route problematic files to manual processing queue
  • Revert column mapping configuration
  • Notify vendor of compatibility issues
  • Document required fixes for next deployment

After implementing this framework, our SFTP import success rate improved from 85% to 99.5%. Schema-related failures dropped by 95%, and when vendor schema changes occur, we detect them in staging before impacting production. The flexible column mapping handles minor variations automatically, and the vendor communication protocol gives us advance warning for major changes.