Data storage REST API timeouts during large file uploads in c8y-1020

We’re experiencing REST API timeouts when uploading large diagnostic files (50-200MB) from edge devices. The API times out after 60 seconds, leaving uploads incomplete. We’re using standard POST requests to the binary endpoint. We haven’t implemented chunked upload or explored the multipart endpoint option.


POST /inventory/binaries
File sizes: 50-200MB
Timeout: 60 seconds (default)
Success rate: 15% for files >100MB

The incomplete uploads are causing issues with our diagnostic workflow. What’s the recommended approach for handling large file transfers reliably?

60 seconds is way too short for 100MB+ files over typical IoT connections. You need to increase the client-side timeout first. But more importantly, you should be using chunked uploads for anything over 10MB. Single POST requests aren’t designed for large file transfers.

Don’t forget to implement progress tracking. Store the upload state (which chunks completed) so that if the process is interrupted, you can resume from where you left off. We maintain a local state file that tracks chunk status, allowing us to resume even after an application restart.

Thanks, I’ll look into multipart uploads. What’s a good chunk size? And how do we handle the case where a chunk fails - do we retry just that chunk or restart the entire upload?

The multipart upload endpoint is specifically designed for this use case. You break the file into chunks, upload each chunk separately, and then finalize the upload. This approach is resilient to network interruptions and allows you to resume failed uploads without starting over.

Here’s a comprehensive solution addressing all three key areas:

API Timeout Configuration: First, increase your client timeout to accommodate large transfers, but don’t rely on this alone:


// Method names vary by HTTP client library; values in milliseconds
httpClient.setTimeout(300000);          // 5 minutes overall
httpClient.setReadTimeout(300000);      // 5 minutes per read
httpClient.setConnectionTimeout(30000); // 30 seconds to establish the connection

However, long timeouts aren’t the real solution - chunked uploads are.
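In Python, a stdlib sketch of the same idea looks like this (the URL is a placeholder; libraries such as `requests` additionally let you split the timeout into a `(connect, read)` tuple):

```python
import urllib.request

CONNECT_READ_TIMEOUT_S = 300  # 5 minutes; the 60-second default is far too short

def post_binary(url: str, payload: bytes):
    """POST raw bytes with a long socket timeout (sketch only)."""
    req = urllib.request.Request(
        url, data=payload, method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    # urllib applies one timeout to the connect and each socket read/write;
    # clients like `requests` accept timeout=(30, 300) to split connect/read.
    return urllib.request.urlopen(req, timeout=CONNECT_READ_TIMEOUT_S)
```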

Chunked Upload Implementation: Use the multipart upload workflow for files over 10MB:


// Pseudocode - Chunked upload process:
1. Calculate total chunks: chunks = ceil(fileSize / chunkSize), with chunkSize = 5MB
2. Initiate multipart upload:
   POST /inventory/binaries/multipart
   Returns: uploadId
3. For each chunk (parallel uploads possible):
   PUT /inventory/binaries/multipart/{uploadId}/{chunkNumber}
   Include: Content-Range header
   Store: ETag from response for verification
4. Complete upload:
   POST /inventory/binaries/multipart/{uploadId}/complete
   Include: Array of chunk ETags in order
5. Verify: Check final file integrity
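Steps 1–4 above can be sketched like this, with the transport injected so the HTTP details (which depend on your client library and on the endpoints above) stay separate from the chunking logic; `put_chunk` is a hypothetical callable you supply:

```python
import math

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, per the chunk-size guidance below

def upload_in_chunks(data: bytes, put_chunk, chunk_size: int = CHUNK_SIZE):
    """Split `data` into chunks and upload each via `put_chunk`.

    `put_chunk(chunk_number, payload, content_range)` should PUT one chunk
    (e.g. to /inventory/binaries/multipart/{uploadId}/{chunkNumber}) and
    return the ETag from the response. Returns the ordered ETag list
    needed by the completion request.
    """
    total = len(data)
    n_chunks = math.ceil(total / chunk_size)
    etags = []
    for i in range(n_chunks):
        start = i * chunk_size
        end = min(start + chunk_size, total)
        content_range = f"bytes {start}-{end - 1}/{total}"
        etags.append(put_chunk(i + 1, data[start:end], content_range))
    return etags
```

Because the transport is injected, the same loop works whether chunks go out sequentially or from a small worker pool.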

Multipart Endpoint Usage: Implement robust multipart upload with retry logic:


// Initiate upload
POST /inventory/binaries/multipart
Content-Type: application/json
{
  "name": "diagnostic_log.zip",
  "type": "application/zip"
}

Detailed Implementation Strategy:

  1. Chunk Size Selection:

    • Use 5MB chunks for optimal balance
    • Adjust based on network quality (2MB for poor connections)
    • Never exceed 10MB per chunk
  2. Parallel Upload Optimization:

    • Upload 3-5 chunks concurrently
    • Monitor bandwidth to avoid saturation
    • Implement adaptive concurrency based on success rate
  3. Retry Logic Per Chunk:

    • Each chunk gets 3 retry attempts
    • Exponential backoff: 2s, 4s, 8s
    • Track failed chunks separately for final retry pass
  4. Progress Persistence: Store upload state locally:

    
    {
      "uploadId": "abc123",
      "fileName": "diagnostic.zip",
      "totalChunks": 40,
      "completedChunks": [1,2,3,5,7],
      "chunkETags": {"1": "etag1", "2": "etag2"},
      "failedChunks": [4,6]
    }
    
  5. Resume Capability:

    • On application restart, check for incomplete uploads
    • Resume from last completed chunk
    • Verify completed chunks with HEAD requests
  6. Compression Strategy:

    • Compress files before upload when appropriate
    • Use gzip for logs/text (70-80% reduction)
    • Skip compression for already-compressed formats (zip, jpg)
  7. Network Resilience:

    • Implement connection health checks before starting upload
    • Monitor upload speed and adjust chunk size dynamically
    • Pause upload during poor network conditions
  8. Verification:

    • Calculate MD5 hash of original file
    • Verify each chunk ETag after upload
    • Confirm final file integrity after completion
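Point 3 (per-chunk retries with exponential backoff) can be sketched as follows; `put_chunk` is again a hypothetical transport callable, and the delay is parameterized so callers can tune it:

```python
import time

def put_chunk_with_retry(put_chunk, chunk_number, payload,
                         attempts=3, base_delay_s=2.0):
    """Try one chunk up to `attempts` times with 2s/4s/8s backoff.

    Returns the ETag on success; re-raises the last error so the caller
    can record the chunk in its failed-chunks list for a final retry pass.
    """
    for attempt in range(attempts):
        try:
            return put_chunk(chunk_number, payload)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay_s * (2 ** attempt))  # 2 s, 4 s, 8 s
```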

Error Handling:

  • Chunk Upload Failure: Retry chunk up to 3 times, then mark for later retry
  • Network Disconnection: Pause upload, resume when connection restored
  • Timeout During Completion: Verify upload status with GET request before retrying
  • Partial Upload Cleanup: Implement cleanup for abandoned uploads (>24 hours old)
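Pausing and resuming hinges on the persisted upload state shown earlier. A minimal sketch of saving that state and deriving what remains on resume (the file name and schema are illustrative):

```python
import json
import os

STATE_FILE = "upload_state.json"  # illustrative path

def save_state(state, path=STATE_FILE):
    # Write to a temp file and rename, so a crash mid-write
    # cannot leave a corrupt state file behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def pending_chunks(state):
    """Chunks still to upload on resume: everything not yet completed."""
    done = set(state["completedChunks"])
    return [n for n in range(1, state["totalChunks"] + 1) if n not in done]
```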

Monitoring and Metrics: Track these key indicators:

  • Upload success rate by file size
  • Average upload time per MB
  • Chunk retry rate
  • Network quality during uploads
  • Bandwidth utilization

Performance Optimization:

  1. Pre-upload Validation:

    • Check available storage quota
    • Verify file format before starting
    • Estimate upload time based on file size and connection speed
  2. Bandwidth Management:

    • Implement rate limiting to avoid network saturation
    • Prioritize critical uploads
    • Schedule large uploads during off-peak hours
  3. Background Processing:

    • Queue uploads for background processing
    • Allow users to continue working during upload
    • Provide real-time progress notifications

Implementing this multipart upload strategy should dramatically improve your success rate, even for 200MB files over challenging network connections. The key is breaking large transfers into manageable chunks with individual retry logic.

Consider compressing files before upload if they’re diagnostic logs or text-based data. We saw 70-80% size reduction on log files, which dramatically reduced upload time and failure rates. Just make sure to set appropriate content-type headers so the platform knows the file is compressed.
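The compression heuristic in the last two paragraphs can be sketched as follows (the extension list and content types are illustrative, not exhaustive):

```python
import gzip

# Formats that are already compressed; gzipping these wastes CPU for ~0% gain.
ALREADY_COMPRESSED = {".zip", ".gz", ".jpg", ".jpeg", ".png", ".mp4"}

def maybe_compress(filename: str, data: bytes):
    """Gzip text-like payloads; skip already-compressed formats.

    Returns (payload, content_type) so the caller can set the header
    that tells the platform whether the file is compressed.
    """
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in ALREADY_COMPRESSED:
        return data, "application/octet-stream"
    return gzip.compress(data), "application/gzip"
```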