I’m designing storage architecture for a large-scale IoT deployment with 10,000+ sensors generating telemetry every 30 seconds. We need to support both real-time monitoring and historical analytics for up to 2 years of data.
I’m debating between timeseries-optimized storage (like InfluxDB or TimescaleDB), a traditional relational database (Db2), and NoSQL options (Cloudant). Each approach has trade-offs:
Timeseries databases excel at write performance and time-based queries but can be limiting for complex analytics. Relational databases offer powerful query capabilities and ACID guarantees but may struggle with high-volume sensor data ingestion. NoSQL provides flexibility and scalability but requires careful data modeling for efficient queries.
What has worked well for others managing IoT sensor data at scale? Particularly interested in experiences with query performance for both real-time dashboards and batch analytics, and how well different storage solutions integrate with IBM Cloud analytics services.
For 10K sensors at 30-second intervals, you’re looking at roughly 28.8M data points per day (about 333 writes per second on average, assuming one value per reading). Timeseries databases are purpose-built for this workload. We use TimescaleDB on IBM Cloud and it handles a similar scale without trouble. Performance on time-range queries is excellent, and the downsampling features help keep long-term storage manageable. The PostgreSQL compatibility means we can still do complex joins when needed for analytics.
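For anyone who wants to check the sizing, here is a quick back-of-the-envelope sketch in Python. It assumes each sensor emits exactly one value per 30-second reading and treats "2 years" as 730 days; both are simplifications for illustration, not figures from an actual deployment.

```python
# Rough ingestion sizing for the workload described in the question.
# Assumption: one data point per sensor per reading, 2 years = 730 days.
SENSORS = 10_000
INTERVAL_SECONDS = 30
RETENTION_DAYS = 2 * 365
SECONDS_PER_DAY = 24 * 60 * 60

readings_per_sensor_per_day = SECONDS_PER_DAY // INTERVAL_SECONDS
readings_per_day = SENSORS * readings_per_sensor_per_day
writes_per_second = readings_per_day / SECONDS_PER_DAY
readings_over_retention = readings_per_day * RETENTION_DAYS

print(f"per sensor per day: {readings_per_sensor_per_day:,}")
print(f"per day:            {readings_per_day:,}")
print(f"avg writes/second:  {writes_per_second:.1f}")
print(f"over retention:     {readings_over_retention:,}")
```

If each reading carries several metrics, multiply accordingly. The two-year total (around 21 billion raw points under these assumptions) is the number that usually drives the retention and downsampling decision.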
We went with a hybrid approach: a timeseries database for raw sensor data (optimized for writes and queries on recent data), with aggregated summaries pushed to Db2 for complex analytics. This gives us the best of both worlds. Raw data is retained for 90 days in the timeseries database; beyond that, we keep only hourly/daily aggregates in Db2 for long-term analysis.
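To make the rollup step concrete, here is a minimal sketch of hourly downsampling in plain Python. The `Reading` type, the `hourly_rollup` function, and the choice of min/max/avg/count are illustrative assumptions; in practice this would more likely run inside the timeseries database or a scheduled job, and the summaries would then be loaded into Db2.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean


@dataclass(frozen=True)
class Reading:
    """One raw telemetry point (hypothetical shape)."""
    sensor_id: str
    ts: datetime
    value: float


def hourly_rollup(readings):
    """Group raw readings by (sensor, hour) and summarize each bucket."""
    buckets = defaultdict(list)
    for r in readings:
        # Truncate the timestamp to the start of its hour.
        hour = r.ts.replace(minute=0, second=0, microsecond=0)
        buckets[(r.sensor_id, hour)].append(r.value)
    return {
        key: {"min": min(vals), "max": max(vals), "avg": mean(vals), "count": len(vals)}
        for key, vals in buckets.items()
    }


raw = [
    Reading("s1", datetime(2024, 1, 1, 10, 0, 0, tzinfo=timezone.utc), 20.0),
    Reading("s1", datetime(2024, 1, 1, 10, 0, 30, tzinfo=timezone.utc), 22.0),
    Reading("s1", datetime(2024, 1, 1, 11, 0, 0, tzinfo=timezone.utc), 25.0),
]
rollup = hourly_rollup(raw)
for (sensor_id, hour), summary in sorted(rollup.items()):
    print(sensor_id, hour.isoformat(), summary)
```

Keeping `count` alongside the average matters if you later combine hourly rows into daily ones, since a correct daily average has to be weighted by the number of readings in each hour.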
One consideration that is often overlooked is data modeling for different sensor types. With a NoSQL store like Cloudant, you get schema flexibility, which is valuable when sensor types evolve or new devices are added. However, this flexibility can hurt query performance if documents and indexes are not carefully designed. We use Cloudant for configuration and metadata (flexible schema) but a timeseries database for the actual telemetry (rigid, optimized schema). This separation has worked very well for our IoT platform.
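As a rough illustration of that split, the sketch below keeps per-sensor metadata as free-form documents and flattens telemetry into fixed-shape rows. Everything here is hypothetical: the field names, the `TelemetryRow` type, the `to_rows` helper, and the one-row-per-metric layout are assumptions for the example, not the schema described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TelemetryRow:
    """Fixed-shape telemetry record, suited to a timeseries table."""
    sensor_id: str
    ts: datetime
    metric: str
    value: float


# Flexible metadata: each sensor type can carry different fields,
# which is the kind of document a store like Cloudant would hold.
sensor_docs = {
    "s1": {"type": "thermostat", "firmware": "1.2", "location": {"floor": 3}},
    "s2": {"type": "vibration", "axis_count": 3},
}


def to_rows(sensor_id, ts, payload):
    """Flatten a device payload into one row per numeric metric."""
    return [
        TelemetryRow(sensor_id, ts, name, float(v))
        for name, v in payload.items()
        # Keep numeric values only; bool is excluded because it subclasses int.
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]


ts = datetime(2024, 1, 1, 10, 0, 0, tzinfo=timezone.utc)
rows = to_rows("s1", ts, {"temp_c": 21.5, "humidity": 40, "status": "ok"})
for row in rows:
    print(row)
```

New sensor types then only require a new metadata document, while the telemetry table keeps a single shape regardless of device.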