Comparing analytics module features and performance in ac-2019 vs ac-2023

Our team is planning an upgrade from the ac-2019 analytics module to ac-2023, and I wanted to gather insights from the community about the practical differences. We’re particularly interested in improvements in data processing speed, enhanced reporting capabilities, and any new integration options that have been introduced.

Our current setup processes about 2TB of daily log data using MaxCompute with the ac-2019 analytics module. We’ve been generally satisfied with the performance, but we’re looking to improve our batch processing times and expand our real-time analytics capabilities. I’ve read the official documentation but would love to hear real-world experiences about what actually changed in practice. Has anyone made this upgrade journey and can share their observations?

From a reporting perspective, ac-2023 introduced much better integration with DataWorks and Quick BI. The reporting capabilities are substantially enhanced: you get native support for incremental materialized views, which dramatically speeds up dashboard refresh times. We also found that the new Spark engine integration in ac-2023 provides better performance for iterative machine learning workloads compared to the ac-2019 version. One thing to note: the upgrade does require updating your SQL syntax in a few places where the optimizer behavior changed.
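
For anyone wondering why incremental materialized views help dashboard refreshes, here is a toy sketch in plain Python. It is not how MaxCompute implements the feature; the class, function, and sample data are all hypothetical and only illustrate the general idea that an incremental refresh folds in the new rows instead of rescanning everything.

```python
from collections import defaultdict


class IncrementalCountView:
    """Toy 'materialized view' (hypothetical): event count per dashboard key."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.rows_scanned = 0  # tracks how much work the refreshes have done

    def refresh(self, new_rows):
        """Fold only the newly arrived rows into the stored result."""
        for key in new_rows:
            self.counts[key] += 1
            self.rows_scanned += 1


def full_recompute(all_rows):
    """Baseline: rebuild the whole aggregate from scratch."""
    counts = defaultdict(int)
    for key in all_rows:
        counts[key] += 1
    return counts


# Hypothetical data: existing rows plus a small delta that arrived later.
history = ["checkout", "login", "login", "search"]
delta = ["login", "search"]

view = IncrementalCountView()
view.refresh(history)                 # initial build
scanned_before = view.rows_scanned
view.refresh(delta)                   # incremental refresh touches only the delta

# Both approaches agree on the result; only the amount of scanning differs.
assert dict(view.counts) == dict(full_recompute(history + delta))
print("rows scanned by the incremental refresh:", view.rows_scanned - scanned_before)
```

The cost of the incremental refresh scales with the size of the delta rather than the size of the full table, which is the property that matters for frequently refreshed dashboards.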

We completed this upgrade about six months ago and saw significant improvements. The most notable change in ac-2023 is the enhanced query optimizer in MaxCompute. Our complex JOIN operations that used to take 15-20 minutes now complete in 8-12 minutes on average. The data processing speed improvement is real and measurable, especially for multi-table aggregations.
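
To put rough numbers on those JOIN timings, here is a quick back-of-the-envelope calculation. The two ranges come from the figures above; pairing the endpoints into "worst" and "best" cases is an assumption made for illustration, not something that was measured that way.

```python
def reduction(before_min: float, after_min: float) -> float:
    """Fractional reduction in runtime, e.g. 0.25 means 25% less time."""
    return (before_min - after_min) / before_min


before = (15.0, 20.0)  # minutes on ac-2019, as quoted
after = (8.0, 12.0)    # minutes on ac-2023, as quoted

# Compare the midpoints of the two ranges (17.5 min vs 10 min).
midpoint = reduction(sum(before) / 2, sum(after) / 2)
# Assumed pairings of the range endpoints:
worst = reduction(before[0], after[1])  # 15 min -> 12 min
best = reduction(before[1], after[0])   # 20 min -> 8 min

print(f"midpoint: {midpoint:.0%}, worst case: {worst:.0%}, best case: {best:.0%}")
```

So the quoted ranges correspond to somewhere between a 20% and a 60% reduction in runtime, with roughly 43% at the midpoints.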

The Spark integration in ac-2023 is a big step forward for hybrid batch-streaming architectures. You can now run Spark Streaming jobs directly within the analytics module ecosystem, sharing the same data sources as your MaxCompute batch jobs. The integration options include native connectors for Kafka, DataHub, and Log Service. We’ve built a lambda architecture where Spark Streaming handles real-time aggregations while MaxCompute processes the batch layer, and ac-2023 makes this much more seamless than it was in ac-2019. The unified metadata management across both engines is particularly valuable.
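
For readers new to the lambda pattern described above, here is a minimal sketch of the serving-layer idea in plain Python. It deliberately avoids any Spark or MaxCompute API; the function name, the metric (per-service error counts), and the numbers are all hypothetical. The batch view stands in for what the batch layer has computed up to its last cutoff, and the speed view stands in for what the streaming job has aggregated since then.

```python
from collections import Counter


def merge_views(batch_view: Counter, speed_view: Counter) -> Counter:
    """Serving layer: combine batch totals with real-time increments."""
    merged = Counter(batch_view)   # copy so the batch view is not mutated
    merged.update(speed_view)      # Counter.update adds counts key by key
    return merged


# Hypothetical per-service error counts.
batch_view = Counter({"api": 1200, "auth": 300})  # batch layer, up to the last cutoff
speed_view = Counter({"api": 15, "billing": 2})   # streaming layer, since the cutoff

result = merge_views(batch_view, speed_view)
print(dict(result))
```

In a real deployment the speed view would be discarded or reset once the next batch run covers the same time window, so events are not counted twice.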