Best practices for testing supply planning integrations in hybrid cloud deployments

I’m looking for insights on testing strategies for supply planning integrations in hybrid environments. We’re running Oracle Supply Planning Cloud 23c integrated with on-premise ERP for supply orders and inventory data.

Our current testing approach is fairly basic - we validate data mappings in our test environment and run a few manual scenarios before promoting to production. But we’ve had several production incidents where integration failures caused planning runs to use incomplete data, resulting in incorrect supply recommendations.

I’m particularly interested in how others handle data consistency validation across cloud and on-premise systems, and how to test error scenarios and rollback procedures without disrupting production. What testing frameworks or methodologies have worked well for supply planning integrations? Are there specific tools or Oracle features that help with integration testing in hybrid architectures?

We use Oracle’s Integration Cloud monitoring APIs to build automated integration tests. Our framework runs hourly in our test environment, validates data flows, and alerts if any integration endpoint fails. For supply planning specifically, we compare record counts and key metrics between source and target systems to catch silent data loss. We also maintain a test data set with known edge cases - null values, duplicates, invalid references - to ensure error handling works correctly.
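The source-vs-target reconciliation step can be sketched roughly as below. This is a minimal illustration, not Oracle API code: it assumes you have already extracted per-entity counts and metrics from both sides (the `reconcile` function, entity names, and tolerance are all hypothetical).

```python
# Hedged sketch: reconcile record counts and key metrics between a source
# (on-premise ERP extract) and a target (Supply Planning Cloud) system.
# All names and the tolerance value are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Discrepancy:
    entity: str
    metric: str
    source: float
    target: float


def reconcile(source_metrics: dict, target_metrics: dict,
              tolerance: float = 0.0) -> list:
    """Compare per-entity metrics; report any pair differing by more than
    `tolerance` (relative), or any metric missing on the target side."""
    issues = []
    for entity, s_metrics in source_metrics.items():
        t_metrics = target_metrics.get(entity, {})
        for metric, s_val in s_metrics.items():
            t_val = t_metrics.get(metric)
            if t_val is None:
                # Metric never arrived on the target side at all
                issues.append(Discrepancy(entity, metric, s_val, float("nan")))
                continue
            denom = max(abs(s_val), 1e-9)  # guard against divide-by-zero
            if abs(s_val - t_val) / denom > tolerance:
                issues.append(Discrepancy(entity, metric, s_val, t_val))
    return issues


# Example: a few supply-order records lost in transit
source = {"supply_orders": {"row_count": 1200, "total_qty": 45000}}
target = {"supply_orders": {"row_count": 1197, "total_qty": 44880}}
for d in reconcile(source, target):
    print(f"{d.entity}.{d.metric}: source={d.source} target={d.target}")
```

Comparing aggregate metrics (total quantities) alongside raw row counts catches cases where the right number of rows arrived but values were truncated or mis-mapped.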

From a planning perspective, the most critical test is data completeness. We’ve had cases where the integration ran successfully but filtered out certain records due to mapping issues. The planning run completed, but with 15% less data than expected, causing serious supply shortages. Now we validate not just that the integration succeeds, but that record counts and data volumes fall within expected ranges. We also run planning against the integrated data to verify that the planning algorithms actually work correctly with the delivered data format.
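The completeness gate described above can be sketched as a simple check of the current load against a baseline of recent successful loads. The 15% figure comes from the incident above; the rolling-average baseline and the 10% threshold here are illustrative assumptions, not an Oracle feature.

```python
# Hedged sketch: gate a planning run on data completeness by comparing the
# incoming record count against the average of recent successful loads.
# Baseline mechanics and threshold are illustrative assumptions.

from statistics import mean


def completeness_ok(current_count: int, recent_counts: list,
                    max_drop: float = 0.10) -> bool:
    """Return False if the current volume dropped more than `max_drop`
    relative to the average of recent successful loads."""
    if not recent_counts:
        return True  # no baseline yet; let the first load through
    baseline = mean(recent_counts)
    return current_count >= baseline * (1 - max_drop)


history = [10000, 10200, 9900]          # recent successful load volumes
print(completeness_ok(8500, history))   # ~15% drop, like the incident -> blocked
print(completeness_ok(9950, history))   # within range -> allowed
```

The point of gating on a range rather than on success/failure status is exactly the failure mode described: the integration reports success while silently dropping records, so only a volume-based check catches it before the planning run consumes the data.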

The chaos engineering approach is interesting but seems risky even in a test environment. How do you prevent those injected failures from cascading to other systems? Our test and production environments share some infrastructure, so I’m concerned about accidentally impacting production during testing. Also, the data completeness validation point is excellent - we definitely need better metrics around expected vs. actual data volumes rather than relying on success/failure status alone.