Best load testing approach for release planning module in Rally

We’re setting up comprehensive load testing for our Rally 2024 release planning module and want to share our approach while getting community feedback. Our challenge is creating realistic test scenarios that match actual user behavior during PI planning sessions.

We’re using JMeter combined with Rally’s WSAPI but are struggling with the gap between synthetic test data and production usage patterns. Here’s the test scenario structure we’re currently working with:

// Sample test scenario structure
Thread Group: Release Planning Users (n=75)
  - Login and navigate to release view
  - Load feature hierarchy with dependencies
  - Update feature estimates and assignments
  - Generate dependency reports
  Think time: 8-15 seconds between actions
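For what it’s worth, the same flow can be sketched outside JMeter as a plain Python driver with randomized think times. The action names and the driver itself are illustrative placeholders for the real WSAPI calls, not a tested Rally integration:

```python
import random
import time

# Hypothetical action sequence mirroring the thread-group steps above.
ACTIONS = [
    "login_and_open_release_view",
    "load_feature_hierarchy",
    "update_feature_estimates",
    "generate_dependency_report",
]

def think_time(low=8.0, high=15.0):
    """Uniform think time in seconds, matching the 8-15 s range above."""
    return random.uniform(low, high)

def run_virtual_user(user_id, sleep=time.sleep):
    """Execute one pass of the scenario for a single virtual user."""
    executed = []
    for action in ACTIONS:
        executed.append(action)  # placeholder for the real WSAPI request
        sleep(think_time())      # pause like a human would between actions
    return executed
```

The `sleep` parameter is injectable so the driver can be unit tested without actually waiting; uniform think time is the simplest starting model and can later be swapped for a distribution fitted from access logs.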

Our main questions:
- How do you capture real user interaction patterns?
- What’s the best way to model think times accurately?
- How do you handle complex dependency graphing under load?
- What monitoring approach works best? We’re considering Elasticsearch integration for real-time metrics.

Would love to hear how others approach performance testing for release planning workflows, especially during peak PI planning periods when 100+ users hit the system simultaneously.

Your test scenario structure looks solid. One suggestion - add scenarios for concurrent edits to the same release or features. We discovered race conditions during PI planning when multiple product owners updated the same items simultaneously. This wasn’t caught in our initial load tests because we used isolated test data per virtual user.
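The lost-update failure mode described above can be reproduced in miniature without Rally at all. This toy sketch (all names are illustrative) has two writers doing an unsynchronized read-modify-write on the same shared “feature estimate”, which is exactly the pattern that bites when two product owners edit the same item:

```python
import threading

class FeatureRecord:
    """Toy stand-in for a shared feature whose estimate two users edit."""
    def __init__(self):
        self.estimate = 0

def add_points(record, points, iterations=100_000):
    # Unsynchronized read-modify-write: the classic lost-update pattern.
    for _ in range(iterations):
        current = record.estimate
        record.estimate = current + points

record = FeatureRecord()
writers = [threading.Thread(target=add_points, args=(record, 1))
           for _ in range(2)]
for t in writers:
    t.start()
for t in writers:
    t.join()

# Two writers of 100,000 increments each should total 200,000; any
# shortfall means an update was lost to the race.
print(record.estimate)
```

The load-test analogue is pointing multiple virtual users at the same object IDs instead of isolated per-user data, so this interleaving can actually occur server-side.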

We track WSAPI response times, database query durations, and application server thread pool utilization in Elasticsearch. The key is correlating these backend metrics with user-facing response times from JMeter. This helps identify whether slowdowns come from the API layer, the database, or client-side rendering. For think time modeling, analyzing server access logs gives you actual pause patterns between user actions.
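On the access-log idea, here is a minimal sketch of deriving per-session gaps between consecutive requests, which you can then bucket or fit a distribution to for think-time models. The log format and field positions are assumptions; adapt the parsing to your real logs:

```python
from collections import defaultdict
from datetime import datetime

def think_times(log_lines):
    """Compute gaps (seconds) between consecutive requests per session.

    Each line is assumed to be 'session_id timestamp path' with an
    ISO-8601 timestamp -- a stand-in for your actual access-log format.
    """
    by_session = defaultdict(list)
    for line in log_lines:
        session, ts, _path = line.split(maxsplit=2)
        by_session[session].append(datetime.fromisoformat(ts))

    gaps = {}
    for session, stamps in by_session.items():
        stamps.sort()
        gaps[session] = [
            (b - a).total_seconds() for a, b in zip(stamps, stamps[1:])
        ]
    return gaps

sample = [
    "s1 2024-03-01T10:00:00 /release/view",
    "s1 2024-03-01T10:00:12 /feature/update",
    "s1 2024-03-01T10:00:21 /report/deps",
]
print(think_times(sample))  # {'s1': [12.0, 9.0]}
```

Feeding the resulting gap distribution into a JMeter random timer (instead of a fixed 8-15 s uniform range) gets the synthetic pacing much closer to production behavior.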

Good point on dependency depths. We’re currently testing with full hierarchies, which might not reflect typical usage. How do you balance testing worst-case scenarios versus average user patterns? Also, what specific Elasticsearch metrics do you track during load tests?
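One pragmatic answer to the worst-case vs. average question is a weighted scenario mix rather than either extreme: most virtual users run typical shallow hierarchies while a small fraction exercises the full worst case. A sketch, where the weights are made-up illustrative numbers, not measured ones:

```python
import random

# Hypothetical mix: most users see shallow hierarchies, a minority
# exercises the deep worst case seen during PI planning.
SCENARIO_MIX = [
    ("shallow_hierarchy", 0.70),
    ("medium_hierarchy", 0.25),
    ("full_hierarchy_worst_case", 0.05),
]

def pick_scenario(rng=random):
    """Sample one scenario for a virtual user according to the mix."""
    names, weights = zip(*SCENARIO_MIX)
    return rng.choices(names, weights=weights, k=1)[0]

# Sanity check: 10,000 draws should roughly match the 70/25/5 split.
counts = {name: 0 for name, _ in SCENARIO_MIX}
for _ in range(10_000):
    counts[pick_scenario()] += 1
print(counts)
```

In JMeter terms this maps to multiple thread groups (or a throughput controller) sized to the same proportions, so worst-case load is always present but doesn’t dominate the run.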