This was first published on the blog at my other business: Objectivity.
A while ago, we needed to run a significant number of performance tests for a BI application written on the Microsoft stack. The non-functional requirements had been somewhat neglected during the early stages of the project, and a change in the client's business sharpened our focus: the number of concurrent users was to increase from c. 100 to c. 1200, and some new key processes were coming into scope. Whilst the functionality remained the same, performance needed to be very quick, as a dedicated desktop application was being decommissioned and its results would be provided by the BI system instead…
We needed to be certain that some key reports would give sub-second responses while 1200 users were using the system in the normal way.
So what did we do?
First, we gathered log stats on the dying and existing systems so that we knew the usage patterns. Then we ran workshops with users to understand how the usage was going to increase after the business change. Finally, we created automated scenarios using JMeter and designed the Perfmon metric collection so we could dissect the situation from both a user and a hardware / software stack point of view.
Previous experience had shown us that the volume of data thrown out by these large-scale runs is huge, and if you can't see the JMeter and Perfmon stats side by side on the same timeline, then analysis, diagnosis and action planning are frustratingly difficult and slow.
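To make the idea of a simultaneous view concrete, here is a minimal Python sketch, not the framework we actually built. It assumes the JMeter results and the Perfmon counter readings have already been parsed into simple (timestamp, value) pairs; the function names `bucket_mean` and `align` and the 10-second window are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean


def bucket_mean(samples, window_s):
    """Average (epoch_seconds, value) samples into fixed-width time windows."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Round each timestamp down to the start of its window.
        buckets[int(ts // window_s) * window_s].append(value)
    return {start: mean(values) for start, values in buckets.items()}


def align(jmeter_samples, perfmon_samples, window_s=10):
    """Join both sources on window start time.

    jmeter_samples:  (epoch_seconds, elapsed_ms) per request (assumed shape)
    perfmon_samples: (epoch_seconds, counter_value) per reading (assumed shape)
    Returns rows of (window_start, mean_elapsed_ms, mean_counter). A missing
    side is None, so gaps in either source stay visible.
    """
    resp = bucket_mean(jmeter_samples, window_s)
    ctr = bucket_mean(perfmon_samples, window_s)
    return [(t, resp.get(t), ctr.get(t)) for t in sorted(set(resp) | set(ctr))]
```

Each row then carries the user-side and the server-side measurement for the same slice of time, which is the form a chart or spreadsheet needs to plot them together.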
The first runs got to about 500 concurrent users before the system became unstable, and by closely inspecting the stats at the point of inflection we created our first list of action items. Then followed a series of iterations in which we tuned the software, tweaked the configuration and added more hardware until the system could meet all the non-functional requirements with 1200 concurrent users while the key reports still ran with sub-second responses.
Of course it was a bit fraught along the way; my description above doesn't really convey the lows and highs of the team when planned improvements didn't yield the success we'd hoped for or small tweaks had a massive effect.
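One simple way to locate a point of inflection in a stepped load test is to look for the first load level where response time jumps sharply relative to the previous level. The sketch below illustrates that idea only; it is an assumption about how such a check could be written, not a record of how we did it, and the function name `find_inflection` and the `max_ratio` threshold are hypothetical.

```python
def find_inflection(steps, max_ratio=2.0):
    """Return the user count at which response time first jumps sharply.

    steps: list of (concurrent_users, response_ms), ordered by rising load.
    max_ratio: hypothetical threshold; a step counts as the inflection when
    its response time is more than max_ratio times the previous step's.
    Returns None if no step crosses the threshold.
    """
    for (_, prev_ms), (users, cur_ms) in zip(steps, steps[1:]):
        if prev_ms > 0 and cur_ms / prev_ms > max_ratio:
            return users
    return None


# Made-up numbers for illustration only, not measurements from the project.
example_steps = [(100, 300), (200, 320), (300, 350), (400, 420), (500, 1900)]
inflection_users = find_inflection(example_steps)  # 500 for this made-up data
```

A check like this only narrows down where to look; the diagnosis still comes from reading the server-side counters around that load level.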
I was reminded once more of the difficulty of predicting the behaviour of complex systems and was grateful for lessons I learned 25 years ago from The Goal (Goldratt and Cox) about the nature of systems that are constrained.
Even though we'd run large-scale performance testing before, one of the things that still managed to surprise us was the amount of data thrown out by the measurement framework. The length of time needed to do the analysis / diagnosis was a huge problem for everyone.
In fact, we then decided to look at tooling to see how we could speed up the process, and evaluated a number of off-the-shelf tools. In short, we found them to be either too simplistic or too expensive with a steep learning curve. So we created a lightweight framework with all data stored in a repository and the visualisation achieved through a customised Excel UI. But that's another story!
If you'd like to know more about the journey we took to make this large BI system perform with a huge increase in the number of users and much tighter response-time targets, please give us a call.