To begin with, let us emphasize some aspects of setting up the performance test environment. It is important to understand how the application is intended to be set up in terms of technical architecture and deployment architecture, and from where and how end users will access it. This understanding helps in setting up the load simulation infrastructure in the cloud: whether ports need to be opened, and from which regions the load needs to be simulated.
Once the test setup is complete from a simulation perspective, perform sanity runs to ensure that load is being injected, that monitoring systems can capture the server-side metrics, and that logs are accessible. It is important to ensure that the right logging modes are set on the application side (info mode during performance testing and debug mode during troubleshooting), as debug mode tends to generate more logs, which adds overhead and affects performance. The right kind of monitoring solution, intrusive or non-intrusive, should also be chosen depending on the project needs, after assessing the pros and cons of each.
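As a rough illustration of switching logging modes per run, here is a minimal Python sketch. It assumes the application uses Python's standard `logging` module; the run-mode names and the `configure_logging` helper are hypothetical, and a real application would use whatever its own logging framework provides.

```python
import logging

# Hypothetical run modes mapped to log levels: INFO keeps logging overhead
# low during performance tests, DEBUG is reserved for troubleshooting runs.
LOG_LEVEL_BY_MODE = {
    "performance-test": logging.INFO,
    "troubleshooting": logging.DEBUG,
}


def configure_logging(run_mode: str) -> int:
    """Set the root logger level for the given run mode and return it."""
    if run_mode not in LOG_LEVEL_BY_MODE:
        raise ValueError(f"unknown run mode: {run_mode!r}")
    level = LOG_LEVEL_BY_MODE[run_mode]
    logging.getLogger().setLevel(level)
    return level


if __name__ == "__main__":
    configure_logging("performance-test")
    logging.getLogger().debug("suppressed at INFO level")
```

Making the mode an explicit input makes it harder to leave debug logging on by accident before a measured run.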
Once all parts of the load simulation and monitoring solutions are working and the sanity run confirms this, run small-scale load tests to ensure that the tests run for the desired duration, that they simulate the desired behaviour, and that clear indicators and inferences can be drawn from the peak user load tests based on the workload models defined for the tests. This also requires ensuring that adequate data is available for the load tests. Accuracy of data and correlation of parameters between the different requests fired are also important. For instance, in an order management application, if 1000 orders are fired and only 400 are created successfully, that would not create enough load for the downstream processing.
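The order example above can be turned into a simple check after a small-scale run. The sketch below is illustrative only: the function names and the 95% minimum success rate are assumptions, not values from any particular tool or project.

```python
def creation_success_rate(fired: int, created: int) -> float:
    """Fraction of fired requests that actually created an order."""
    if fired <= 0:
        raise ValueError("fired must be positive")
    return created / fired


def enough_downstream_load(fired: int, created: int, min_rate: float = 0.95) -> bool:
    """True if enough orders were created to load downstream processing.

    min_rate is an assumed threshold; pick one that suits the project.
    """
    return creation_success_rate(fired, created) >= min_rate


if __name__ == "__main__":
    # 1000 orders fired, only 400 created: a 40% success rate.
    print(creation_success_rate(1000, 400))   # 0.4
    print(enough_downstream_load(1000, 400))  # False
```

A low rate here usually points to test data or parameter correlation problems rather than to the application's performance.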
Based on the workload model defined (transactions, wait times, anticipated response times), arrive at the overall transaction/request count for the given number of users. Once the test is done, validate the test results, such as the number of transactions completed during the test duration, in addition to client-side metrics like response time, 90th percentile response time, error percentage, etc. This gives a fair idea of whether you are able to achieve the desired transaction count and throughput based on the workload model defined.
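The arithmetic behind this can be sketched as follows. It assumes a simple closed workload model in which each virtual user repeats one transaction followed by a wait time, so one iteration takes roughly response time plus wait time. The function names are hypothetical, the sample figures are made up for illustration, and the percentile uses the nearest-rank method, which may differ slightly from what a given load testing tool reports.

```python
import math


def expected_transactions(users: int, duration_s: float,
                          response_time_s: float, wait_time_s: float) -> float:
    """Expected transaction count for a closed workload model.

    Each user completes one transaction every (response + wait) seconds.
    """
    cycle_s = response_time_s + wait_time_s
    if cycle_s <= 0:
        raise ValueError("response time plus wait time must be positive")
    return users * duration_s / cycle_s


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile, e.g. pct=90 for the 90th percentile."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def error_percent(total: int, failed: int) -> float:
    """Share of failed requests, as a percentage."""
    return 100.0 * failed / total


if __name__ == "__main__":
    # 1000 users for one hour, 2 s response time, 8 s wait time.
    target = expected_transactions(1000, 3600, 2.0, 8.0)
    print(target)          # 360000.0 transactions
    print(target / 3600)   # 100.0 transactions per second
```

Comparing the actual completed transaction count against this target shows whether the test really generated the load that the workload model calls for.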
The client-side metrics tend to show performance from the end-user perspective, but they reflect the client-side experience alone. You need to look beyond them to understand where performance issues arise, focusing on server-side metrics such as server (CPU) utilization and memory utilization across the web, application and database servers.
A key trait to develop in performance test execution and monitoring is to run a test, observe the client-side and server-side metrics, and dig into logs/traces down to the method-call level with solutions like AppDynamics/New Relic whenever response times are high or resource utilization reaches its thresholds. Observability is an often-overlooked skill that adds real value in spotting problems and getting to their root. It is important to repeat these tests until root causes are identified and analysed. Often, it is good to start from a low base: for instance, if the objective is to carry out load tests for 1000 concurrent users, it is better to start with 250 users and increase the load in increments until the target level is reached. Collaborating with developers and infrastructure support teams, and the ability to engage with architects, are skills that performance test teams need to hone.
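The step-up approach can be expressed as a small schedule generator. This is only a sketch: `ramp_steps` is a hypothetical helper, and the fixed increment of 250 users is an assumption chosen to match the example above.

```python
def ramp_steps(target_users: int, start_users: int, increment: int) -> list[int]:
    """User counts for successive test runs, ending exactly at the target."""
    if start_users <= 0 or increment <= 0:
        raise ValueError("start_users and increment must be positive")
    if start_users > target_users:
        raise ValueError("start_users must not exceed target_users")
    steps = []
    users = start_users
    while users < target_users:
        steps.append(users)
        users += increment
    steps.append(target_users)  # always finish at the target level
    return steps


if __name__ == "__main__":
    print(ramp_steps(1000, 250, 250))  # [250, 500, 750, 1000]
```

Running and analysing each step before moving to the next makes it easier to tie a problem to the load level at which it first appears.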
Once the peak user load tests are done and the results are in line with the expected SLAs, the focus can shift to running endurance tests, which help unearth memory leaks during long-running tests. Memory leaks can lead to undesirable user experiences such as performance degradation and outages. Typically, endurance tests use a similar workload but run for a prolonged duration. Memory leaks may not always be avoidable, especially in browser-based applications, as browsers are often inefficient at releasing memory; in such cases, restarting the browser once every 2-3 days may be a workaround (undesirable, but unavoidable at times).
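One simple way to look for a leak in endurance test data is to fit a straight line to memory samples taken under steady load and check its slope. The sketch below assumes samples of (elapsed hours, used memory in MB) exported from the monitoring solution. The function name and the sample values are hypothetical, and a steady upward slope is only a hint to investigate further, not proof of a leak.

```python
def memory_growth_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of used memory (MB) over elapsed time (hours)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    covariance = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must span more than one point in time")
    return covariance / variance


if __name__ == "__main__":
    # Made-up samples: memory climbs by 10 MB every hour under steady load.
    samples = [(0, 500), (1, 510), (2, 520), (3, 530)]
    print(memory_growth_per_hour(samples))  # 10.0
```

Under a constant workload, memory that returns to a stable baseline after garbage collection should give a slope near zero over a long enough run.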
A performance test should focus not only on performance under current or anticipated usage, but also on identifying the point of failure or the point at which performance begins to degrade. Stress tests are a good means of identifying these breaking points. Gauging performance test results can really help with product go-live (go/no-go) decisions.
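Given results from a series of stress test steps, the degradation point can be located with a simple threshold scan. This is a minimal sketch: the function name, the result values and the thresholds (3 seconds response time, 1% errors) are all illustrative assumptions and should come from the project's own SLAs.

```python
from typing import Optional


def first_degradation_point(
    results: list[tuple[int, float, float]],
    max_response_s: float,
    max_error_pct: float,
) -> Optional[int]:
    """Lowest user load at which a threshold is breached, or None.

    Each result is (users, response_time_s, error_pct) for one test step.
    """
    for users, response_s, error_pct in sorted(results):
        if response_s > max_response_s or error_pct > max_error_pct:
            return users
    return None


if __name__ == "__main__":
    # Made-up step results: (users, response time in s, error %).
    steps = [(250, 1.1, 0.0), (500, 1.3, 0.1), (1000, 1.9, 0.4), (1500, 4.2, 3.0)]
    print(first_degradation_point(steps, max_response_s=3.0, max_error_pct=1.0))  # 1500
```

The load level just below the returned value gives a rough estimate of the capacity the current setup can sustain.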
To summarize, in today’s modern cloud-based architectures, application performance is often overlooked because of the seemingly infinite processing capacity one can bring in, in the name of the elastic cloud. This comes at an additional cost, and running production on an inefficient, high-cost infrastructure could decide between the success and failure of the business. It is very important that load, stress and endurance tests (sometimes even volume tests, based on the business requirement) are carried out before a large-scale cloud implementation.