How much traffic can your site handle?
Do you know?
Whenever you are responsible for a website, one question keeps coming back: how much can the site handle? So what you usually do is fire up a load testing tool and keep hitting the site until it stops working. Then you go back to the boss and say, "Our site can handle this many page requests."
Now you know the truth, right?
The problem with this approach is that it brings down the site. What's more, the number is only meaningful in one specific scenario. It is good for determining how well the site deals with the "Slashdot effect", where one of your URLs is shared widely. In reality, though, the visiting pattern is more varied, and you may want to build a load test based on the actual access pattern.
What is a good visiting pattern?
The first thing to be aware of is to focus on a flow instead of individual URLs. A user lands on the homepage and browses around a category page before reaching the article of interest. Now imagine hundreds of people browsing your site, each going through their own flow: their URL requests are all interleaved.
How could you get a sense of the browsing pattern? Google Analytics can give you some very good insight if you have the tracking code in place. If you don't, you can visit your good old friend, the web server log. But mind you, up front: you will need a bit of patience to get through it. With a session id, or any unique identifier that lets you track an individual, you can parse your log files by the principle of
URI(s) group by session_id order by time ascending
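As a rough illustration, here is a minimal Python sketch of that principle. It assumes a simplified log format in which each line starts with a timestamp, a session id, and a URI; a real access log needs a proper parser and a configured session cookie.

```python
from collections import defaultdict

def sessions_from_log(path):
    """Group URIs by session id, ordered by time ascending."""
    hits = []
    with open(path) as log:
        for line in log:
            # Assumed simplified format: "<timestamp> <session_id> <uri>"
            parts = line.split()
            if len(parts) < 3:
                continue  # skip malformed lines
            timestamp, session_id, uri = parts[:3]
            hits.append((timestamp, session_id, uri))

    flows = defaultdict(list)
    # Sorting by timestamp (assumed ISO 8601, so lexicographic order works)
    # puts each session's URIs in visit order.
    for timestamp, session_id, uri in sorted(hits):
        flows[session_id].append(uri)
    return flows
```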
From there, you will see a multitude of aggregated URL visit patterns, and more likely than not you will not be able to spot anything meaningful at all. The number of combinations is simply too large to be useful, so go back to the log and do some "ETL": tag URLs of the same nature under an alias and repeat the analysis until you can see some useful pattern. Splunk lets you tag patterns and perform this kind of analysis easily.
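Building on the sketch above, the "ETL" step can be as simple as a list of regular expressions that collapse concrete URLs into a handful of page-type aliases; the patterns and names below are hypothetical and need to be adapted to your own site.

```python
import re
from collections import Counter

# Hypothetical URL patterns; adapt them to your own site structure.
ALIASES = [
    (re.compile(r"^/$"), "HOME"),
    (re.compile(r"^/category/"), "CATEGORY"),
    (re.compile(r"^/product/\d+"), "PRODUCT"),
]

def alias(uri):
    """Collapse a concrete URI into a coarse page-type alias."""
    for pattern, name in ALIASES:
        if pattern.search(uri):
            return name
    return "OTHER"

def top_flows(flows, n=10):
    """Count the most common aliased flows across all sessions."""
    counts = Counter(tuple(alias(u) for u in uris) for uris in flows.values())
    return counts.most_common(n)
```

Running `top_flows(sessions_from_log("access.log"))` then surfaces flows like ("HOME", "CATEGORY", "PRODUCT") together with how often they occur.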
One approach that I was tempted to try but never implemented is to replay the Apache access log via JMeter. The upside is that it resembles the production traffic faithfully; the flip side is that it reflects the pattern of that particular day only.
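I never built this, but one plausible route is to boil the access log down to a CSV file and let a JMeter CSV Data Set Config iterate over it; a rough sketch of the conversion, assuming the Common Log Format:

```python
import csv

def log_to_csv(log_path, csv_path):
    """Extract request lines from a common-format Apache log into a CSV
    that a JMeter CSV Data Set Config can iterate over."""
    with open(log_path) as log, open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["method", "path"])
        for line in log:
            # Common Log Format: ... "GET /some/path HTTP/1.1" ...
            try:
                method, path, _protocol = line.split('"')[1].split()
            except (IndexError, ValueError):
                continue  # skip malformed lines
            writer.writerow([method, path])
```

An HTTP Request sampler can then consume the `path` column as `${path}`.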
Load Time? Almost!
After identifying a pattern, or a distribution of site traffic, you may want to obtain one more piece of information: the user agent population.
My company's web site has two versions, desktop and responsive, whose page structure is very different in terms of both presentation and code. We determine which version to serve by the client's user agent. I use JMeter for load testing; it is easy to set up and plenty of resources are available to help you. For my setup, I created a total of three thread pools to mimic the traffic: one mimics desktop page requests, another mimics responsive site page requests, and I will talk about the last one later in the article.
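To size the two thread pools, the user agent population can be estimated from the same access log. Here is a crude sketch; the keyword list is illustrative only, and real detection should rely on a proper device database.

```python
# Keywords that hint at a mobile client; illustrative, not exhaustive.
MOBILE_HINTS = ("Mobile", "Android", "iPhone", "iPad")

def agent_split(log_path):
    """Tally desktop vs. mobile hits in a combined-format access log."""
    desktop = mobile = 0
    with open(log_path) as log:
        for line in log:
            parts = line.rsplit('"', 2)
            if len(parts) < 3:
                continue  # not a combined-format line
            user_agent = parts[-2]  # the last quoted field is the UA
            if any(hint in user_agent for hint in MOBILE_HINTS):
                mobile += 1
            else:
                desktop += 1
    return desktop, mobile
```

The resulting ratio drives the thread counts of the desktop and responsive thread groups.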
The test setup is indeed quite straightforward. But one technique I would like to share is the random effect. One of the common visiting patterns on my company's web site goes like this:
- Go to the "New Arrival" page
- Scroll and click on a product, which brings the user to the product details page
Thus, the first JMeter script I wrote mimics this behavior. The tricky part of the script is picking an item at random programmatically. Our product ID is an attribute value on an HTML element, and the New Arrival page contains many such elements, so the JMeter script randomly picks one of them and parses the attribute value out of it.
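In plain Python terms, the extraction logic looks roughly like this; the `data-product-id` attribute name is made up for illustration. Inside JMeter itself, an extractor Post-Processor with its Match No. set to 0 gives the same random-pick behavior.

```python
import random
import re

def pick_random_product_id(html):
    """Collect every product id on the page and pick one at random.
    The 'data-product-id' attribute name is hypothetical."""
    ids = re.findall(r'data-product-id="([^"]+)"', html)
    return random.choice(ids) if ids else None
```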
With the JMeter test plan ready, I can now determine how much traffic my site can handle. Start off with a small number of concurrent requests and ramp up gradually. This allows us to observe any abnormality, and one factor I look at is the GC behavior. Under load, is the JVM performing GC more frequently? Is the heap memory filling up? This data is much easier to obtain with a monitoring agent like New Relic. If all the conditions look healthy, I start pumping up the number of requests and conduct the real load test.
How do I know what the user experience is like at load time?
The JMeter test results, together with the server metrics (e.g. the GC behavior discussed above), provide a view of how the server performs. But we still do not know what the end user experience is like. So let's bring in the third thread pool I defined in the JMeter script!
The role of this pool is to invoke a browser and obtain a waterfall diagram by visiting the pages of interest. Since I run a private WebPageTest instance, I can request a test of any page of interest through the WPT API. Besides the page itself, we can gather performance information for different user agents, too. With a waterfall diagram, it is much easier to perceive how the components work together.
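A minimal sketch of driving the WPT API, assuming a hypothetical private instance; runtest.php and jsonResult.php are the standard WebPageTest endpoints, but the host, location name, and API key below are placeholders.

```python
import requests

WPT = "http://wpt.example.com"  # hypothetical private instance
API_KEY = "your-api-key"        # only needed if the instance requires one

def submit_test(url, location="Test:Chrome"):
    """Kick off a WebPageTest run and return its test id."""
    resp = requests.get(f"{WPT}/runtest.php",
                        params={"url": url, "location": location,
                                "f": "json", "k": API_KEY})
    resp.raise_for_status()
    return resp.json()["data"]["testId"]

def fetch_result(test_id):
    """Poll for the result; returns the metrics once the run completes."""
    resp = requests.get(f"{WPT}/jsonResult.php", params={"test": test_id})
    body = resp.json()
    return body["data"] if body.get("statusCode") == 200 else None
```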
Analysis
During the JMeter load test, we collected information from various systems. Below is the list of metrics:

- JMeter
  - Successful request count
  - Error request count
  - Average response time
- New Relic
  - JVM performance Apdex
  - CPU usage
- Database
  - AWR report
- WebPageTest
  - Average page load time
Instead of looking at these metrics individually, we developed a portal page that renders all the results in a single location. We are able to look at historical data, too.
Final Words
We began the project solely to implement JMeter scripts for conducting load tests. But it slowly evolved into an application that not only fires JMeter scripts but also looks at the website from both the server and the client side perspective. We also noticed that the same idea has been developed into a commercial offering by BlazeMeter; you may want to check it out if you don't want to build your own infrastructure.
This is it! Off you go to construct your tests and prepare for the holiday season!