๐—ง๐—ต๐—ฒ ๐— ๐—ฒ๐˜๐—ฟ๐—ถ๐—ฐ ๐˜๐—ต๐—ฎ๐˜ ๐— ๐—ฎ๐˜๐˜๐—ฒ๐—ฟ๐˜€: Request Queue Time

Let's start by defining request queue time. It's basically how long a request has to "wait" before actual processing begins. This waiting usually happens at your application server, after the request has already passed through your reverse proxies, load balancers and whatnot.

To understand why this happens, you can read this.

๐—ช๐—ต๐˜† ๐—ถ๐˜€ ๐—ถ๐˜ ๐—ถ๐—บ๐—ฝ๐—ผ๐—ฟ๐˜๐—ฎ๐—ป๐˜?

Your p99 response time can be 200ms (yay), but if your queue time is 2s, the user will still complain: they receive the response only after 2 + 0.2 = 2.2s. Quite simply, queue time shows the user-facing impact of your server capacity.

Now the question is, how do you measure it?

Most tools that measure request queue time use the X-Request-Start header added by the load balancer: it captures the time from the moment the request crossed your LB to the moment it was accepted for processing. Heroku adds this header automatically; AWS ELB, on the other hand, does not. So if you are using ELB, you will need to add it yourself in your Nginx or Ingress (or equivalent web server/reverse proxy).

For Nginx it looks something like:

๐š™๐š›๐š˜๐šก๐šข_๐šœ๐šŽ๐š_๐š‘๐šŽ๐šŠ๐š๐šŽ๐š›โ€‚๐šก-๐š›๐šŽ๐šš๐šž๐šŽ๐šœ๐š-๐šœ๐š๐šŠ๐š›๐šโ€‚"๐š=${๐š–๐šœ๐šŽ๐šŒ}";

After this, most APMs will automatically start tracking your queue time.
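Under the hood, the agent's logic is roughly "queue time = time the app first saw the request, minus the timestamp in the header". Here is a minimal sketch of that idea as a Python WSGI middleware, assuming the t=${msec} format from the Nginx line above; the environ key it writes to (request.queue_time) is a placeholder for whatever your metrics pipeline expects, not a real APM API:

import time

def queue_time_middleware(app):
    # Minimal sketch: derive request queue time from the X-Request-Start
    # header set by the proxy/LB. Real APM agents handle more edge cases.
    def wrapped(environ, start_response):
        raw = environ.get("HTTP_X_REQUEST_START", "")  # e.g. "t=1700000000.123"
        if raw.startswith("t="):
            try:
                start = float(raw[2:])
                # Some proxies send milliseconds or microseconds since the epoch;
                # normalize everything down to seconds.
                while start > 1e11:
                    start /= 1000.0
                # Clamp at zero in case of clock skew between the LB and the app host.
                environ["request.queue_time"] = max(time.time() - start, 0.0)
            except ValueError:
                pass
        return app(environ, start_response)
    return wrapped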

If your queue times are high, it's usually due to under-provisioned infrastructure or too few workers/threads. If it is low (< 20ms), you might be able to scale down a bit and save some $$.

I have shared how and where you can check it in New Relic below:
