Photo by Luke Chesser on Unsplash
𝗧𝗵𝗲 𝗠𝗲𝘁𝗿𝗶𝗰 𝗧𝗵𝗮𝘁 𝗠𝗮𝘁𝘁𝗲𝗿𝘀: Request Queue Time
Let's start by defining request queue time. It's basically how long a request has to "wait" before actual processing begins. This waiting usually happens at your application server, after the request has already made it past the reverse proxies, load balancers, and whatnot.
To understand why this happens, you can read this
𝗪𝗵𝘆 𝗶𝘀 𝗶𝘁 𝗶𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝘁?
Your p99 response time can be 200ms (yay), but if your queue time is 2s, the user will still complain: they only receive the response after 2 + 0.2 = 2.2s. Quite simply, queue time shows the user-facing impact of your server capacity.
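To make that arithmetic concrete, here is a tiny, purely illustrative Python snippet using the numbers from the example above:

queue_time_s = 2.0        # time the request waited for a free worker
response_time_s = 0.2     # p99 processing time once a worker picked it up
user_perceived_s = queue_time_s + response_time_s
print(user_perceived_s)   # 2.2 -- this is what the user actually waits for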
Now the question is, 𝗵𝗼𝘄 𝗱𝗼 𝘆𝗼𝘂 𝗺𝗲𝗮𝘀𝘂𝗿𝗲 𝗶𝘁?
Most tools that measure request queue time rely on an X-Request-Start header added by the load balancer: you take the time the request crossed your LB and compare it with the time it actually got accepted for processing. Heroku adds this header automatically; AWS ELB, on the other hand, does not. So, if you are using ELB, you will need to add it in your Nginx or Ingress (or equivalent web server/reverse proxy).
For Nginx it looks something like this:
proxy_set_header X-Request-Start "t=${msec}";
After this, most APMs will automatically start tracking your queue time.
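If you're curious what the APM is doing under the hood, here is a minimal sketch of the idea, written as a hypothetical Python WSGI middleware (this is not any particular APM's actual code). It assumes the header was set with the Nginx directive above, so its value looks like "t=1700000000.123" (seconds from Nginx's ${msec} variable); report_queue_time_ms is a made-up placeholder for whatever metrics pipeline you use.

import time

def report_queue_time_ms(ms):
    # Placeholder: ship this to your metrics backend (StatsD, Prometheus, logs, ...).
    print(f"request_queue_time_ms={ms:.1f}")

class QueueTimeMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes the header as HTTP_X_REQUEST_START, e.g. "t=1700000000.123".
        raw = environ.get("HTTP_X_REQUEST_START", "")
        if raw.startswith("t="):
            try:
                lb_received_at = float(raw[2:])              # when the LB/proxy saw the request
                queue_time_s = time.time() - lb_received_at  # how long it waited for a worker
                report_queue_time_ms(max(queue_time_s, 0.0) * 1000)  # clamp against clock skew
            except ValueError:
                pass  # malformed header, skip the measurement
        return self.app(environ, start_response)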
If your queue times are high, it's usually down to under-provisioned infrastructure or too few workers/threads. If it is low (< 20ms), you might be able to scale down a bit and save some $$.
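As a rough illustration of how you might act on this, here is a small Python sketch. The < 20ms figure is the rule of thumb from this post; the 100ms "high" threshold is just an example number I picked, so tune both for your own traffic.

def capacity_hint(queue_time_p95_ms: float) -> str:
    if queue_time_p95_ms > 100:    # sustained queuing: requests are waiting for workers
        return "scale up: add workers/threads or instances"
    if queue_time_p95_ms < 20:     # barely any queuing: likely over-provisioned
        return "consider scaling down and saving some $$"
    return "capacity looks about right"

print(capacity_hint(250.0))  # -> "scale up: add workers/threads or instances"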
I have shared how and where you can check it in New Relic below: