Friday, September 19, 2014

Visualizing a Processing Queue

As many of you know, I have a thing for fancy charts. It's been a while since I've posted anything, so I thought I'd take some time to share a new fancy chart I've come up with to view the processing time of queues.

The Gist

  • Every point in the graph represents a job processed by some queue.
  • The y-axis shows the total time the job was in the queue, including processing time.
  • The x-axis shows the time that the job completed.
  • The red line trailing behind the point represents the time that job waited before being processed.
  • The green line trailing behind the point represents the processing time for that job.


Why


We have many queues where I work. Some application will throw a job into a queue, and another application will pick up that job and process it. Usually these queues are implemented as tables in a SQL database, and they almost always have these columns in common:

  • Insertion Time (What time did we add the job to the queue?)
  • Start Time (What time did the application start the job?)
  • End Time (What time did the application finish the job?)
  • Success (Did the application finish successfully?)

Whenever there's a problem, the question "Why is it going slow?" inevitably ends up on someone's shoulders. Just looking at the times in our logs doesn't always give you a good answer. Ultimately, there are two things to consider when jobs start to slow down:

  1. How long are jobs taking to process? (EndTime - StartTime)
  2. How long are jobs waiting before processing starts? (StartTime - InsertionTime)

My new graph answers these questions perfectly!
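
The two questions boil down to simple subtractions on the queue columns. Here is a minimal sketch in Python, assuming each job row carries the three timestamps described above. The `Job` class, its field names, and the sample timestamps are hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    """One row of a queue table (hypothetical field names)."""
    insertion_time: datetime
    start_time: datetime
    end_time: datetime

    @property
    def wait_time(self) -> timedelta:
        # How long the job sat in the queue before anyone picked it up
        return self.start_time - self.insertion_time

    @property
    def processing_time(self) -> timedelta:
        # How long the application actually worked on the job
        return self.end_time - self.start_time

    @property
    def total_time(self) -> timedelta:
        # Wait time plus processing time
        return self.end_time - self.insertion_time


# Made-up example: queued at 12:00:00, started at 12:00:30, finished at 12:02:00
job = Job(
    insertion_time=datetime(2014, 9, 19, 12, 0, 0),
    start_time=datetime(2014, 9, 19, 12, 0, 30),
    end_time=datetime(2014, 9, 19, 12, 2, 0),
)
print(job.wait_time, job.processing_time, job.total_time)
```

A long `wait_time` with a normal `processing_time` points at a backlog; a long `processing_time` points at the application itself.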

An extremely small how-to

Let's say I've got 3 different jobs that were processed. Jobs 1, 2, and 3 were inserted at times 1, 2, and 3, respectively. For simplicity, I'm going to show times as integers. Here's the log:

Job  InsertionTime  StartTime  StopTime  Success  Duration
1    1              1          3         1        2
2    2              3          5         0        3
3    3              5          7         1        4

Here, Duration is the total time the job spent in the processing queue (StopTime - InsertionTime). StopTime is the End Time column described earlier.

Let's plot the Duration and StopTime together:
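
As a rough sketch, here is one way to build and plot those points in Python. The `jobs` tuples mirror the log above; the `plot_points` helper, its output file name, and the choice of matplotlib are assumptions for illustration, and any plotting library would do.

```python
# The three jobs from the log: (job, insertion_time, start_time, stop_time, success)
jobs = [
    (1, 1, 1, 3, 1),
    (2, 2, 3, 5, 0),
    (3, 3, 5, 7, 1),
]

# One point per job: x = StopTime, y = Duration (StopTime - InsertionTime)
points = [(stop, stop - inserted) for _, inserted, _, stop, _ in jobs]
print(points)  # [(3, 2), (5, 3), (7, 4)]


def plot_points(points, path="queue_points.png"):
    """Save a scatter plot of the points to a file (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    xs, ys = zip(*points)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, color="black")
    ax.set_xlabel("StopTime")
    ax.set_ylabel("Duration")
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_points(points)` writes the scatter plot to `queue_points.png`.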

This gives me one point per job, at (3, 2), (5, 3), and (7, 4).

Now let's add the line segments:
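
A minimal sketch of the segments, again in Python with matplotlib. It assumes the segments are drawn horizontally at the height of each point: red from InsertionTime to StartTime, green from StartTime to StopTime. That geometry is one reading of "trailing behind the point", not something the log dictates, and the `segments` and `plot_queue` helpers are hypothetical names.

```python
# The three jobs from the log: (job, insertion_time, start_time, stop_time, success)
jobs = [
    (1, 1, 1, 3, 1),
    (2, 2, 3, 5, 0),
    (3, 3, 5, 7, 1),
]


def segments(jobs):
    """Return (wait, processing) lists of horizontal segments (x_start, x_end, y)."""
    wait, processing = [], []
    for _, inserted, start, stop, _ in jobs:
        duration = stop - inserted  # y position, same as the job's point
        wait.append((inserted, start, duration))    # red: waiting in the queue
        processing.append((start, stop, duration))  # green: being processed
    return wait, processing


wait, processing = segments(jobs)


def plot_queue(jobs, path="queue.png"):
    """Save points plus red/green trailing segments to a file (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    wait, processing = segments(jobs)
    fig, ax = plt.subplots()
    for x0, x1, y in wait:
        ax.plot([x0, x1], [y, y], color="red")
    for x0, x1, y in processing:
        ax.plot([x0, x1], [y, y], color="green")
    # Draw the points last so they sit on top of the segment ends
    ax.scatter([x1 for _, x1, _ in processing],
               [y for _, _, y in processing], color="black", zorder=3)
    ax.set_xlabel("StopTime")
    ax.set_ylabel("Duration")
    fig.savefig(path)
    plt.close(fig)
```

Job 1 started the moment it was inserted, so its red segment has zero length and only the green one is visible.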

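This gives me my final output: each point now trails a green segment for its processing time and, where the job had to wait, a red segment for its wait time.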

And that's it!

Conclusion

This graph has proven extremely valuable. With it, I can quickly determine whether an application is genuinely taking too long, or whether we just got hit with a bunch of input and need to start up new instances of said application. Ultimately, I can now run a quick R script and have a great picture of what's happening to our system as a whole.