The Gist
- Every point in the graph represents a job processed by some queue.
- The y axis shows the total time the job was in the queue, including processing time
- The x axis shows the time that the job completed
- The red line trailing on the back of the point represents the time that job waited before being processed
- The green line trailing on the back of the point represents the processing time for that job
Why
We have many queues where I work. Some application will throw a job into a queue, and another application will pick up that job and process it. Usually these queues are implemented as tables in a SQL database, and they almost always have these columns in common:
- Insertion Time (What time did we add the job to the queue)
- Start Time (What time did the application start the job?)
- End Time (What time did the application finished the job?)
- Success (Did the application finish successfully?)
Whenever there's a problem, the question "Why is it going slow" inevitably ends up on someone's shoulders. Just looking at the times in our logs doesn't always give you a good answer. Ultimately, there are two things that need to be considered when jobs start to slow down.
- How long are jobs taking to process? (EndTime-StartTime)
- How long are jobs waiting before processing starts? (StartTime-InsertionTime)
My new graph answers these questions perfectly!
An extremely small how to
Let's say I've got 3 different jobs that processed. Job 1,2, and 3 were inserted at time 1,2, and 3 respectively. For simplicity, I'm going to show times as integers. Here's the log:
Job | InsertionTime | StartTime | StopTime | Success | Duration |
1 | 1 | 1 | 3 | 1 | 2 |
2 | 2 | 3 | 5 | 0 | 3 |
3 | 3 | 5 | 7 | 1 | 4 |
Here, duration is that Total Time in the processing queue (EndTime - InsertiontTime).
Let's plot the Duration and StopTime together:
This gives me this:
Now lets add on the line segments:
This gives me my final output:
And that's it!
Let's plot the Duration and StopTime together:
This gives me this:
Now lets add on the line segments:
This gives me my final output:
And that's it!
Conclusion
This has proven to be extremely valuable. With this graph, I can quickly determine whether an application is genuinely taking too long, or if we just got hit with a bunch of input, and we need to start up new instances of said application. Ultimately, I can now run a quick R script and have a great picture of what's happening to our system as a whole.
No comments:
Post a Comment