The Gist
- Every point in the graph represents a job processed by some queue.
- The y axis shows the total time the job was in the queue, including processing time
- The x axis shows the time that the job completed
- The red line trailing on the back of the point represents the time that job waited before being processed
- The green line trailing on the back of the point represents the processing time for that job
Why
We have many queues where I work. Some application will throw a job into a queue, and another application will pick up that job and process it. Usually these queues are implemented as tables in a SQL database, and they almost always have these columns in common:
- Insertion Time (What time did we add the job to the queue)
- Start Time (What time did the application start the job?)
- End Time (What time did the application finished the job?)
- Success (Did the application finish successfully?)
Whenever there's a problem, the question "Why is it going slow" inevitably ends up on someone's shoulders. Just looking at the times in our logs doesn't always give you a good answer. Ultimately, there are two things that need to be considered when jobs start to slow down.
- How long are jobs taking to process? (EndTime-StartTime)
- How long are jobs waiting before processing starts? (StartTime-InsertionTime)
My new graph answers these questions perfectly!
An extremely small how to
Let's say I've got 3 different jobs that processed. Job 1,2, and 3 were inserted at time 1,2, and 3 respectively. For simplicity, I'm going to show times as integers. Here's the log:
Job | InsertionTime | StartTime | StopTime | Success | Duration |
1 | 1 | 1 | 3 | 1 | 2 |
2 | 2 | 3 | 5 | 0 | 3 |
3 | 3 | 5 | 7 | 1 | 4 |
Here, duration is that Total Time in the processing queue (EndTime - InsertiontTime).
Let's plot the Duration and StopTime together:
This gives me this:
Now lets add on the line segments:
This gives me my final output:
And that's it!
Let's plot the Duration and StopTime together:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#I'll also color successfull jobs GREEN, and failed jobs RED | |
plot(log$StopTime,log$duration,col=c('red','green')[log$Success+1],ylab="Duration (s)",xlab="Time",xlim=c(1,7)) |
This gives me this:
Now lets add on the line segments:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#Draw a RED horizontal line from insertion time to start time | |
segments(log$InsertionTime,log$duration,log$StartTime,log$duration,col="red") | |
#Draw a GREEN horizontal line from start time to stop time | |
segments(log$StartTime,log$duration,log$StopTime,log$duration,col="green") |
And that's it!
Conclusion
This has proven to be extremely valuable. With this graph, I can quickly determine whether an application is genuinely taking too long, or if we just got hit with a bunch of input, and we need to start up new instances of said application. Ultimately, I can now run a quick R script and have a great picture of what's happening to our system as a whole.
No comments:
Post a Comment