As I mentioned in a previous post, we use Phusion Passenger as the application server to host our Ruby applications. A while ago, upon the recommendation of my coworker Ben Cail, I created a cron job that calls passenger-status every 5 minutes to log the status of Passenger on our servers. Below is a sample of the passenger-status output:
Version : 5.1.12
Date    : Mon Jul 30 10:42:54 -0400 2018
Instance: 8x6dq9uX (Apache/2.2.15 (Unix) DAV/2 Phusion_Passenger/5.1.12)

----------- General information -----------
Max pool size : 6
App groups    : 1
Processes     : 6
Requests in top-level queue : 0

----------- Application groups -----------
/path/to/our/app:
  App root: /path/to/our/app
  Requests in queue: 3
  * PID: 43810   Sessions: 1   Processed: 20472   Uptime: 1d 7h 31m 25s
    CPU: 0%      Memory  : 249M    Last used: 1s ago
  * PID: 2628    Sessions: 1   Processed: 1059    Uptime: 4h 34m 39s
    CPU: 0%      Memory  : 138M    Last used: 1s ago
  * PID: 2838    Sessions: 1   Processed: 634     Uptime: 4h 30m 47s
    CPU: 0%      Memory  : 134M    Last used: 1s ago
  * PID: 16836   Sessions: 1   Processed: 262     Uptime: 2h 14m 46s
    CPU: 0%      Memory  : 160M    Last used: 1s ago
  * PID: 27431   Sessions: 1   Processed: 49      Uptime: 25m 27s
    CPU: 0%      Memory  : 119M    Last used: 0s ago
  * PID: 27476   Sessions: 1   Processed: 37      Uptime: 25m 0s
    CPU: 0%      Memory  : 117M    Last used: 0s ago
Our cron job to log this information over time is something like this:
/path/to/.gem/gems/passenger-5.1.12/bin/passenger-status >> ./logs/passenger_status.log
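The crontab entry itself is not shown above, but with the five-minute interval mentioned earlier it would look something like the following sketch (same placeholder paths as above):

*/5 * * * * /path/to/.gem/gems/passenger-5.1.12/bin/passenger-status >> ./logs/passenger_status.log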
Last week we had some issues in which our production server was experiencing short outages. Upon review we noticed that we were receiving an unusual amount of traffic (most of it from crawlers submitting bad requests). One of the tools that we used to check the status of our server was the passenger_status.log file created via the aforementioned cron job.
The key piece of information that we use is the “Requests in queue” value shown in the output above. We parsed this value out of the passenger_status.log file to see how it changed over the last 30 days. The result showed that although we have had a couple of outages recently, the number of “requests in queue” increased dramatically about two weeks ago and had stayed high ever since.
The graph below shows what we found. Notice how after August 19th the value of “requests in queue” was consistently high, whereas before August 19th it was almost always zero or below 10.
We looked closely at our Apache and Rails logs and identified the traffic that was causing the problem. We took a few steps to handle it and now our servers are behaving normally again. Notice how we are back to zero requests in queue on August 31st in the graph above.
The Ruby code that we use to parse our passenger_status.log file is pretty simple: it grabs the line with the date and the line with the number of requests in queue, parses their values, and outputs the result as tab-delimited data that we can then use to create a graph in Excel or RAWGraphs. Below is the Ruby code:
require "date" log_file = "passenger_status.log" excel_date = true def date_from_line(line, excel_date) index = line.index(":") return nil if index == nil date_as_text = line[index+2..-1].strip # Thu Aug 30 14:00:01 -0400 2018 datetime = DateTime.parse(date_as_text).to_s # 2018-08-30T14:00:01-04:00 if excel_date return datetime[0..9] + " " + datetime[11..15] # 2018-08-30 14:00 end datetime end def count_from_line(line) return line.gsub("Requests in queue:", "").to_i end puts "timestamp\trequest_in_queue" date = "N/A" File.readlines(log_file).each do |line| if line.start_with?("Date ") date = date_from_line(line, excel_date) elsif line.include?("Requests in queue:") request_count = count_from_line(line) puts "\"#{date}\"\t#{request_count}" end end
In this particular case the increase in requests in queue was caused by bad/unwanted traffic. If the increase in traffic had been legitimate, we would have taken a different route, such as adding more processes to our Passenger instance to handle the traffic.
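Since we run Passenger under Apache, adding processes would roughly mean raising the PassengerMaxPoolSize directive in the Apache configuration and restarting Apache. The value below is only illustrative, not what we actually run:

# Allow Passenger to spawn more worker processes than the current max pool size of 6
PassengerMaxPoolSize 12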