
Monitoring Passenger’s Requests in Queue over time

As I mentioned in a previous post, we use Phusion Passenger as the application server to host our Ruby applications. A while ago, on the recommendation of my coworker Ben Cail, I created a cron job that calls passenger-status every 5 minutes to log the status of Passenger on our servers. Below is a sample of the passenger-status output:

Version : 5.1.12
Date : Mon Jul 30 10:42:54 -0400 2018
Instance: 8x6dq9uX (Apache/2.2.15 (Unix) DAV/2 Phusion_Passenger/5.1.12)

----------- General information -----------
Max pool size : 6
App groups : 1
Processes : 6
Requests in top-level queue : 0

----------- Application groups -----------
/path/to/our/app:
App root: /path/to/our/app
Requests in queue: 3
* PID: 43810 Sessions: 1 Processed: 20472 Uptime: 1d 7h 31m 25s
CPU: 0% Memory : 249M Last used: 1s ago
* PID: 2628 Sessions: 1 Processed: 1059 Uptime: 4h 34m 39s
CPU: 0% Memory : 138M Last used: 1s ago
* PID: 2838 Sessions: 1 Processed: 634 Uptime: 4h 30m 47s
CPU: 0% Memory : 134M Last used: 1s ago
* PID: 16836 Sessions: 1 Processed: 262 Uptime: 2h 14m 46s
CPU: 0% Memory : 160M Last used: 1s ago
* PID: 27431 Sessions: 1 Processed: 49 Uptime: 25m 27s
CPU: 0% Memory : 119M Last used: 0s ago
* PID: 27476 Sessions: 1 Processed: 37 Uptime: 25m 0s
CPU: 0% Memory : 117M Last used: 0s ago

Our cron job to log this information over time is something like this:

/path/to/.gem/gems/passenger-5.1.12/bin/passenger-status >> ./logs/passenger_status.log
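As a crontab entry, the job would look roughly like this (a sketch: the every-5-minutes schedule is from the description above, and the log path is illustrative):

```shell
# Append a passenger-status snapshot to the log every 5 minutes
*/5 * * * * /path/to/.gem/gems/passenger-5.1.12/bin/passenger-status >> /path/to/logs/passenger_status.log 2>&1
```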

Last week we had some issues in which our production server was experiencing short outages. Upon review we noticed that we were having an unusual amount of traffic coming to our server (most of it from crawlers submitting bad requests). One of the tools that we used to check the status of our server was the passenger_status.log file created via the aforementioned cron job.

The key piece of information that we use is the “Requests in queue” value highlighted above. We parsed this value out of the passenger_status.log file to see how it changed over the last 30 days. The result showed that although we have had a couple of outages recently, the number of “requests in queue” increased dramatically about two weeks ago and has stayed high ever since.

The graph below shows what we found. Notice how after August 19th the value of “requests in queue” was consistently high, whereas before August 19th it was almost always zero or below 10.

Request in queue graph

We looked closely at our Apache and Rails logs and determined the traffic that was causing the problem. We took a few steps to handle it and now our servers are behaving normally again. Notice how we are back to zero requests in queue on August 31st in the graph above.

The Ruby code that we use to parse our passenger_status.log file is pretty simple: it grabs the line with the date and the line with the number of requests in queue, parses their values, and outputs the result to a tab-delimited file that we can then use to create a graph in Excel or RAWGraphs. Below is the Ruby code:

require "date"

log_file = "passenger_status.log"
excel_date = true

def date_from_line(line, excel_date)
  index = line.index(":")
  return nil if index == nil
  date_as_text = line[index+2..-1].strip # Thu Aug 30 14:00:01 -0400 2018
  datetime = DateTime.parse(date_as_text).to_s # 2018-08-30T14:00:01-04:00
  if excel_date
    return datetime[0..9] + " " + datetime[11..15] # 2018-08-30 14:00
  end
  datetime
end

def count_from_line(line)
  return line.gsub("Requests in queue:", "").to_i
end

puts "timestamp\trequest_in_queue"
date = "N/A"
File.readlines(log_file).each do |line|
  if line.start_with?("Date ")
    date = date_from_line(line, excel_date)
  elsif line.include?("Requests in queue:")
    request_count = count_from_line(line)
    puts "\"#{date}\"\t#{request_count}"
  end
end
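As a quick sanity check of the two parsing steps, here is the same logic run against sample lines copied from the passenger-status output above (not part of the deployed script):

```ruby
require "date"

# Sample lines taken from the passenger-status output shown earlier
date_line  = "Date : Mon Jul 30 10:42:54 -0400 2018"
queue_line = "Requests in queue: 3"

# Same steps as date_from_line: skip past "Date : " and parse the timestamp
index = date_line.index(":")
date_as_text = date_line[index + 2..-1].strip
datetime = DateTime.parse(date_as_text).to_s
puts datetime[0..9] + " " + datetime[11..15] # => 2018-07-30 10:42

# Same step as count_from_line: strip the label and convert to an integer
puts queue_line.gsub("Requests in queue:", "").to_i # => 3
```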

In this particular case the increase in requests in queue was caused by bad/unwanted traffic. If the increase in traffic had been legitimate, we would have taken a different route, like adding more processes to our Passenger instance to handle the load.
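For reference, scaling up Passenger in that scenario would be a one-line change in the Apache Passenger configuration via the PassengerMaxPoolSize directive (the value 12 below is illustrative, not our actual setting):

```apache
<IfModule mod_passenger.c>
  # Allow up to 12 application processes instead of the 6 shown in the
  # "Max pool size" line of the passenger-status output above
  PassengerMaxPoolSize 12
</IfModule>
```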

Configuring Ruby, Bundler, and Passenger

Recently we upgraded several of our applications to a newer version of Ruby, which was relatively simple to do on our local development machines. However, we ran into complications once we started deploying the updated applications to our development and production servers. The problems that we ran into highlighted issues in the way we had configured our applications and Passenger on the servers. This blog post elaborates on the final configuration that we arrived at (at least for now) and explains the rationale for the settings that worked for us.

Our setup

We deploy our applications using a “system account” (e.g. appuser) so that execution permissions and file ownership are not tied to the account of the developer doing the deployment.

We use Apache as our web server and Phusion Passenger as the application server to handle our Ruby applications.

And last but not least, we use Bundler to manage gems in our Ruby applications.

Application-level configuration

We perform all the steps to deploy a new version of our applications with the “system account” for the application (e.g. appuser).

Since we sometimes have more than one version of Ruby on our servers, we use chruby to switch between versions when we are logged in as the appuser. However, we have learned that it is better not to select a particular version of Ruby as part of this user’s bash profile. Executing ruby -v as this user upon login will typically show the version that came with the operating system (e.g. “ruby 1.8.7”).

By leaving the system Ruby as the default we are forced to select the proper version of Ruby for each application. This has the advantage that the configuration for each application is explicit about what version of Ruby it needs. It also makes applications less likely to break when we install a newer version of Ruby on the server, which is particularly useful on our development server, where we have many Ruby applications running and each of them might be using a different version of Ruby.

If we want to do something for a particular application (say install gems or run a rake task) then we switch to the version of Ruby (via chruby) that we need for the application before executing the required commands.

We have also found it useful to configure Bundler to install application gems inside the application folder rather than in a global folder. We do this via Bundler’s --path parameter. The only gem that we install globally (i.e. in GEM_HOME) is bundler.

A typical deployment script looks more or less like this.

Login to the remote server:

$ ssh our-production-machine

Switch to our system account on the remote server (notice that it references the Ruby that came with the operating system):

$ su - appuser

$ ruby -v
# => ruby 1.8.7 (2013-06-27 patchlevel 374) [x86_64-linux]
 
$ which ruby
# => /usr/bin/ruby

Activate the version of Ruby that we want for this app (notice that it references the Ruby that we installed):

$ source /opt/local/chruby/share/chruby/chruby.sh
$ chruby ruby-2.3.6

$ ruby -v 
# => ruby 2.3.6p384 (2017-12-14 revision 61254) [x86_64-linux] 
 
$ which ruby
# => ~/rubies/ruby-2.3.6/bin/ruby
 
$ env | grep GEM
# => GEM_HOME=/opt/local/.gem/ruby/2.3.6
# => GEM_ROOT=/opt/local/rubies/ruby-2.3.6/lib/ruby/gems/2.3.0
# => GEM_PATH=/opt/local/.gem/ruby/2.3.6:/opt/local/rubies/ruby-2.3.6/lib/ruby/gems/2.3.0

Install bundler (this is only needed the first time, notice how it is installed in GEM_HOME):

$ gem install bundler
$ gem list bundler -d
# => Installed at: /opt/local/.gem/ruby/2.3.6

Install the rest of the app, its gems, and execute some rake tasks (notice that Bundler will indicate that gems are being installed locally to ./vendor/bundle):

$ cd /path/to/appOne
$ git pull

$ RAILS_ENV=production bundle install --path vendor/bundle
# => Bundled gems are installed into `./vendor/bundle`

$ RAILS_ENV=production bundle exec rake assets:precompile

Passenger configuration

Our default passenger configuration is rather bare-bones and indicates only a few settings. For example our /etc/httpd/conf.d/passenger.conf looks more or less like this:

LoadModule passenger_module /opt/local/.gem/gems/passenger-5.1.12/buildout/apache2/mod_passenger.so

<IfModule mod_passenger.c>
  PassengerRoot /opt/local/.gem/gems/passenger-5.1.12
  PassengerUser appuser
  PassengerStartTimeout 300
</IfModule>

Include /path/to/appOne/http/project_passenger.conf
Include /path/to/appTwo/http/project_passenger.conf

Notice that there are no Ruby-specific settings indicated above. The Ruby-specific settings are indicated in the individual project_passenger.conf files for each application.

If we look at the passenger config for one of the apps (say /path/to/appOne/http/project_passenger.conf) it would look more or less like this:

<Location /appOne>
  PassengerBaseURI /appOne
  PassengerRuby /opt/local/rubies/ruby-2.3.6/bin/ruby
  PassengerAppRoot /path/to/appOne/
  SetEnv GEM_PATH /opt/local/.gem/ruby/2.3.6/
</Location>

Notice that this configuration indicates both the path to the Ruby version that we want for this application (PassengerRuby) and also where to find (global) gems for this application (GEM_PATH).

The value for PassengerRuby matches the path that which ruby returned above (/opt/local/rubies/ruby-2.3.6/bin/ruby) and clearly indicates that we are using version 2.3.6 for this application.

The GEM_PATH setting is very important since it is what allows Passenger to find bundler when loading our application. Not setting this value results in the application not loading and Apache logging the following error:

Could not spawn process for application /path/to/AppOne: An error occurred while starting up the preloader.
Error ID: dd0dcbd4
Error details saved to: /tmp/passenger-error-3OKItz.html
Message from application: cannot load such file -- bundler/setup (LoadError)
/opt/local/rubies/ruby-2.3.6/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
/opt/local/rubies/ruby-2.3.6/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'

Notice that we set the GEM_PATH value to the path returned by gem list bundler -d above. This is a bit tricky: if you look closely, we are setting GEM_PATH to the value that GEM_HOME reported above (/opt/local/.gem/ruby/2.3.6/). I suspect we could have set GEM_PATH to /opt/local/.gem/ruby/2.3.6:/opt/local/rubies/ruby-2.3.6/lib/ruby/gems/2.3.0 to match the GEM_PATH above, but we didn’t try that.

UPDATE: The folks at Phusion recommend setting GEM_HOME as well (even if Passenger does not need it) because some gems might need it.
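With that recommendation applied, the per-application config shown earlier would gain one more SetEnv line (a sketch based on the appOne example above):

```apache
<Location /appOne>
  PassengerBaseURI /appOne
  PassengerRuby /opt/local/rubies/ruby-2.3.6/bin/ruby
  PassengerAppRoot /path/to/appOne/
  SetEnv GEM_PATH /opt/local/.gem/ruby/2.3.6/
  SetEnv GEM_HOME /opt/local/.gem/ruby/2.3.6/
</Location>
```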

 

— Hector Correa & Joe Mancino

Using synonyms in Solr

A few days ago somebody reported that our catalog returns different results if a user searches for “music for the hundred years war” than if the user searches for “music for the 100 years war”.

To handle this issue I decided to use the synonyms feature in Solr. My thought was to tell Solr that “100” and “hundred” are synonyms and should be treated as such. I had seen a synonyms.txt file in the Solr configuration folder and I thought it was just a matter of adding a few lines to this file and, voilà, synonyms would kick in. It turns out using synonyms in Solr is a bit more complicated than that: not too complicated, but not as straightforward as I had thought.

Configuring synonyms in Solr

To configure Solr to use synonyms you need to add a filter to the field type where you want synonyms to be used. For example, to enable synonyms for the text field in Solr, I added a filter using the SynonymFilterFactory in our schema.xml:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.ICUFoldingFilterFactory" />
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
   <filter class="solr.SnowballPorterFilterFactory" language="English" />
 </analyzer>
 <analyzer type="query">
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.ICUFoldingFilterFactory" />
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
   <filter class="solr.SnowballPorterFilterFactory" language="English" />
 </analyzer>
</fieldType>

You can add this filter for indexing, for querying, or both. In the example above I am only configuring the use of synonyms at query time.

Notice how the SynonymFilterFactory references a synonyms.txt file. This text file is where synonyms are defined. Notice also the expand="true" setting.

The synonyms.txt file accepts synonyms in two formats. The first format is just a comma-separated list of words that are considered synonyms, for example:

 100,hundred

With this format, every time Solr sees “100” or “hundred” in a value it will automatically expand the value to include both “100” and “hundred”. For example, if we search for “music for the hundred years war” it will actually search for “music for the 100 hundred years war”; notice how it now includes both variations (100 and hundred) in the text to search. The same is true if we search for “music for the 100 years war”: Solr will search for both variations.

A second format we can use to configure synonyms is the => operator, which maps the terms on the left to a different term, for example:

 100 => hundred

With this format, every time Solr sees “100” it will replace it with “hundred”. For example, if we search for “music for the 100 years war” it will search for “music for the hundred years war”. Notice that in this case Solr will include “hundred” but drop “100”. The => in synonyms.txt is a shortcut that overrides the expand="true" setting and replaces the values on the left with the values on the right side.
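For illustration, a small synonyms.txt using both formats could look like this (the colour/color line is a hypothetical example, not from our configuration):

```
# Comma-separated: "100" and "hundred" each expand to match both terms
100,hundred

# One-way mapping (hypothetical): occurrences of "colour" are replaced with "color"
colour => color
```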

Testing synonym matching in Solr

To see how synonyms are applied you can use the “Analysis” option available on the Solr dashboard page.

The following picture shows how this tool can be used to verify how Solr is handling synonyms at index time. Notice, in the highlighted rectangle, how “hundred” was indexed as both “hundred” and “100”.

Solr analysis screen (index)

We can also use this tool to see how values are handled at query time. The following picture shows how a query for “music for the 100 years war” is handled and matched to an original text “music for the hundred years war”. In this particular case synonyms are enabled in the Solr configuration only at query time, which explains why the indexed value (on the left side) only has “hundred” while the value used at query time has been expanded to include both “100” and “hundred”, resulting in a match.

Solr analysis screen (query)

Index vs Query time

When configuring synonyms in Solr it is important to consider the advantages and disadvantages of using them at index time, query time, or both.

Using synonyms at query time is easy because you don’t have to change your index to add or remove synonyms. You just add/remove lines from the synonyms.txt file, restart your Solr core, and the synonyms are applied in subsequent searches.

However, there are some benefits to using synonyms at index time, particularly when you want to handle multi-term synonyms. This blog post by John Berryman and this page in the Apache documentation for Solr give a good explanation of why multi-term synonyms are tricky and why applying synonyms at index time might be a good idea. An obvious disadvantage of applying synonyms at index time is that you need to reindex your data for changes to synonyms.txt to take effect.
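To apply synonyms at index time, the same filter line simply moves into (or is duplicated in) the index analyzer of the fieldType shown earlier, followed by a full reindex:

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ICUFoldingFilterFactory" />
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
  <!-- Synonyms now applied as documents are indexed; reindex after editing synonyms.txt -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" />
</analyzer>
```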

Ivy Plus Discovery Day

On June 4-5, 2017 the Library will host the third annual Ivy Plus Discovery Day. “DiscoDay”, as we like to call it, is an opportunity for staff who work on discovery systems (like Blacklight Josiah) to share an update of their work in progress and discuss common issues.

On Sunday, June 4 we will have a hackathon on these two topics.

  • StackLife — integrating virtual browse in discovery systems
  • Linked Data Authorities — leveraging authorities to provide users with another robust method for exploring our data and finding materials of interest

On Monday, June 5 there will be a full day of sharing and unconference discussion sessions.

We expect about 40 staff from the 13 Ivy Plus Libraries.  We’ve initially limited participation to three staff from each institution and we hope to have a good mix of developers, metadata specialists, user experience librarians and others whose work is closely tied to the institution’s discovery system.

For more information about Discovery Day see: https://library.brown.edu/create/discoveryday/

Josiah

Josiah is a discovery tool that searches most of the journal, magazine, and newspaper resources available through Brown’s library databases, as well as our Classic Josiah Catalog (which includes most of the library’s books and other physical holdings), the Brown University Library Digital Collections, and the Brown Digital Repository. Other types of materials such as dissertations, theses, and library research guides are also included. Josiah also searches the full-text content of the Brown University Library’s ebook collections.