Programming – Page 2 – Brown University Library Digital Technologies

Solr LocalParams and dereferencing

A few months ago, at the Blacklight Summit, I learned that Blacklight defines certain settings in solrconfig.xml to serve as shortcuts for a group of fields with different boost values. For example, in our Blacklight installation we have a setting for author_qf that references four specific author fields with different boost values.

<str name="author_qf">
  author_unstem_search^200
  author_addl_unstem_search^50
  author_t^20
  author_addl_t
</str>

In this case author_qf is a shortcut that we use when issuing searches by author. By referencing author_qf in our request to Solr we don’t have to list all four author fields (author_unstem_search, author_addl_unstem_search, author_t, and author_addl_t) and their boost values. Solr is smart enough to use those four fields when it notices author_qf in the query. You can see the exact definition of this setting in our GitHub repository.

Although the Blacklight project describes this feature in its documentation, and our Blacklight instance takes advantage of it via the Blacklight Advanced Search plugin, I had never quite understood how it works internally in Solr.

LocalParams

Turns out Blacklight takes advantage of a feature in Solr called LocalParams. This feature allows us to customize individual values for a parameter on each request:

LocalParams stands for local parameters: they provide a way to “localize” information about a specific argument that is being sent to Solr. In other words, LocalParams provide a way to add meta-data to certain argument types such as query strings. (Source: https://wiki.apache.org/solr/LocalParams)

The syntax for LocalParams is p={! k=v } where p is the parameter to localize, k is the setting to customize, and v the value for the setting. For example, the following

q={! qf=author}jane

uses LocalParams to customize the q parameter of a search. In this case it sets the query fields (qf) parameter to the author field for the search for “jane”.
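As a minimal sketch of how a client might assemble that parameter, the snippet below builds the q value and URL-encodes it with the Python standard library. The helper name local_params_query is made up for this illustration, and it assumes the LocalParams values contain no spaces or quotes.

```python
from urllib.parse import urlencode, parse_qs


def local_params_query(term, **local_params):
    """Prefix a search term with a LocalParams block, e.g. {! qf=author}jane.

    Hypothetical helper for illustration; values are assumed to need no quoting.
    """
    pairs = " ".join(f"{key}={value}" for key, value in local_params.items())
    return "{! " + pairs + "}" + term


# Build the q value from the example above.
q = local_params_query("jane", qf="author")

# URL-encode it so it can be appended to a Solr select URL.
query_string = urlencode({"q": q})

print(q)
print(query_string)
```

The braces, exclamation mark, and equals sign all have to be percent-encoded in a URL, which is why it is safer to let urlencode do the work than to paste the raw value into a request by hand.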

Dereferencing

When using LocalParams you can also use dereferencing to tell the parser to use an already defined value as the value for a LocalParam. The following example shows how to use the already defined value (author_qf) when setting the value for qf in the LocalParams. Notice how the value is prefixed with a dollar sign to indicate dereferencing:

q={! qf=$author_qf}jane

When Solr sees $author_qf, it replaces it with the value that we defined for author_qf, so the qf parameter ends up using the four author fields.
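The substitution can be pictured with a toy model in Python. This is only a sketch of the idea, not Solr's actual implementation: the dereference function and the configured dictionary are hypothetical stand-ins for Solr's parser and for the defaults in solrconfig.xml.

```python
def dereference(local_params, request_params):
    """Replace every "$name" value with the value stored under `name`.

    Toy model of parameter dereferencing; not how Solr is implemented.
    """
    resolved = {}
    for key, value in local_params.items():
        if value.startswith("$"):
            # Look up the referenced parameter by name (without the "$").
            resolved[key] = request_params[value[1:]]
        else:
            # Plain values are passed through unchanged.
            resolved[key] = value
    return resolved


# Stand-in for the author_qf setting defined in solrconfig.xml.
configured = {
    "author_qf": (
        "author_unstem_search^200 "
        "author_addl_unstem_search^50 "
        "author_t^20 "
        "author_addl_t"
    ),
}

resolved = dereference({"qf": "$author_qf"}, configured)
print(resolved["qf"])
```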

You can see how Solr handles dereferencing if you pass debugQuery=true to your Solr query and inspect debug.parsedquery in the response. The previous query would return something along the lines of:

(+DisjunctionMaxQuery(
    (
    author_t:jane^20.0 |
    author_addl_t:jane |
    author_addl_unstem_search:jane^50.0 |
    author_unstem_search:jane^200.0
    )~0.01
  )
)/no_coord

Notice how Solr dereferenced (i.e. expanded) author_qf to the four author fields that we have configured in our solrconfig.xml with the corresponding boost values.
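To check this from a script, something along the following lines could work. It is a sketch under assumptions: it only builds the request parameters and reads debug.parsedquery from a JSON response body, and sample_response is a hand-written stand-in (containing only the debug key) rather than real Solr output.

```python
import json
from urllib.parse import urlencode

# Parameters for the dereferenced query, asking Solr for debug output as JSON.
params = {
    "q": "{! qf=$author_qf}jane",
    "debugQuery": "true",
    "wt": "json",
}
query_string = urlencode(params)


def parsed_query(response_text):
    """Return debug.parsedquery from a Solr JSON response body."""
    return json.loads(response_text)["debug"]["parsedquery"]


# Hand-written sample that mimics the shape of the debug section shown above.
sample_response = json.dumps({
    "debug": {
        "parsedquery": (
            "(+DisjunctionMaxQuery((author_t:jane^20.0 | author_addl_t:jane | "
            "author_addl_unstem_search:jane^50.0 | "
            "author_unstem_search:jane^200.0)~0.01))/no_coord"
        )
    }
})

print(parsed_query(sample_response))
```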

It’s worth noting that dereferencing only works if you use the eDisMax parser in Solr.

A couple of advantages of this Solr feature come to mind. One is that your queries are a bit shorter, since we are passing an alias (author_qf) rather than all four fields and their boost values, which makes the query easier to read. The second is that you can change the definition of author_qf on the server (say, to include a new author field in your Solr index) and client applications will automatically use the new definition when they reference author_qf.

Django project update

Recently, I worked on updating one of our Django projects. It hadn’t been touched for a while, and Django needed to be updated to a current version. I also added some automated tests, switched from mod_wsgi to Phusion Passenger, and moved the source code from Subversion to Git.

Django Update

The Django update didn’t end up being too involved. The project was running Django 1.6.x, and I updated it to Django 1.8.x, the LTS release. Django migrations were added in Django 1.7, and as part of the update I added an initial migration for the app. In my test script, I needed to add a django.setup() call for the new Django version, but otherwise there weren’t any code changes required.

Automated Tests

This project didn’t have any automated tests. I added a few tests that exercised the basic functionality of the project by hitting different URLs with the Django test client. These tests were not comprehensive, but they did run a significant portion of the code.

mod_wsgi => Phusion Passenger

We used to use mod_wsgi for serving our Python code, but now we use Phusion Passenger. Passenger lets us easily run Ruby and Python code on the same server, and different versions of Python if we want (e.g. Python 2.7 and Python 3). (The mod_wsgi site has details of when it can and can’t run different versions of Python.)

Subversion => Git

Here at the Brown University Library, we used to store our source code in Subversion. Now we put our code in Git, either on Bitbucket or GitHub, so one of my changes was to move this project’s code from Subversion to Git.

Hopefully these changes will make it easier to work with the code and maintain it in the future.

Python/Django Quicktips: Ordered JSON Load and Django Email Testing

Ordered JSON Load

Recently, I had the need to load some data from our JSON Item API in the same order it was created. When we construct the data, we use an OrderedDict to preserve the order and then we dump it to JSON.

In [1]: import json
In [2]: from collections import OrderedDict
In [3]: info = OrderedDict()
In [4]: info['zebra'] = 1
In [5]: info['aardvark'] = 10

In [6]: info
 Out[6]: OrderedDict([('zebra', 1), ('aardvark', 10)])

In [7]: json.dumps(info)
 Out[7]: '{"zebra": 1, "aardvark": 10}'

By default, though, the json module loads that data into a regular dict, and the order is lost.

In [8]: json.loads(json.dumps(info))
 Out[8]: {u'aardvark': 10, u'zebra': 1}

What’s the solution? Tell the json module to load the data into an OrderedDict:

In [9]: json.loads(json.dumps(info), object_pairs_hook=OrderedDict)
 Out[9]: OrderedDict([(u'zebra', 1), (u'aardvark', 10)])
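The session above is from Python 2. As a sketch, the same round trip as a Python 3 script looks like this. Note that on Python 3.7 and later a regular dict also keeps insertion order, so object_pairs_hook mainly matters on older interpreters or when the calling code needs an actual OrderedDict.

```python
import json
from collections import OrderedDict

# Build the data in a deliberate, non-alphabetical order.
info = OrderedDict()
info["zebra"] = 1
info["aardvark"] = 10

encoded = json.dumps(info)

# Ask the decoder to build an OrderedDict from the key/value pairs it reads.
loaded = json.loads(encoded, object_pairs_hook=OrderedDict)

# Without the hook the result is a plain dict.
plain = json.loads(encoded)

print(list(loaded.items()))
print(type(plain).__name__)
```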

Django email testing

Some of our Django projects send out notification emails to a user or a site admin. Django has the handy mail_admins and send_mail functions, but what if you want to test that the email was sent?

Django makes it easy to unit-test the emails – its test runner automatically uses a dummy email backend. Then you can import the mail outbox and verify its contents. Here’s a code snippet that tests an email being sent:

from django.core.mail import send_mail

def send_email():
    # Send one message; fail_silently=False raises if the backend reports an error.
    send_mail('Blog post', 'Test for the blog post', 'digital_technologies@brown.edu',
              ['public@example.com'], fail_silently=False)

from django.test import SimpleTestCase
from django.core import mail

class TestEmail(SimpleTestCase):

    def test_email(self):
        send_email()
        # The test runner collects sent messages in mail.outbox.
        self.assertEqual(len(mail.outbox), 1)
        self.assertEqual(mail.outbox[0].subject, 'Blog post')
        self.assertEqual(mail.outbox[0].body, 'Test for the blog post')

Note: you can’t import outbox from django.core.mail and check that len(outbox) == 1. This is because outbox is just a list, and it gets re-initialized to a new list before each test case.
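The effect can be reproduced without Django. In the sketch below, fake_mail is a made-up module standing in for django.core.mail, and the rebinding of its outbox attribute only simulates what the note describes the test setup doing. It shows why a name imported earlier keeps pointing at the old list.

```python
import types

# Hypothetical stand-in for django.core.mail, with an outbox attribute.
fake_mail = types.ModuleType("fake_mail")
fake_mail.outbox = []

# Equivalent to `from fake_mail import outbox`: binds the current list object.
outbox = fake_mail.outbox

# Simulate the test setup replacing the attribute with a brand-new list.
fake_mail.outbox = []

# A message sent during the test lands in the new list.
fake_mail.outbox.append("message")

print(len(outbox))            # the stale reference never sees the message
print(len(fake_mail.outbox))  # looking it up through the module does
```

This is why the test above imports the mail module and reads mail.outbox at assertion time instead of importing outbox directly.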