
Using synonyms in Solr

A few days ago somebody reported that our catalog returns different results if a user searches for “music for the hundred years war” than if the user searches for “music for the 100 years war”.

To handle this issue I decided to use the synonyms feature in Solr. My thought was to tell Solr that “100” and “hundred” are synonyms and should be treated as such. I had seen a synonyms.txt file in the Solr configuration folder, and I thought it would just be a matter of adding a few lines to this file and, voilà, synonyms would kick in. It turns out that using synonyms in Solr is a bit more complicated than that; not too complicated, but not as straightforward as I had thought.

Configuring synonyms in Solr

To configure Solr to use synonyms you need to add a filter to the field type where you want synonyms to be used. For example, to enable synonyms for the text field I added a filter using the SynonymFilterFactory to our schema.xml:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.ICUFoldingFilterFactory" />
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
   <filter class="solr.SnowballPorterFilterFactory" language="English" />
 </analyzer>
 <analyzer type="query">
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.ICUFoldingFilterFactory" />
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
   <filter class="solr.SnowballPorterFilterFactory" language="English" />
 </analyzer>
</fieldType>

You can add this filter for indexing, for querying, or both. In the example above I am only configuring the use of synonyms at query time.

Notice how the SynonymFilterFactory references a synonyms.txt file; this text file is where synonyms are defined. Notice also the expand="true" setting.

The synonyms.txt file accepts synonyms in two formats. The first format is just a comma-separated list of words that are considered synonyms, for example:

 100,hundred

With this format, every time Solr sees “100” or “hundred” in a value it automatically expands the value to include both “100” and “hundred”. For example, if we were to search for “music for the hundred years war” Solr will actually search for “music for the 100 hundred years war”; notice how it now includes both variations (100 and hundred) in the text to search. The same is true if we were to search for “music for the 100 years war”: Solr will search for both variations.

A second format we can use to configure synonyms uses the => operator to map one or more terms to a different term, for example:

 100 => hundred

With this format, every time Solr sees “100” it will replace it with “hundred”. For example, if we search for “music for the 100 years war” Solr will search for “music for the hundred years war”. Notice that in this case Solr includes “hundred” but drops “100”. The => operator in synonyms.txt overrides the expand="true" setting and replaces the values on the left with the values on the right side.

Testing synonym matching in Solr

To see how synonyms are applied you can use the “Analysis” option available on the Solr dashboard page.

The following picture shows how this tool can be used to verify how Solr is handling synonyms at index time. Notice, in the highlighted rectangle, how “hundred” was indexed as both “hundred” and “100”.

Solr analysis screen (index)

We can also use this tool to see how values are handled at query time. The following picture shows how a query for “music for the 100 years war” is handled and matched to an original text “music for the hundred years war”. In this particular case synonyms are enabled in the Solr configuration only at query time, which explains why the indexed value (on the left side) only has “hundred” while the value used at query time has been expanded to include both “100” and “hundred”, which results in a match.

Solr analysis screen (query)
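If you prefer to check this outside of the dashboard, the same analysis can be requested over HTTP from Solr’s field analysis handler. Below is a minimal sketch using Python and the requests library; the Solr URL and core name are placeholders, and it assumes the /analysis/field handler is available in your Solr version.

import requests

# Placeholders: adjust the host and core name for your installation.
SOLR_CORE_URL = 'http://localhost:8983/solr/blacklight-core'

params = {
    'analysis.fieldtype': 'text',                              # the field type configured above
    'analysis.fieldvalue': 'music for the hundred years war',  # value as it would be indexed
    'analysis.query': 'music for the 100 years war',           # value as it would be queried
    'analysis.showmatch': 'true',
    'wt': 'json',
}
r = requests.get(SOLR_CORE_URL + '/analysis/field', params=params)
analysis = r.json()['analysis']['field_types']['text']

# With synonyms enabled only at query time, the query chain should show both
# "100" and "hundred" after the SynonymFilterFactory step, while the index
# chain shows only "hundred".
print(analysis['query'])
print(analysis['index'])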

Index vs Query time

When configuring synonyms in Solr it is important to consider the advantages and disadvantages of applying them at index time, query time, or both.

Using synonyms at query time is easy because you don’t have to change your index to add or remove synonyms. You just add/remove lines from the synonyms.txt file, restart your Solr core, and the synonyms are applied in subsequent searches.
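For instance, if Solr is running in standalone mode you can reload the core through the CoreAdmin API after editing synonyms.txt. A quick sketch (the host and core name are placeholders):

import requests

# Placeholders: adjust the host and core name for your installation.
r = requests.get('http://localhost:8983/solr/admin/cores',
                 params={'action': 'RELOAD', 'core': 'blacklight-core'})
r.raise_for_status()  # the new synonyms apply to searches made after the reload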

However, there are some benefits to using synonyms at index time, particularly when you want to handle multi-term synonyms. This blog post by John Berryman and this page in the Apache Solr documentation give a good explanation of why multi-term synonyms are tricky and why applying synonyms at index time might be a good idea. An obvious disadvantage of applying synonyms at index time is that you need to reindex your data for changes to synonyms.txt to take effect.

Testing HTTP calls in Python

Many applications make calls to external services, or to other services that are part of the application. Testing those HTTP calls can be challenging, but there are several options available in Python.

Mocking

One option for testing your HTTP calls is to mock out your function that makes the HTTP call. This way, your function doesn’t make the HTTP call, since it’s replaced by a mock function that just returns whatever you want it to.

Here’s an example of mocking out your HTTP call:

import requests

class SomeClass:

  def __init__(self):
    self.data = self._fetch_data()

  def _fetch_data(self):
    r = requests.get('https://repository.library.brown.edu/api/collections/')
    return r.json()

  def get_collection_ids(self):
    return [c['id'] for c in self.data['collections']]

from unittest.mock import patch
MOCK_DATA = {'collections': [{'id': 1}, {'id': 2}]}

with patch.object(SomeClass, '_fetch_data', return_value=MOCK_DATA) as mock_method:
  thing = SomeClass()
  assert thing.get_collection_ids() == [1, 2]

Another mocking option is the responses package. Responses mocks out the requests library specifically, so if you’re using requests, you can tell the responses package what you want each requests call to return.

Here’s an example using the responses package (SomeClass is defined the same way as in the first example):

import responses
import json
MOCK_JSON_DATA = json.dumps({'collections': [{'id': 1}, {'id': 2}]})

@responses.activate
def test_some_class():
  responses.add(responses.GET,
                'https://repository.library.brown.edu/api/collections/',
                body=MOCK_JSON_DATA,
                status=200,
                content_type='application/json')
  thing = SomeClass()
  assert thing.get_collection_ids() == [1, 2]

test_some_class()

Record & Replay Data

A different type of solution is to use a package to record the responses from your HTTP calls, and then replay those responses automatically for you.

  • VCR.py – VCR.py is a Python version of the Ruby VCR library, and it supports various HTTP clients, including requests.

Here’s a VCR.py example, again using SomeClass from the first example:

import vcr 
IDS = [674, 278, 280, 282, 719, 300, 715, 659, 468, 720, 716, 687, 286, 288, 290, 296, 298, 671, 733, 672, 334, 328, 622, 318, 330, 332, 625, 740, 626, 336, 340, 338, 725, 724, 342, 549, 284, 457, 344, 346, 370, 350, 656, 352, 354, 356, 358, 406, 663, 710, 624, 362, 721, 700, 661, 364, 660, 718, 744, 702, 688, 366, 667]

with vcr.use_cassette('vcr_cassettes/cassette.yaml'):
  thing = SomeClass()
  fetched_ids = thing.get_collection_ids()
  assert sorted(fetched_ids) == sorted(IDS)

  • betamax – From the documentation: “Betamax is a VCR imitation for requests.” Note that it is more limited than VCR.py, since it only works with the requests package.

Here’s a betamax example (note: I modified the code in order to test it – maybe there’s a way to test the code with betamax without modifying it?):

import requests

class SomeClass:
    def __init__(self, session=None):
        self.data = self._fetch_data(session)

    def _fetch_data(self, session=None):
        if session:
            r = session.get('https://repository.library.brown.edu/api/collections/')
        else:
            r = requests.get('https://repository.library.brown.edu/api/collections/')
        return r.json()

    def get_collection_ids(self):
        return [c['id'] for c in self.data['collections']]


import betamax
CASSETTE_LIBRARY_DIR = 'betamax_cassettes'
IDS = [674, 278, 280, 282, 719, 300, 715, 659, 468, 720, 716, 687, 286, 288, 290, 296, 298, 671, 733, 672, 334, 328, 622, 318, 330, 332, 625, 740, 626, 336, 340, 338, 725, 724, 342, 549, 284, 457, 344, 346, 370, 350, 656, 352, 354, 356, 358, 406, 663, 710, 624, 362, 721, 700, 661, 364, 660, 718, 744, 702, 688, 366, 667]

session = requests.Session()
recorder = betamax.Betamax(
    session, cassette_library_dir=CASSETTE_LIBRARY_DIR
)

with recorder.use_cassette('our-first-recorded-session', record='none'):
    thing = SomeClass(session)
    fetched_ids = thing.get_collection_ids()
    assert sorted(fetched_ids) == sorted(IDS)

Integration Test

Note that with all the solutions I listed above, it’s probably safest to cover the HTTP calls with an integration test that interacts with the real service, in addition to whatever you do in your unit tests.

Another possible solution is to test as much as possible with unit tests without testing the HTTP call, and then just rely on the integration test(s) to test the HTTP call. If you’ve constructed your application so that the HTTP call is only a small, isolated part of the code, this may be a reasonable option.

Here’s an example where the class fetches the data if needed, but the data can easily be put into the class for testing the rest of the functionality (without any mocking or external packages):

import requests

class SomeClass:

    def __init__(self):
        self._data = None

    @property
    def data(self):
        if not self._data:
            r = requests.get('https://repository.library.brown.edu/api/collections/')
            self._data = r.json()
        return self._data

    def get_collection_ids(self):
        return [c['id'] for c in self.data['collections']]


import json
MOCK_DATA = {'collections': [{'id': 1}, {'id': 2}]}

def test_some_class():
    thing = SomeClass()
    thing._data = MOCK_DATA
    assert thing.get_collection_ids() == [1, 2]

test_some_class()
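For the HTTP call itself, an integration test can stay very small. The sketch below hits the real service, so it is slower and should probably run separately from the unit tests; the assertion about the response shape is based on the examples above.

import requests

def integration_test_collections_api():
    # Talks to the live service, so run this sparingly (eg. in a separate test suite).
    r = requests.get('https://repository.library.brown.edu/api/collections/')
    assert r.status_code == 200
    assert 'collections' in r.json()

integration_test_collections_api()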

Email Preservation

Email Preservation is one of many new initiatives I’ve taken on at Brown. As often happens, this work was put into motion by necessity. Last year Brown took in the collection of a well-known 2nd Amendment activist who recently passed away. Part of the collection included his blog and his email.

Although email has been around for a while, preservation strategies at libraries and archives are still emerging. Some archives simply print out emails and save them as they would any other paper record. This is better than no strategy at all, but it does not really honor the rich complexity of email’s original digital form.

Most email clients will export mailboxes as an MBOX file (including Google Takeout). MBOX files are stable for preservation purposes in that they are essentially (very) large blocks of text. They are, however, not good for processing or access purposes (see Figure 1).

(Figure 1: An encoded email attachment in an MBOX file)

A few years ago I worked on an email preservation project that used the National Archives of Australia’s Xena to parse MBOX files into XML. One MBOX file can contain multiple messages, and Xena parsed those messages into individual XML files that were readable in an Internet browser. This is a step in the right direction, but the text still isn’t very searchable or discoverable.

By the time this particular email collection came into Brown’s possession, Stanford’s ePADD project was new on the scene. ePADD uses Natural Language Processing to build a searchable index of topics, correspondents, and places to aid in processing and access. The name stands for its four essential functions: Processing, Appraisal, Discovery, and Delivery. The first two modules allow archivists to cull through email by correspondent, attachment, topic, etc. to redact or restrict messages as needed. Once the collection’s been processed, archivists can either deliver the collection to researchers on a local machine or make the collection discoverable online.

Our foray into email preservation is another great example of the essential partnership between my office in Digital Technologies and staff in Archives and Special Collections. I make heads and tails of the tool’s features, but I rely on my colleagues to guide an implementation that is in keeping with their principles and policies. In this case, I’ve been instructed that this collection’s scope does not include personal correspondence between family members. I can use the appraisal and processing modules to sort through emails and restrict based off this directive.

Upgrades and Architecture changes in the BDR

Recently we have been making some architectural changes in the BDR. One big change was migrating from RHEL 5 to RHEL 7, but we also moved from basically a one-server setup to four separate servers.

RHEL 5 => RHEL 7

RHEL 5 support ended in March, so we needed to upgrade. We initially got a RHEL 6 server, but then decided to go to RHEL 7, which gives us more time before we have to upgrade again. Moving to RHEL 7 also lets us use more up-to-date software, like Redis 2.8.19 instead of 2.4.10, but the biggest issue was that security updates are no longer available for RHEL 5.

Added a Server for Loris

We started using Loris back in the fall. We installed Loris on a new server, and eventually we shut down our previous image server that was running on the same server as most of our other services.

Added Servers for Fedora & Solr

We also added a new server for Solr, and then a new server for Fedora. These two services previously ran on the one server that handled almost everything for the BDR, but now each one is on its own server.

Our fourth server is also RHEL 7 now – that’s where we moved our internet-facing services.

Pros & Cons

One advantage of being on four servers is the security we get from having our services isolated. Processes on a single server can be separated from each other with different users, firewall rules, and so on, but having our backend servers firewalled off from the Internet and separated from each other encourages better security practices.

Also, the resources our services use are separated. If one service has an issue and starts using all the CPU or memory, it can’t take resources from the other services.

One downside of using four servers is that it increases the amount of work to set things up and maintain them. There are four servers to set up and install updates on, instead of one. Also, the correct firewall rules have to be set up between the servers.

Django vs. Flask Hello-World performance

Flask and Django are two popular Python web frameworks. Recently, I did some basic comparisons of a “Hello-World” minimal application in each framework. I compared the source lines of code, disk usage, RAM usage in a running process, and response times and throughput.

Lines of Code

Both Django and Flask applications can be written in one file. The Flask homepage has an example Hello-World application, and it’s seven lines of code. The Lightweight Django authors have an example one-page application that’s 29 source lines of code. As I played with that example, I trimmed it down to 17 source lines of code, and it still worked.
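For reference, the Flask Hello-World looks roughly like this (a sketch from memory, not necessarily the exact published example):

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run()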

Disk Usage

I measured disk usage of the two frameworks by setting up two different Python 3.6 virtual environments. In one, I ran “pip install flask”, and in the other I ran “pip install django.” Then I ran “du -sh” on the whole env/ directory. The size of the Django virtual environment was 54M, and the Flask virtual environment was 15M.

Here are the packages in the Django environment:

Django (1.11.1)
pip (9.0.1)
pytz (2017.2)
setuptools (28.8.0)

Here are the packages in the Flask environment:

click (6.7)
Flask (0.12.1)
itsdangerous (0.24)
Jinja2 (2.9.6)
MarkupSafe (1.0)
pip (9.0.1)
setuptools (28.8.0)
Werkzeug (0.12.1)

Memory Usage

I also measured the RAM usage of both applications. I deployed them with Phusion Passenger, and then the passenger-status command told me how much memory the application process was using. According to Passenger, the Django process was using 18-19M, and the Flask process was using 16M.

Load-testing with JMeter

Finally, I did some JMeter load-testing on both applications. I hit each application with about 1000 requests and looked at the JMeter results. The average response time was identical: 5.76ms. The Django throughput was 648.54 responses/second, while the Flask throughput was 656.62 responses/second.

Final remarks

This was basic testing, and I’m not an expert in this area. Here are some links related to performance:

  1. Slides from a conference talk
  2. Blog post comparing performance of Django on different application servers, on different versions of Python

Ivy Plus Discovery Day

On June 4-5, 2017 the Library will host the third annual Ivy Plus Discovery Day. “DiscoDay”, as we like to call it, is an opportunity for staff who work on discovery systems (like Blacklight Josiah) to share an update of their work in progress and discuss common issues.

On Sunday, June 4 we will have a hackathon on these two topics:

  • StackLife — integrating virtual browse in discovery systems
  • Linked Data Authorities — leveraging authorities to provide users with another robust method for exploring our data and finding materials of interest

On Monday, June 5 there will be a full day of sharing and unconference discussion sessions.

We expect about 40 staff from the 13 Ivy Plus Libraries.  We’ve initially limited participation to three staff from each institution and we hope to have a good mix of developers, metadata specialists, user experience librarians and others whose work is closely tied to the institution’s discovery system.

For more information about Discovery Day see: https://library.brown.edu/create/discoveryday/

In Progress: The Mark Baumer Digital Collection

This is a guest post by Brown University Library’s Web Archiving Intern, Christina Cahoon. Christina is currently finishing her Master of Library and Information Science degree at the University of Rhode Island.

After the recent passing of Brown University alumnus and Library staff member Mark Baumer MFA ‘11, the Brown University Library tasked itself with preserving his prolific web presence. I’m working towards that goal with Digital Preservation Librarian, Kevin Powell.  Baumer was a poet and environmental activist who worked within the Digital Technologies Department as Web Content Specialist.  This past October, Baumer began his Barefoot Across America campaign, with plans to walk barefoot from Rhode Island to California in an effort to raise money for environmental preservation and to support the FANG Collective.  Unfortunately, this journey was tragically cut short on January 21, 2017, when Baumer was struck by a vehicle and killed while walking along a highway in Florida.

Baumer was an avid social media user who posted on several platforms multiple times a day.  As such, the task of recording and archiving Baumer’s web presence is quite large and not free from complications.  Currently, we are using Archive-It to crawl Baumer’s social media accounts and news sites containing coverage of Baumer’s campaign, including notices of his passing. While Archive-It does a fairly decent job recording news sites, it encounters various issues when attempting to capture social media content, including content embedded in news articles.  As you can imagine, this is causing difficulties capturing the bulk of Baumer’s presence on the web.

Archive-It’s help center has multiple suggestions to aid in capturing social media sites that have proven useful when capturing Baumer’s Twitter feed; however, suggestions have either not been helpful or are non-existent when it comes to other social media sites like YouTube, Instagram, and Medium.  The issues faced with crawling these websites range from capturing way too much information, as in the case with YouTube where our tests captured every referred video file from every video in the playlist, to capturing only the first few pages of dynamically loading content, as is the case with Instagram and Medium. We are re-configuring our approach to YouTube after viewing Archive-It’s recently-held Archiving Video webinar, but unfortunately the software does not have solutions for Instagram and Medium at this time.  

These issues have caused us to re-evaluate our options for best methods to capture Baumer’s work.  We have tested how WebRecorder works in capturing sites like Flickr and Instagram and we are still encountering problems where images and videos are not being captured.  It seems as though there will not be one solution to our problem and we will have to use multiple services to sufficiently capture all of Baumer’s social media accounts.

The problems encountered in this instance are not rare in the field of digital preservation.  Ultimately, we must continue testing different preservation methods in order to find what works best in this situation.  It is likely we will need to use multiple services in order to capture everything necessary to build this collection.  As for now, the task remains of discovering the best methods to properly capture Baumer’s work.

Solr LocalParams and dereferencing

A few months ago, at the Blacklight Summit, I learned that Blacklight defines certain settings in solrconfig.xml to serve as shortcuts for a group of fields with different boost values. For
example, in our Blacklight installation we have a setting for author_qf that references four specific author fields with different boost values.

<str name="author_qf">
  author_unstem_search^200
  author_addl_unstem_search^50
  author_t^20
  author_addl_t
</str>

In this case author_qf is a shortcut that we use when issuing searches by author. By referencing author_qf in our request to Solr we don’t have to list all four author fields (author_unstem_search, author_addl_unstem_search, author_t, and author_addl_t) and their boost values; Solr is smart enough to use those four fields when it notices author_qf in the query. You can see the exact definition of this field in our GitHub repository.

Although the Blacklight project talks about this feature in its documentation, and our Blacklight instance takes advantage of it via the Blacklight Advanced Search plugin, I had never quite understood how it works internally in Solr.

LocalParams

Turns out Blacklight takes advantage of a feature in Solr called LocalParams. This feature allows us to customize individual values for a parameter on each request:

LocalParams stands for local parameters: they provide a way to “localize” information about a specific argument that is being sent to Solr. In other words, LocalParams provide a way to add meta-data to certain argument types such as query strings. https://wiki.apache.org/solr/LocalParams

The syntax for LocalParams is p={! k=v } where p is the parameter to localize, k is the setting to customize, and v the value for the setting. For example, the following

q={! qf=author}jane

uses LocalParams to customize the q parameter of a search. In this case it forces the query fields (qf) parameter to use the author field when searching for “jane”.

Dereferencing

When using LocalParams you can also use dereferencing to tell the parser to use an already defined value as the value for a LocalParam. For example, the following shows how to use the already defined value (author_qf) as the value for the qf parameter in the LocalParams. Notice how the value is prefixed with a dollar sign to indicate dereferencing:

q={! qf=$author_qf}jane

When Solr sees the $author_qf it replaces it with the four author fields that we defined for it and sets the qf parameter to use the four author fields.

You can see how Solr handles dereferencing if you pass debugQuery=true to your Solr query and inspect the debug.parsedquery in the response. The previous query would return something along the lines of

(+DisjunctionMaxQuery(
    (
    author_t:jane^20.0 |
    author_addl_t:jane |
    author_addl_unstem_search:jane^50.0 |
    author_unstem_search:jane^200.0
    )~0.01
  )
)/no_coord

Notice how Solr dereferenced (i.e. expanded) author_qf to the four author fields that we have configured in our solrconfig.xml with the corresponding boost values.

It’s worth noting that dereferencing only works if you use the eDisMax parser in Solr.
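From a client, a dereferenced query can be sent like this (a sketch using the requests library; the Solr URL and core name are placeholders, and it assumes author_qf is defined in the request handler defaults as shown above):

import requests

# Placeholders: adjust the host and core name for your installation.
SOLR_SELECT_URL = 'http://localhost:8983/solr/blacklight-core/select'

params = {
    'defType': 'edismax',          # dereferencing requires the eDisMax parser
    'q': '{!qf=$author_qf}jane',   # $author_qf is expanded by Solr on the server
    'debugQuery': 'true',
    'wt': 'json',
}
r = requests.get(SOLR_SELECT_URL, params=params)

# debug.parsedquery shows the four author fields and their boost values.
print(r.json()['debug']['parsedquery'])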

There are several advantages to using this Solr feature. One is that your queries are a bit shorter, since we pass an alias (author_qf) rather than all four fields and their boost values, which makes the query easier to read. Another is that you can change the definition of author_qf on the server (say, to include a new author field in your Solr index) and client applications will automatically use the new definition when they reference author_qf.

Storing Embargo Data in Fedora

We have been storing dissertations in the BDR for a while. Students have the option to embargo their dissertations, and in that case we set the access rights so that the dissertation documents are only accessible to the Brown community (although the metadata is still accessible to everyone). The problem is that embargoes can be extended upon request, so we really needed to store the embargo extension information.

We wanted to use a common, widely-used vocabulary for describing the embargoes, instead of using our own terms.  We investigated some options, including talking with Hydra developers on Slack, and emailing the PCDM community. Eventually, we opened a PCDM issue to address the question of embargoes in PCDM. As part of the discussion and work from that issue, we created a shared document that lists many vocabularies that describe rights, access rights, embargoes, … Eventually, the consensus in the PCDM community was to recommend the PSO and FaBiO ontologies (part of the SPAR Ontologies suite), and a wiki page was created with this information.

At Brown, we’re using the “Slightly more complex” option on that wiki page. It looks like this:

<pcdm:Object> pso:withStatus pso:embargoed .

<pcdm:Object> fabio:hasEmbargoDate "2018-11-27T00:00:01Z"^^xsd:dateTime .

In our repository, we’re not on Fedora 4 or PCDM, so we just put statements like these in the RELS-EXT datastream of our Fedora 3 instance. It looks like this:

<rdf:RDF xmlns:fabio="http://purl.org/spar/fabio/#" xmlns:pso="http://purl.org/spar/pso/#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="info:fedora/test:230789">
<pso:withStatus rdf:resource="http://purl.org/spar/pso/#embargoed"></pso:withStatus>
<fabio:hasEmbargoDate>2018-11-27T00:00:01Z</fabio:hasEmbargoDate>
<fabio:hasEmbargoDate>2020-11-27T00:00:01Z</fabio:hasEmbargoDate>
</rdf:Description>
</rdf:RDF>
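If you need to generate statements like these programmatically, rdflib can produce the RDF/XML; here is a short sketch (the namespaces and identifiers are copied from the example above, and the exact serialization rdflib produces may differ slightly from what we store):

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

PSO = Namespace('http://purl.org/spar/pso/#')
FABIO = Namespace('http://purl.org/spar/fabio/#')

g = Graph()
g.bind('pso', PSO)
g.bind('fabio', FABIO)

obj = URIRef('info:fedora/test:230789')
g.add((obj, PSO.withStatus, PSO.embargoed))
g.add((obj, FABIO.hasEmbargoDate,
       Literal('2018-11-27T00:00:01Z', datatype=XSD.dateTime)))

# Serialize to RDF/XML suitable for a RELS-EXT datastream
# (returns bytes or str depending on your rdflib version).
rels_ext = g.serialize(format='xml')
print(rels_ext)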

In the future, we may want to track various statuses for an item (e.g. a dataset) over its lifetime. In that case, we may move toward more complex PSO metadata that describes the various states the item has been in.

Fedora 4 – testing

Fedora 4.7.1 is scheduled to be released on 1/5/2017, and testing is important to ensure good quality releases (release testing page for Fedora 4.7.1).

Sanity Builds

Some of the testing is to make sure the Fedora .war files can be built with various options on different platforms. To perform this testing, you need to have three required dependencies installed and run a couple of commands.

Dependencies

Java 8 is required for running Fedora. Git is required to clone the Fedora code repositories. Finally, Fedora uses Maven as its build/management tool. For each of these dependencies, you can grab it from your package manager, or download it (Java, Git, Maven).

Build Tests

Once your dependencies are installed, it’s time to build the .war files. First, clone the repository you want to test (e.g. fcrepo-webapp-plus):

git clone https://github.com/fcrepo4-exts/fcrepo-webapp-plus

Next, in the directory you just created, run the following command to test building it:

mvn clean install

If the output shows a successful build, you can report that to the mailing list. If an error was generated, you can ask the developers about that (also on the mailing list). The generated .war files will be installed to your local Maven repository (as noted in the output of the “mvn clean install” command).

Manual Testing

Another part of the testing is to perform different functions on a deployed version of Fedora.

Deploy

One way to deploy Fedora is on Tomcat 7. After downloading Tomcat, uncompress it and run ./bin/startup.sh. You should see the Tomcat Welcome page at localhost:8080.

To deploy the Fedora application, shut down your tomcat instance (./bin/shutdown.sh) and copy the fcrepo-webapp-plus war file you built in the steps above to the tomcat webapps directory. Next, add the following line to a new setenv.sh file in the bin directory of your tomcat installation (update the fcrepo.home directory as necessary for your environment):

export JAVA_OPTS="${JAVA_OPTS} -Dfcrepo.home=/fcrepo-data -Dfcrepo.modeshape.configuration=classpath:/config/file-simple/repository.json"

By default, the fcrepo-webapp-plus application is built with WebACLs enabled, so you’ll need a user with the “fedoraAdmin” role to be able to access Fedora. Edit your tomcat conf/tomcat-users.xml file to add the “fedoraAdmin” role and give that role to whatever user you’d like to log in as.

Now start tomcat again, and you should be able to navigate to http://localhost:8080/fcrepo-webapp-plus-4.7.1-SNAPSHOT/ and start testing Fedora functionality.
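Once Fedora is up, a quick smoke test from Python might look like the following sketch; the context path, username, and password are placeholders, and the user needs the fedoraAdmin role in tomcat-users.xml as described above.

import requests

# Placeholders: adjust the context path and credentials for your deployment.
FEDORA_REST = 'http://localhost:8080/fcrepo-webapp-plus-4.7.1-SNAPSHOT/rest/'
AUTH = ('testadmin', 'testadmin-password')

# The repository root should be readable.
r = requests.get(FEDORA_REST, auth=AUTH)
assert r.status_code == 200

# Creating and then deleting a container exercises basic create/delete functionality.
r = requests.post(FEDORA_REST, auth=AUTH)
assert r.status_code == 201
new_container = r.headers['Location']

r = requests.delete(new_container, auth=AUTH)
assert r.status_code == 204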