Ben Cail – Page 3 – Brown University Library Digital Technologies

IIIF and the BDR

We have recently installed the Loris image server, and we’re in the process of switching completely over to IIIF and Loris (from Djatoka).

So Far

We have created a IIIF gateway that handles user authentication and authorization for non-public items in the BDR. In the first implementation phase, we made the gateway work as a frontend to Loris for the IIIF Image API. When that was ready, we started switching most of our content viewing applications over to use IIIF image URLs.
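To give a rough idea of what a IIIF image URL looks like, here is a small sketch that assembles one from its parts, following the Image API 2.x pattern of identifier/region/size/rotation/quality.format. The base URL and the identifier are made-up placeholders, not real BDR endpoints, and the helper names are only for illustration.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble a IIIF Image API 2.x URL from its path segments.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Percent-encode the identifier so characters such as ":" or "/"
    # cannot be mistaken for path separators.
    encoded_id = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base_url, identifier):
    """URL of the info.json document that describes the image."""
    return f"{base_url.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    base = "https://example.org/iiif/image"  # placeholder, not a real server
    # Whole image, scaled to 600 pixels wide.
    print(iiif_image_url(base, "bdr:1234", size="600,"))
    # A 512x512 tile from the top-left corner, at full resolution.
    print(iiif_image_url(base, "bdr:1234", region="0,0,512,512"))
    print(iiif_info_url(base, "bdr:1234"))
```

A viewer such as OpenSeadragon typically reads info.json first and then requests region/size combinations like the ones above as the user zooms and pans.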

The next phase was to make the gateway also generate IIIF presentation manifests for relevant items in the BDR. We took the information from our Item API, and used that to create a IIIF Manifest. This process required adding some caching to bring manifest generation for objects with many children down to an acceptable time.
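The sketch below shows the general shape of that conversion, assuming a simplified item record with a pid, a label, and a list of child images. The field names, the URLs, and the in-memory cache are all assumptions made for illustration; they are not the actual Item API response or the gateway's real caching code. The manifest layout follows the IIIF Presentation API 2.x structure (manifest, sequence, canvases, image annotations).

```python
def build_manifest(item, manifest_base, image_base):
    """Turn a (hypothetical) item record into a Presentation 2.x manifest dict."""
    canvases = []
    for index, child in enumerate(item.get("children", []), start=1):
        canvas_id = f"{manifest_base}/{item['pid']}/canvas/{index}"
        service_id = f"{image_base}/{child['pid']}"
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": child.get("label", str(index)),
            "height": child["height"],
            "width": child["width"],
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,  # the annotation paints the image onto this canvas
                "resource": {
                    "@id": f"{service_id}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "height": child["height"],
                    "width": child["width"],
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": service_id,
                        "profile": "http://iiif.io/api/image/2/level2.json",
                    },
                },
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{manifest_base}/{item['pid']}/manifest",
        "@type": "sc:Manifest",
        "label": item["label"],
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


class ManifestCache:
    """Keep built manifests in memory so large items are only built once."""

    def __init__(self, fetch_item, build):
        self._fetch_item = fetch_item  # callable: pid -> item record
        self._build = build            # callable: item record -> manifest
        self._store = {}

    def get(self, pid):
        if pid not in self._store:
            self._store[pid] = self._build(self._fetch_item(pid))
        return self._store[pid]
```

A real gateway would also need to expire cached entries when an item changes, which this sketch leaves out.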

Example of this conversion:

Item API ==> IIIF Manifest

Future Work

We need to monitor and tweak our IIIF gateway and Loris instance to make sure the performance is satisfactory. We’d like to add some code to warm up the gateway cache when needed, since manifest generation for large books takes so long. One of our viewers also still uses Djatoka, so we need to either replace it or make it use IIIF.

We haven’t worked on the IIIF search API yet, but that may come in the future.

Benefits of IIIF

Using the IIIF APIs gives an institution several benefits: an active community, a choice of image servers, and a choice of image viewers.

The IIIF community is vibrant, with many participating institutions. The IIIF Consortium “was formed in June 2015 to provide steering and sustainability to the IIIF community.” The community has a Slack page, a couple of Google groups, and various events to attend. IIIF even has a Community and Communications Officer, who was hired in August 2016.

There are multiple IIIF-compliant servers listed on the IIIF website. In the Brown Digital Repository we are using Loris, but there are other options that either support IIIF natively or have an adaptor that lets you use them as IIIF servers.

Finally, there are many options for IIIF viewers. We currently use OpenSeadragon in the BDR, and it’s IIIF-compliant, but we’ve also looked at Universal Viewer and Mirador. To try out Universal Viewer, all we have to do is pass in a link to one of our IIIF presentation manifests, and we can test it out. We can also easily paste in our manifest link to the Mirador demo, and see how it works. This convenient interoperability is made possible by the APIs defined by the IIIF community.

This allows us to share the unique items available at Brown with the world in new and engaging ways.

More Examples:

The numbered names

Lovecraft, Howard P. to Barlow, Robert H. from Providence, RI

Loris Image Server Deployment

We have been using the Djatoka image server for years in the BDR. However, we are currently in the process of switching to the Loris image server. Loris is a IIIF-compliant image server, and its development is led by Jon Stroop at Princeton. Loris is written in Python, and it uses the WSGI standard (like Django and other Python web frameworks).

Loris comes with a setup.py file for installing, but I’ve developed some scripts that can help with installing Loris on RHEL/CentOS 7. There are three main things that need to happen for Loris to work: setting up the Python environment, creating the configuration files and directories that Loris is looking for, and configuring Loris as a WSGI application that can be served by Phusion Passenger or other application servers. If you run my “install_loris.sh” script (as root), it will set up Loris for you and you’ll be able to immediately test it out. Then you can go on to update the configuration and/or install an application server for production.

Python Environment

Loris is written in Python, so it requires certain Python packages to be installed, including requests and Pillow. The packages can be installed to the system Python site-packages, but in my scripts I set them up to install to a Python virtual environment. There are also some system Linux packages that need to be installed before the Python packages – these include image packages, gcc for compiling a Pillow extension, and others.

Filesystem Configuration

My script installs Loris to /opt/local/loris. Loris needs a configuration file, and I put that in /opt/local/loris/etc/loris2.conf; cache, tmp, and log directories go in /opt/local/loris as well. My script installs Kakadu (for JP2 images – make sure you have the appropriate license) to /opt/local/loris/bin and /opt/local/loris/lib, and configures the shared library dynamic linking to be able to load the Kakadu library from /opt/local/loris/lib.

WSGI application

I set up the Loris WSGI app in /opt/local/loris/loris, where I copied the Loris code. My script allows for two ways of running Loris – using a simple test server, and running a full application server.

If you just want to test the installation quickly, there’s a launcher.py file in /opt/local/loris/loris that uses Werkzeug’s run_simple function. To kick this off, just activate the Python virtual environment and run “python loris/launcher.py”.

For production environments, you can use an application server like Passenger. All you have to do is point the app server to the /opt/local/loris/loris directory and the passenger_wsgi.py file in it. I have a script called install_passenger.sh, and it installs the standalone version of Passenger. After running that script, cd to /opt/local/loris/loris and run “passenger start”.
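For readers who haven’t worked with WSGI before, here is a minimal sketch of the contract an application server relies on: a module-level callable named application (the name Passenger looks for in passenger_wsgi.py) that takes the request environment and a start_response function. The placeholder app and the call_app helper are illustrative assumptions; this is not the actual passenger_wsgi.py from my scripts, and it doesn’t involve any Loris code.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A stand-in WSGI app; a real deployment would expose the image server here."""
    body = f"requested path: {environ.get('PATH_INFO', '/')}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Call a WSGI app directly, without a server, and return (status, body)."""
    environ = {"SCRIPT_NAME": "", "PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    status, body = call_app(application, "/some-id/info.json")
    print(status)
    print(body.decode("utf-8"), end="")
```

Because every WSGI server calls the app the same way, the same application object can be run by a simple test server or by Passenger without changes.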

Brown University Library – Loris Install Scripts

Hydra Connect 2016

Last week I attended Hydra Connect 2016 in Boston, with a team of three others from the Brown University Library. Our team consisted of a Repository Developer, Discovery Systems Developer, Metadata Specialist, and Repository Manager. Here are some notes and thoughts related to the conference from my perspective as a repository programmer.

IPFS

There was a poster about IPFS, which is a peer-to-peer hypermedia protocol for creating a distributed web. It’s an interesting idea, and I’d like to look into it more.

APIs and Architecture

There was a lot of discussion about the architecture of Hydra, and Tom Cramer mentioned APIs specifically in his keynote address. In the Brown Digital Repository, we use a set of APIs that clients can access and use from any programming language. This architecture lets us define layers in the repository: the innermost layer is Fedora and Solr, the next layer is our set of APIs, and the outer layer is the Studio UI, image/book viewers, and custom websites built on the BDR data. There is some overlap in our layers (e.g. the Studio UI does hit Solr directly instead of going through the APIs), but I still think it improves the architecture to think about these layers and try not to cross multiple boundaries. Besides supporting clients that are written in Python, Ruby, and PHP, this API layer may be useful when we migrate to Fedora 4 – we can use our APIs to communicate with both Fedora 3 and Fedora 4, and any client that only hits the APIs wouldn’t need to be changed to be able to handle content in Fedora 4.
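The idea of an API layer that hides which Fedora version holds an item can be sketched as a small facade over interchangeable backends. Everything below is hypothetical: the class names, the in-memory stand-ins for Fedora 3 and Fedora 4, and the item fields are invented for illustration and don’t reflect the BDR’s actual API code.

```python
from abc import ABC, abstractmethod


class RepositoryBackend(ABC):
    """What the API layer needs from any storage backend."""

    @abstractmethod
    def has_item(self, pid):
        """Return True if this backend holds the item."""

    @abstractmethod
    def get_item(self, pid):
        """Return the item as a plain dict."""


class InMemoryBackend(RepositoryBackend):
    """Stand-in for a real Fedora client; holds items in a dict."""

    def __init__(self, name, items):
        self.name = name
        self._items = dict(items)

    def has_item(self, pid):
        return pid in self._items

    def get_item(self, pid):
        # Tag the record so the demo can show which backend answered.
        return {"pid": pid, "source": self.name, **self._items[pid]}


class ItemAPI:
    """The layer clients talk to; it picks the backend on their behalf."""

    def __init__(self, backends):
        self._backends = list(backends)

    def get_item(self, pid):
        for backend in self._backends:
            if backend.has_item(pid):
                return backend.get_item(pid)
        raise KeyError(f"no backend holds {pid}")


if __name__ == "__main__":
    fedora3 = InMemoryBackend("fedora3", {"test:1": {"title": "Older item"}})
    fedora4 = InMemoryBackend("fedora4", {"test:2": {"title": "Migrated item"}})
    api = ItemAPI([fedora4, fedora3])
    print(api.get_item("test:1")["source"])
    print(api.get_item("test:2")["source"])
```

A client calling get_item never learns which backend answered, which is the property that would let items move between Fedora versions without client changes.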

I would be interested in seeing a similar architecture in Hydra-land (note: this is an outsider’s perspective – I don’t currently work on CurationConcerns, Sufia, or other Hydra gems). A clear boundary between “business logic” or processing and the User Interface or data presentation seems like good architecture to me.

Data Modeling and PCDM

Monday was workshop day at Hydra Connect 2016, and I went to the Data Modeling workshop in the morning and the PCDM In-depth workshop in the afternoon. In the morning session, someone mentioned that we shouldn’t have data modeling differences without good reason (i.e. does a book in one institution really have to be modeled differently from a book at another institution?). I think that’s a good point – if we can model our data the same way, that would help with interoperability. PCDM, as a standard for how our data objects are modeled, might be a great way to promote interoperability between applications and institutions. In the BDR, we could start using PCDM vocabulary and modeling techniques, even while our data is in Fedora 3 and our code is written in Python. I also think it would be helpful to define and document what interoperability should look like between institutions, or different applications at the same institution.

Imitate IIIF?

It seems like the IIIF community has a good solution to image interoperability. The IIIF community has defined a set of APIs, and then it lists various clients and servers that implement those APIs. I wonder if the Hydra community would benefit from more of a focus on APIs and specifications, and then there could be various “Hydra-compliant” servers and clients. Of course, the Hydra community should continue to work on code as well, but a well-defined specification and API might improve the Hydra code and allow the development of other Hydra-compliant code (e.g. code in other programming languages, different UIs using the same API, …).

New Theses & Dissertations site

Last week, we went live with a new site for Electronic Theses and Dissertations.

My part in the planning and coding of the site started back in January, and it was nice to see the site go into production (although we do have more work to do with the new site and shutting down the old one).

Old Site

The old site was written in PHP and only allowed PhD dissertations to be uploaded. It was a multi-step process to ingest the dissertations into the BDR: use a PHP script to grab the information from the database and turn it into MODS, split and massage the MODS data as needed, map the MODS data files to the corresponding PDFs, and run a script to ingest the dissertations into the BDR. The process worked, but it could be improved.

New Site

The new site is written in Python and Django. It now allows master’s theses as well as PhD dissertations to be uploaded. Ingesting the theses and dissertations into the BDR will be a simple process of selecting the theses/dissertations in the Django admin when they are ready to ingest, and running the ingest admin action – the site will know how to ingest the theses and dissertations into the BDR in the correct format.