Brown University Homepage Brown University Library

Checksums

In the BDR, we calculate checksums automatically on ingest (Fedora 3 provides that functionality for us), so all new content binaries going into the BDR get a checksum, which we can go back and check later as needed.

We can also pass checksums into the BDR API, and then we verify that Fedora calculates the same checksum for the ingested file, which shows that the content wasn’t modified since the first checksum was calculated. We have only been able to use MD5 checksums, but we want to be able to use more checksum types. This isn’t a problem for Fedora, which can calculate multiple checksum types, such as MD5, SHA1, SHA256, and SHA512.

However, there is a complicating factor – if Fedora gets a checksum mismatch, by default it returns a 500 response code with no message, so we can’t tell whether it was a checksum mismatch or some other server error. Thanks to Ben Armintor, though, we found that we can update our Fedora configuration so it returns the Checksum Mismatch information.

Another issue in this process is that we use eulfedora (which doesn’t seem to be maintained anymore). If a checksum mismatch happens, it raises a DigitalObjectSaveFailure, but we want to know that there was a checksum mismatch. We forked eulfedora and exposed the checksum mismatch information. Now we can remove some extra code that we had in our APIs, since more functionality is handled in Fedora/eulfedora, and we can use multiple checksum types.