It Depends on You: The Making of a Choose-Your-Own-Adventure Research Data Management LibGuide

Andrew Creamer, Scientific Data Management Specialist, Brown University Library

This week I attended my first Data Curation Group meeting. The group consists of the Library’s data and digital specialists, representing the humanities, social sciences, and sciences. Since I am new in my position as the Library’s data management specialist for the sciences, and new to Brown, I have been trying to learn the campus contacts, resources, and tools available for the University’s undergraduate, graduate, and faculty researchers, and the relevant policies that pertain to managing, archiving, and sharing their data. I have been exploring ways to update our group’s online research guide using campus-specific items. I thought I would begin my search for these items using some guiding questions, reflecting the various aspects of data management planning. For example, one of the NSF recommendations for a data management plan is for researchers to describe their plans for storing, backing up, and securing their data. So in my first meeting, I asked my colleagues to consider one of my questions as if a student or faculty researcher were asking it: “What are my options for storing data at Brown?” To my surprise, it provoked a lively debate.

“This question would not be asked!” responded one of my colleagues. “You’re asking the wrong question!”

“Ok, then think of this question more as a way for me to get to know what is available here.” I tried starting a list.

“There is no one University solution!” a colleague countered. “Your questions,” my colleagues argued, “should be: Where is my data now? Where should my data go?”

“Those are great additional questions; but if we could just answer the first question, locating the options that are available to…”

“But that should not be the first question!” exclaimed my colleagues.

“Ok, then please don’t consider this question as part of a sequence; think of it as a category,” I offered helplessly.

“Your question does not consider the data life cycle,” a colleague added.

“Alright, then what are some of the options available at Brown for any part of the research and data life cycles? For example, what storage options does Brown offer for researchers collecting and staging data during the project?” I pleaded.

In the midst of this Socratic exchange with my experienced and knowledgeable colleagues, it finally dawned on me just how ineffectual and reductive it was to frame the structure of our research guide, or my orientation to the University’s services, by excluding the researchers’ unique, and sometimes messy, circumstances, thereby intimating that there exist prescriptive, black-and-white answers. Of course, my colleagues were right; it is unlikely that anyone would ever ask my question as originally worded, disconnected from any context.

One of the reasons I joined Brown’s University Library, the CDS, and the Data Curation Group was the fact that I yearned to be challenged by, learn from, and collaborate with data and digital specialists representing the different domains of scholarship. My first meeting with the Data Curation Group was an example of the immense benefits one receives from having such multidisciplinary perspectives. I walked away from this meeting considerably more inspired, and with better ideas for customizing our research guide, than if we had just made a laundry list of Brown’s storage options. This post should serve as a reminder that in-person meetings should be reserved for such dialogue and for negotiating multiple perspectives, the real return on investment of our team’s time and attention. Asking my colleagues to make a list would have been a waste of their time, something we could have done on a shared Google doc.

The most common answer to research data management (RDM) questions is: it depends. How long should I retain this data? It depends. Can I share this data set? It depends. Where can I store this data? It depends. How should I de-identify this data set? It depends. Who owns my data? It depends. MIT and the University of Wisconsin-Madison have already done an outstanding job creating library research guides that explain the nuts and bolts of data management. We do not want to reinvent the wheel. Instead, the RDM guides with the potential to be the most helpful for our users will be ones that function as Choose-Your-Own-Adventure guides, customized to the research ecology of our institutions. The answers to RDM questions depend on the specific intentions of researchers and the many variables and idiosyncrasies of their particular research projects.

Take storage, for example. If a researcher were to look for data storage solutions, examples of factors influencing his or her options include the size of the data set, its format, its expected rate of growth, its restrictions, the need for access, the funds available, etc. A colleague gave us two examples of unique storage situations that had come up at Brown. One faculty member needed a storage solution that would accommodate off-campus collaborators who would need to be able to access the data stored at Brown. Another faculty member had a grant that required a storage option that would provide an off-site copy of the data.

Considering the lessons I learned above, I am now going to approach collaborating with my colleagues to update and frame our LibGuide using use-case style scenarios: I want to find my data; I want to know where I can stick human subjects’ data; I want to store small data sets; I want to store my data on campus but make it accessible to collaborators off-campus, etc.

So if you’re starting a research project, consider stopping by the second floor of the Rock and let us help you to choose your next data management adventure. What will you get out of it? It depends.

Guest Blog Post by Susanna Allés Torrent (Visiting Post-Doc)

Last November I came to the Center for Digital Scholarship of Brown University as a Post-Doc and stayed until March. My stay was sponsored by the University of Barcelona and the CASB (Consortium for Advanced Studies of Barcelona) Fellowship Program. The experience has been stimulating and truly rewarding, thanks to the CDS team.

I am a postdoctoral researcher at the Milà i Fontanals Institution – CSIC (the Spanish National Research Council). Our team is working, among other topics, on a lexicographical project called Glossarium Mediae Latinitatis Cataloniae, a dictionary of Latin terms from the Catalan linguistic domain of the ninth to twelfth centuries. This dictionary is a work in progress that has been appearing in print since 1960 (currently, just letters A-D and F-G are published). Our current goal is to transform the printed edition into a new digital one, but we also intend to publish the remaining letters directly in a digital format. The idea of coming to Brown was born from a need to improve my knowledge about encoding, transforming and publishing TEI data.

In the meantime I became aware of the WWP seminars on TEI, and I thought it was a great opportunity to take advantage of that training, so I applied for the event. I got in touch with E. Mylonas, whose support and help were invaluable, and I succeeded in coming to Brown under the sponsorship of S. Bonde (History of Art and Architecture), a medieval archaeologist and digital humanist.

At the end of November, I participated in the seminar “Taking TEI further: transforming and publishing TEI”, where the main topic was XSLT, a programming language used to transform XML data into other formats. The workshop was very useful and I learned a great deal; for example, I was able to resolve several problems I had transforming my GMLC TEI data. But this training was just the beginning of many other activities, because at Brown, and at the CDS in particular, every single day is full of new inputs.

I really enjoyed the Digital Scholarship Lab’s vSalon, a series of weekly meetings organized by CDS, where everyone shares their work; the topics were always so inspiring: 3D capture methods and related software, geolocation technologies and device orientation, taxonomies in HTML 5, among many many others. I also gave an informal talk about my project, receiving great feedback and having the opportunity to share some of my issues.

For me, another interesting feature of the CDS is its close collaboration with Brown professors and researchers, and I took advantage of that as much as I was able to. Of special interest was a DH presentation given by Prof. J. Egan (English) and J. Bauer for a group of students. The seminars taught by Prof. M. Riva (Italian) and S. Lubar (Public Humanities) were also of great interest, especially the final project presentations, most of which focused on data visualization.

The CDS takes part in a lot of projects, among them the TAPAS project, an amazing project led by J. Flanders (Northeastern University) to publish TEI data. I had the chance to attend one of their meetings and to test the toolkit from my newbie perspective. I also participated in the activities of the Virtual Humanities Lab, a center where Italian Studies and new technologies have created a great environment of international collaboration.

Moreover, being at the CDS has allowed me to understand how a Digital Humanities center works, both its workflows in starting up a project (from the grant application to the real work with professors, students and programmers) and its day-to-day work. In that sense, their epigraphic projects deserve special attention, since their results are very impressive. As part of the library lecture series, the CDS also invites researchers, artists and professors to present their digital projects. I had the opportunity to hear Roderick Coover (Temple University), who gave an overview of his techniques of narrative visualization. Also astonishing, from my point of view, is the range of workshops organized by the Library and the CDS; I did my best to attend all the workshops I could! I was especially interested in bibliographical management, so I signed up for workshops on Mendeley and Zotero; I also attended the Gephi workshop for data visualization (by J. Bauer) and two others on Geographic Information Systems and statistics (by B. Boucek).

Besides all these activities, I was so impressed by the stimulating campus life at Brown, and by the activities organized by other institutions, such as the John Carter Brown Library. The “Morning Mail”, with its huge list of events and activities, gave me the feeling every morning that I was in the right place. I just hope to come back very soon and have the possibility once again to share knowledge with this great team!

Rome with a View – Undergraduate Research Project

CDS assisted Maddie High ’16 with her summer UTRA (Undergraduate Teaching and Research Award) project. Maddie worked with Prof. Lisa Mignone (Classics) to make a topically organized, digital collection of images from Roman archaeological sites, based on photographs taken by Prof. Mignone.


Poster presented by Maddie High at the Summer UTRA Symposium, August 2013

Maddie began by working with Prof. Mignone to select images, and then digitized and processed any that were not born digital. She then imported them into Omeka, a web-publishing platform developed for the display of library, museum, archives, and scholarly collections and exhibitions. Once the images were in Omeka, she catalogued them, added keywords, and then organized them into exhibits, enriched by explanations and citations to ancient authors.

Photographs of ancient sites may preserve information that is lost over time, as ancient buildings and objects become worn or damaged. Representations of ancient monuments made by archaeologists or travelers in the 19th century are already serving that purpose. This collection will preserve the views available in our time. It is also representative of the objects, locations and details that are of interest to a Roman historian. Prof. Mignone intends to use the images in these exhibits in her courses on Roman history.

Maddie is continuing her work on the Omeka site during this school year.

Read Maddie High’s poster (pdf): MaddieHigh-poster

Epigraphy at Brown and across the US – Undergraduate Research Project

CDS assisted Tori Lee ’14 with her summer UTRA (Undergraduate Teaching and Research Award) project.


Tori Lee and John Bodel at the UTRA Symposium, August 2013

Tori continued work that she had begun in the summer of 2012, updating and adding inscriptions to Prof. John Bodel’s US Epigraphy project. This consisted of sorting through correspondence with museums and other collections that have holdings of ancient Greek and Roman inscriptions, researching the inscriptions in scholarly journals and corpora, and making a digital record for each inscription. The digital inscriptions are structured using an XML schema called EpiDoc, which is used by many digital inscription projects all over the world. Tori had to become familiar with the EpiDoc schema, and with XML editing tools like the Oxygen XML Editor, which allowed her to enter the inscriptions and also to proofread them.

Tori’s work on the US Epigraphy collection also served as preparation for her own project: to document all the inscriptions on the Brown campus, and to create a digital collection of them. She has been photographing inscriptions all over Brown and recording GPS coordinates for them. She is working with information from the Brown archives to create digital metadata for the Brown inscriptions, and to encode that in EpiDoc as well, so that in the future there will be a searchable website of Brown inscriptions, with photos and locations.

Download Tori's poster to read her full description (pdf): Lee-poster




“Opening the Archives” on the Brazilian military dictatorship

CDS is working with Professor James Green, in collaboration with the National Archives and Records Administration (NARA), the National Archive of Brazil, and the State University of Maringá (UEM), on a project called “Opening the Archives” to digitize, index, and make accessible the State Department’s declassified documents relating to U.S.-Brazilian relations from the turbulent 1960s, 70s and 80s. CDS is providing digitization workflow support, metadata development, and publication via the Brown Digital Repository. More details are available here. The first part of this collection is expected to be online later this year.

Funera Romana – Undergraduate Research Project


CDS sponsored Mary-Evelyn Farrior ’14 for her summer UTRA (Undergraduate Teaching and Research Award) project.


Funeral information encoded in XML

She worked with Prof. John Bodel in Classics on a digital, structured version of a project he had begun on paper, to collect and organize information about Roman funerals, as mentioned in Latin literature and historical writing of the Republic and Empire.

Mary-Evelyn used the Oxygen XML editing software and, applying a TEI schema developed in conjunction with CDS, started to enter information about Roman funerals. As she progressed, we worked together to refine the schema, and to create better sets of classifications for the characteristics she was capturing. We also provided a proofreading transform, so she could view a formatted version of her XML files.
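To give a rough idea of what a proofreading transform does, here is a minimal sketch. It is not the transform CDS provided (which is not shown in this post); it uses Python's standard-library ElementTree rather than XSLT, and the element names (`funeral`, `deceased`, `source`, `feature`) and the sample record are hypothetical placeholders, not the project's actual TEI schema or data.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder record: the element names and values are
# invented for illustration and do not come from the real schema.
SAMPLE = """<funerals>
  <funeral id="f001">
    <deceased>Placeholder Name</deceased>
    <source>Placeholder Author, Work 1.2</source>
    <feature type="procession"/>
    <feature type="eulogy"/>
  </funeral>
</funerals>"""


def proofreading_view(xml_text):
    """Turn encoded records into an indented plain-text view for proofreading."""
    root = ET.fromstring(xml_text)
    lines = []
    for funeral in root.iter("funeral"):
        lines.append(f"Funeral {funeral.get('id', '(no id)')}")
        # Flag empty or absent fields so gaps stand out to the proofreader.
        for field in ("deceased", "source"):
            value = funeral.findtext(field, default="").strip()
            lines.append(f"  {field}: {value or '[MISSING]'}")
        features = [f.get("type", "?") for f in funeral.findall("feature")]
        lines.append("  features: " + (", ".join(features) or "[none]"))
    return "\n".join(lines)


print(proofreading_view(SAMPLE))
```

The point of such a view is that missing or misplaced values are much easier to spot in a formatted listing than in raw markup.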


Proofreading view

By the end of the summer, Mary-Evelyn had entered several hundred funerals. She will add more funerals and more details over the course of this year. CDS is in the process of preparing the files for ingestion into the Brown Digital Repository, where they will become a digital corpus that will continue to grow.

Mary-Evelyn presented a poster [pdf] on her work at the Undergraduate Research Symposium at the end of the summer.

Two Presentations on XML Editing Software

Friday April 19
Digital Scholarship Lab, Rockefeller Library, Brown University (Providence, RI)

The Brown Library and the Center for Digital Scholarship are proud to host two presentations by the designers of the popular <oXygen/> XML Editor:

Getting the Most out of <oXygen/>
Customizing your <oXygen/> Working Environment

We are very lucky that Syncro Soft’s George Bina and Radu Coravu will be in Providence, and have agreed to present these two demonstrations. This will be a great opportunity for questions, answers, and discussion about using the <oXygen/> editor in your digital humanities work.

<oXygen/> is widely used in the DH XML community. It is a very powerful tool for editing, transforming, querying and generally interacting with XML documents. It has also been very responsive to the needs of DH practitioners: schemas and stylesheets for TEI, EAD, and other core digital humanities standards are bundled into the standard <oXygen/> package.

10:30-12 Getting the Most Out of <oXygen/>

This presentation is targeted at users who are getting started with <oXygen/>, or who have been using it, but have not started exploring its capabilities yet. It will cover Author Mode, using the different <oXygen/> panes, simple CSS, simple use of frameworks, online schemas, and more. We are leaving plenty of time for conversation, and hope that you will bring your questions and projects.

1:30-3:00 Customizing your <oXygen/> Working Environment

This presentation is intended to demonstrate more advanced uses of <oXygen/> such as custom frameworks, accessing data from external files, scripting Author mode using XSLT or XQuery, extensions and plugins.

Please come to either or both of the presentations. There will be time for lunch with George and Radu in between the two sessions, and plenty of time for discussion! So come with questions about what you’d like to learn about <oXygen/>.

Taking the Digital Scholarship Lab through its paces

Picture of students viewing each other's work on the DSL's 7 by 16 foot display wall

Enjoying a collaborative moment at the DSL during Tyler Denmead’s digital storytelling class

This semester, two classes are being regularly taught in the new Digital Scholarship Lab, along with a variety of ad hoc sessions.

The first regular class is Professor Massimo Riva’s The Many Faces of Casanova (ITAL 1400J). Casanova’s long life as an icon of literature, art, film, and theater makes him a good subject for the Lab’s 7′×16′ display wall. Prof. Riva uses those 112 square feet of high resolution to provide a broad overview of the many shapes Casanova has taken over the centuries.

Professor Tyler Denmead of the John Nicholas Brown Center for Public Humanities is teaching a course on Digital Storytelling (AMST 2699), the potent combination of new media and narrative form. Students in the class generate material and share it back to their peers using the Lab’s flexible video switcher. The Lab is well suited to this kind of on-the-fly collaborative work; anywhere from 1 to 12 students’ computers can share the display wall in a variety of configurations.

We’re excited to see varied uses of the display wall, video switcher, and touch screens, so if you are interested in trying out the Lab for your class (for anything from a single session to a whole semester), please feel free to contact Data Visualization Coordinator Patrick Rashleigh.

Rest assured, the equipment is easy to use and support will be on hand to ensure a smooth setup.

The new CDS Video: Behind the Scenes

Back in March there was a call for short videos describing digital humanities centers. These videos would be displayed at the July annual members meeting for centerNet, an international network of Digital Humanities Centers. Since CDS is a member of centerNet, and I am the Executive Secretary for centerNet’s International Steering Committee, I asked if I could put together a video showcasing our work at Brown.

The video was created in Keynote, Apple’s presentation software (think PowerPoint, but actually attractive). I also used OmniGraffle, a great graphic design program, to make the abstract models of the Rockefeller and Science Libraries. The SciLi was easy to model; the Rock involved a lot of standing outside my office counting the windows . . .

Photos were contributed by the talented Lindsay Elgin and Ben Tyler from Digital Production Services, and of course, I used Bruce Boucek’s graphic as the “splash page” and background image.

Once the graphics and slides were complete I used Keynote’s built-in transitions to create movement and flow between the slides. Then I brought in editing support.

Arlando Battle, a recent Brown graduate, helped me capture the Keynote presentation as a QuickTime video, timed to my reading of the voiceover text. I had been hoping to use Keynote’s automatic export to QuickTime, but you can’t vary the time between slides, so we had to improvise. Arlando then played back the QuickTime video and recorded my voiceover separately. He took those files and cut them together in Final Cut Pro along with the 1971 field recording of Ghanaian postal workers whistling a folk tune while canceling stamps in a syncopated rhythm.

By the time Arlando mixed in all the components I was on vacation and the resulting file was ~3GB, much too large for me to post online from my parents’ house in Brooklyn. After several failed attempts involving my personal YouTube channel and Dropbox, followed by a flurry of emails, Arlando loaded the video using Brown’s high-speed connection and posted it to the Library’s YouTube Channel in time for the conference.

The final video can be seen here. Enjoy!

Dealing with Data Spring/Summer 2012 Cover Graphic

On March 1st of 2012, Bruce Boucek joined the Library as the new Social Sciences Data Librarian. In his first 3 months with the library Bruce has already contributed to a number of different projects. Beyond working with data and analysis he has a strong interest in the visualization of data and the use of such visualizations to tell larger stories. The Brown University Library has published its Spring/Summer 2012 Dealing with Data Newsletter; Bruce contributed the cover graphic shown here:

The graphic illustrates the relationships between data, the tools and methods we use to visualize data, and the desire to more fully understand our world, which is the reason we examine, explore, and analyze large data sets. The graphic consists of a Delaunay triangulation, examples of the raw data table used to generate it, and a subset of a NASA MODIS satellite image of New England.

The Delaunay triangulation is sometimes used for the analysis of spatially distributed data; its dual, the Voronoi diagram, yields the cells that are known in such cases as Thiessen polygons. The data used to produce this one was synthetically generated using the add-on package (library) deldir for R, the environment for statistical computing and graphics. The data is composed of two sets of random normally distributed values, each with a count of 250, a mean of zero and a standard deviation of 5000 (chosen completely arbitrarily). The satellite imagery of southern New England provides contrast in that it is simultaneously another data set and a visibly recognizable piece of the human landscape.
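For readers curious about how such a triangulation comes out of a table of random values, here is a minimal sketch in Python using only the standard library. It is not the R/deldir code behind the graphic. The function names, the seed, and the brute-force method are all assumptions made for illustration: the sketch simply keeps every triangle whose circumcircle contains no other point, which is the defining property of a Delaunay triangle, and because that test is slow the demonstration triangulates only the first 40 of the 250 points.

```python
import random
from itertools import combinations


def synthetic_points(count=250, mean=0.0, sd=5000.0, seed=2012):
    """Two sets of normally distributed values, paired up as (x, y) points.

    The seed is an arbitrary assumption so that runs are repeatable.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(mean, sd) for _ in range(count)]
    ys = [rng.gauss(mean, sd) for _ in range(count)]
    return list(zip(xs, ys))


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2 = ax * ax + ay * ay
    b2 = bx * bx + by * by
    c2 = cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_brute_force(points):
    """Keep each triangle (as an index triple) whose circumcircle is empty."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2
               for n, (px, py) in enumerate(points) if n not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles


points = synthetic_points()
sample = points[:40]  # small subset: the brute-force test is slow
print(len(points), "points generated;",
      len(delaunay_brute_force(sample)), "triangles in the 40-point sample")
```

A dedicated package such as deldir uses far more efficient algorithms; the brute-force version is only meant to make the empty-circumcircle idea concrete.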

This graphic also demonstrates how various data, methods, and tools can be used to visualize a larger story that is only partially told via numbers and models and that also requires human translation, interpretation, and dissemination.