Brown University Homepage Brown University Library

Exporting Django data

We recently had a couple cases where we wanted to dump the data out of a Django database. In the first case (“tracker”), we were shutting down a legacy application, but needed to preserve the data in a different form for users. In the second case (“deposits”), we were backing up some obsolete data before removing it from the database. We handled the processes in two different ways.

Tracker

For the tracker, we used an export script to extract the data. Here’s a modified version of the script:

def export_data():
    now = datetime.datetime.now()
    dir_name = 'data_%s_%s_%s' % (now.year, now.month, now.day)
    d = os.mkdir(dir_name)
    file_name = os.path.join(dir_name, 'tracker_items.dat')
    with open(file_name, 'wb') as f:
        f.write(u'\u241f'.join([
                    'project name',
                    'container identifier',
                    'container name',
                    'identifier',
                    'name',
                    'dimensions',
                    'note',
                    'create digital surrogate',
                    'qc digital surrogate',
                    'create metadata record',
                    'qc metadata record',
                    'create submission package']).encode('utf8'))
        f.write('\u241e'.encode('utf8'))
        for project in models.Project.objects.all():
            for container in project.container_set.all():
                print(container)
                for item in container.item_set.all():
                    data = u'\u241f'.join([
                        project.name.strip(),
                        container.identifier.strip(),
                        container.name.strip(),
                        item.identifier.strip(),
                        item.name.strip(),
                        item.dimensions.strip(),
                        item.note.strip()
                    ])
                    item_actions = u'\u241f'.join([str(item_action) for item_action in item.itemaction_set.all().order_by('id')])
                    line_data = u'%s\u241f%s\u241e' % (data, item_actions)
                    f.write(line_data.encode('utf8'))

As you can see, we looped through different Django models and pulled out fields, writing everything to a file. We used the Unicode Record and Unit Separators as delimiters. One advantage of using those is that your data can have commas, tabs, newlines, … and it won’t matter. You still don’t have to quote or escape anything.

Then we converted the data to a spreadsheet that users can view and search:

import openpyxl

workbook = openpyxl.Workbook()
worksheet = workbook.active

with open('tracker_items.dat', 'rb') as f:
    data = f.read()
    lines = data.decode('utf8').split('\u241e')
    print(len(lines))
    print(lines[0])
    print(lines[-1])
    for line in lines:
        fields = line.split('\u241f')
        worksheet.append(fields)
workbook.save('tracker_items.xlsx')

Deposits

For the deposits project, we just used the built-in Django dumpdata command:

python manage.py dumpdata -o data_20180727.dat

That output file could be used to load data back into a database if needed.