{"id":1005538,"date":"2023-07-14T13:26:53","date_gmt":"2023-07-14T17:26:53","guid":{"rendered":"https:\/\/dlibwwwcit.services.brown.edu\/create\/cds_dev2\/?page_id=1005538"},"modified":"2025-02-11T19:16:36","modified_gmt":"2025-02-12T00:16:36","slug":"open-refine","status":"publish","type":"page","link":"https:\/\/library.brown.edu\/create\/cds\/open-refine\/","title":{"rendered":"Open Refine"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><strong>OpenRefine:&nbsp;A Power Tool for Working with Messy Data<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Links for today\u2019s (February 12, 2025) class<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"http:\/\/openrefine.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Download &amp; install OpenRefine<\/a><\/li>\n\n\n\n<li>The URL for today\u2019s dataset is:<br><a href=\"https:\/\/media.githubusercontent.com\/media\/MuseumofModernArt\/collection\/refs\/heads\/main\/Artists.csv\">https:\/\/media.githubusercontent.com\/media\/MuseumofModernArt\/collection\/refs\/heads\/main\/Artists.csv<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.wikidata.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Wikidata home<\/a>\n<ul class=\"wp-block-list\">\n<li>an example of Wikidata as&nbsp;<a href=\"https:\/\/www.wikidata.org\/w\/rest.php\/wikibase\/v0\/entities\/items\/Q1063584\" target=\"_blank\" rel=\"noreferrer noopener\">computer gobbledygook<\/a><\/li>\n\n\n\n<li>the&nbsp;<a href=\"https:\/\/www.wikidata.org\/wiki\/Q1063584\" target=\"_blank\" rel=\"noreferrer noopener\">same example formatted for humans<\/a>&nbsp;(sort of)<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>The URL for the Wikidata reconciliation service is:<br><a href=\"https:\/\/wikidata.reconci.link\/en\/api\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/wikidata.reconci.link\/en\/api<\/a><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>About<\/strong><\/h3>\n\n\n\n<p>Home page for OpenRefine:&nbsp;<a 
href=\"http:\/\/openrefine.org\/\">http:\/\/openrefine.org<\/a><\/p>\n\n\n\n<p>OpenRefine is a powerful tool for cleaning many kinds of data, both numeric and textual, and it can import from and export to&nbsp;several useful&nbsp;formats. In this workshop we will focus on textual and numeric data that originates in tabular form, and will go through some basic views and transformations. After this introduction to OpenRefine, you can learn more from the many tutorials and examples on the web. \u201cCleaning data\u201d sounds like some sort of janitorial activity. A more productive way to describe what we are doing is to&nbsp;say that we are&nbsp;<em>modeling<\/em>&nbsp;and&nbsp;<em>remodeling<\/em>&nbsp;information.&nbsp;Changes in content and structure are important:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>they reflect&nbsp;a research method<\/li>\n\n\n\n<li>they require knowledge of the data&nbsp;and research context<\/li>\n\n\n\n<li>they need to be documented so they can be reproduced.<\/li>\n<\/ul>\n\n\n\n<p>Finally, this workshop will just scratch the surface of what OpenRefine can do. We are happy to meet with you and work out further solutions if you decide that this will be a useful tool. Interesting reading: Katie Rawson and Trevor Mu\u00f1oz, \u201cAgainst Cleaning\u201d&nbsp;<a href=\"http:\/\/curatingmenus.org\/articles\/against-cleaning\/\">http:\/\/trevormunoz.com\/notebook\/2016\/07\/07\/against-cleaning-curating-menus.html&nbsp;<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Install:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OpenRefine is a desktop application that runs as a local web server. 
You treat it like any other desktop application, but note that it uses a web browser as a user interface.<\/li>\n\n\n\n<li>Details for installing, starting and quitting the application are here: &nbsp;<a href=\"http:\/\/openrefine.org\/download.html\">http:\/\/openrefine.org\/download.html<\/a><\/li>\n\n\n\n<li>OpenRefine won\u2019t work on older versions of many browsers.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Documentation<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OpenRefine home page:&nbsp;<a href=\"http:\/\/openrefine.org\/\">http:\/\/openrefine.org<\/a><\/li>\n\n\n\n<li>OpenRefine Documentation wiki:&nbsp;<a href=\"https:\/\/github.com\/OpenRefine\/OpenRefine\/wiki\">https:\/\/github.com\/OpenRefine\/OpenRefine\/wiki<\/a>&nbsp;This is where the definitive version of the&nbsp;documentation can be found. Good starting points:\n<ul class=\"wp-block-list\">\n<li>3 videos that provide a great introduction<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OpenRefine 101 course \u2013 we haven\u2019t tried it yet.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Examples and Techniques:<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Example of use:&nbsp;<a href=\"http:\/\/dataist.wordpress.com\/2012\/04\/10\/tutorial-using-google-refine-to-clean-mortgage-data\/\">http:\/\/dataist.wordpress.com\/2012\/04\/10\/tutorial-using-google-refine-to-clean-mortgage-data\/<\/a><\/li>\n\n\n\n<li>In-depth tutorial:&nbsp;<a href=\"https:\/\/www.propublica.org\/nerds\/using-google-refine-for-data-cleaning\">http:\/\/www.propublica.org\/nerds\/item\/using-google-refine-for-data-cleaning<\/a><\/li>\n\n\n\n<li><a href=\"http:\/\/programminghistorian.org\/lessons\/cleaning-data-with-openrefine\">http:\/\/programminghistorian.org\/lessons\/cleaning-data-with-openrefine<\/a>&nbsp;Basically the&nbsp;first few chapters of the OpenRefine book.<\/li>\n\n\n\n<li>Miriam Posner\u2019s OpenRefine 
tutorial:&nbsp;<a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/tutorials-guides\/data-manipulation\/get-started-with-openrefine\/\">http:\/\/miriamposner.com\/classes\/dh101f17\/tutorials-guides\/data-manipulation\/get-started-with-openrefine\/<\/a><\/li>\n\n\n\n<li>GREL reference:&nbsp;<a href=\"https:\/\/openrefine.org\/docs\/manual\/grelfunctions\">https:\/\/openrefine.org\/docs\/manual\/grelfunctions<\/a><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Tasks<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Import file and types of source file\n<ul class=\"wp-block-list\">\n<li>Create a new project<\/li>\n\n\n\n<li>Get the source file from the&nbsp;URL&nbsp;<a href=\"http:\/\/cds.library.brown.edu\/projects\/OpenRefine\/Menu.csv\">http:\/\/cds.library.brown.edu\/projects\/OpenRefine\/Menu.csv<\/a>&nbsp;The dataset is from the NY Public Library\u2019s&nbsp;<em><a href=\"http:\/\/menus.nypl.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">What\u2019s on the Menu<\/a>.<\/em>&nbsp;(<a href=\"http:\/\/menus.nypl.org\/data\" target=\"_blank\" rel=\"noreferrer noopener\">http:\/\/menus.nypl.org\/data<\/a>)<\/li>\n\n\n\n<li>Examine the import preview and settings for managing imported data<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Explore the dataset using filtering and make some simple changes. OpenRefine works primarily on columns; use the dropdown arrow at the top of a column to see the available actions.\n<ol class=\"wp-block-list\">\n<li>In the Notes column, what can you find out about which languages are common? 
(<em>Hint:<\/em>&nbsp; use some filters)<\/li>\n\n\n\n<li>Make the Notes column more readable by converting the cells to mixed case.&nbsp;(<em>Hint:<\/em>&nbsp; From the column dropdown, see Edit Cells&gt;Common Transforms)<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>Look at all the other cell options<\/li>\n\n\n\n<li>History and Undo (tab on the left)<\/li>\n\n\n\n<li>Facet \/ Clustering\n<ol class=\"wp-block-list\">\n<li>Try applying a text facet to the Events column.\n<ol class=\"wp-block-list\">\n<li>Cluster the facet, explore the histograms on the right.<\/li>\n\n\n\n<li>Try to normalize the values of the Events column by merging facets.<\/li>\n\n\n\n<li>(Note: if you want to know more about the clustering algorithms, see&nbsp;<a href=\"https:\/\/openrefine.org\/docs\/technical-reference\/clustering-in-depth\">https:\/\/openrefine.org\/docs\/technical-reference\/clustering-in-depth<\/a>)<\/li>\n\n\n\n<li>Backtrack, make a new column and try again\u2026 Don\u2019t delete information!<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>Apply a Timeline facet to the Date column\n<ol class=\"wp-block-list\">\n<li>Dates could be viewed as a timeline \u2013 try making a timeline facet from the date column.<\/li>\n\n\n\n<li>Why isn\u2019t this working? Convert the column to actual dates. 
Then try the&nbsp;timeline facet again.<\/li>\n\n\n\n<li>Clean up outliers using the timeline sliders on the left.<\/li>\n\n\n\n<li>Try making a new column based on this value that only shows the year so it\u2019s easier to read.<pre style=\"background-color:#e9e9e9;overflow: auto;max-width: 100%;line-height: 1.7;margin: 20px 0;padding: 20px\"><code>value.datePart(\"years\")<\/code><\/pre><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Scatterplots (just to show that you can)\n<ol class=\"wp-block-list\">\n<li>Make sure the date is a number and select the scatterplot facet from the date column dropdown.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>Column Menus&nbsp;(run through them)<\/li>\n\n\n\n<li>Rows and Records\n<ol class=\"wp-block-list\">\n<li>The Notes column has lots of information in it. Try to normalize and sort it.\n<ol class=\"wp-block-list\">\n<li>Column dropdown: Edit Cells&gt;Split multi-valued cells on \u201c;\u201d.<\/li>\n\n\n\n<li>Facet and cluster<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>Look at Dimensions: split multi-valued cells as before\n<ol class=\"wp-block-list\">\n<li>Remove&nbsp;numeric values: View as rows, select numeric values via regex.&nbsp;Filter on<pre style=\"background-color:#e9e9e9;overflow: auto;max-width: 100%;line-height: 1.7;margin: 20px 0;padding: 20px\"><code>^[\\d\\. 
X]+$<\/code><\/pre><\/li>\n\n\n\n<li>Delete matching rows from the \u201cAll\u201d menu<\/li>\n\n\n\n<li>View again as records.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>GREL\n<ul class=\"wp-block-list\">\n<li>Some expressions: <pre style=\"background-color:#e9e9e9;overflow: auto;max-width: 100%;line-height: 1.7;margin: 20px 0;padding: 20px\"><code>Place: edit cells value.replace(\/;$\/,\"\")<br>Dimension: value.replace(\/^(?:[^\\d]*)(\\d+\\.\\d+)?\/,'$1')<br>Various spots: value.replace(\/;\/,\"\") <br>Note: value.match(\/.*?(German|French|Hungarian).*?\/)[0]<\/code><\/pre><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Reconciling values<\/li>\n\n\n\n<li>Export<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>OpenRefine:&nbsp;A Power Tool for Working with Messy Data Links for today\u2019s (February 12, 2025) class About Home page for OpenRefine:&nbsp;http:\/\/openrefine.org OpenRefine is a powerful tool for cleaning many kinds of data, both numeric and textual, and it can import from and export to&nbsp;several useful&nbsp;formats. 
In this workshop we will focus on textual and numeric data <a href=\"https:\/\/library.brown.edu\/create\/cds\/open-refine\/\" class=\"more-link\">&#8230;<span class=\"screen-reader-text\">  Open Refine<\/span><\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"template-full-width-no-menu.php","meta":{"footnotes":""},"class_list":["post-1005538","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/library.brown.edu\/create\/cds\/wp-json\/wp\/v2\/pages\/1005538","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/library.brown.edu\/create\/cds\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/library.brown.edu\/create\/cds\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/library.brown.edu\/create\/cds\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/library.brown.edu\/create\/cds\/wp-json\/wp\/v2\/comments?post=1005538"}],"version-history":[{"count":11,"href":"https:\/\/library.brown.edu\/create\/cds\/wp-json\/wp\/v2\/pages\/1005538\/revisions"}],"predecessor-version":[{"id":1006515,"href":"https:\/\/library.brown.edu\/create\/cds\/wp-json\/wp\/v2\/pages\/1005538\/revisions\/1006515"}],"wp:attachment":[{"href":"https:\/\/library.brown.edu\/create\/cds\/wp-json\/wp\/v2\/media?parent=1005538"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}