Assignment


For this assignment you will work with the vispubdata dataset, which you downloaded for Assignment 1 (download it again if you no longer have it). Go to the http://vispubdata.org website to learn about the different data columns.

For the assignment, load the data file into OpenRefine:

Use the following settings when creating your project (otherwise your work may not be graded correctly):

Your task

Create two new csv files:

File 1 should contain data in this form:

Paper.DOI       Author Name

That means, if you have a paper with three authors, such as 10.0001.0001 with the authors Isenberg,P.;Dragicevic,P.;Fekete,J.D. then the file should contain one row per author, like this:

10.0001.0001    Isenberg,P.
10.0001.0001    Dragicevic,P.
10.0001.0001    Fekete,J.D.
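
To make the target shape of File 1 concrete, here is a minimal Python sketch of the same "one row per author" split. It is only an illustration for checking your result, not a replacement for doing the work in OpenRefine. The function name explode_authors is hypothetical, the semicolon separator is taken from the example above, and the column names in the real data file may differ from the header used here.

```python
import csv
import io


def explode_authors(rows, sep=";"):
    """Yield one (doi, author) pair per author in a multi-valued author cell."""
    for doi, authors in rows:
        for name in authors.split(sep):
            name = name.strip()
            if name:  # skip empty pieces, e.g. from a trailing separator
                yield doi, name


# The example paper from the text: one DOI, three authors in a single cell.
rows = [("10.0001.0001", "Isenberg,P.;Dragicevic,P.;Fekete,J.D.")]
pairs = list(explode_authors(rows))

# Write the pairs as CSV. Author names contain commas, so the csv module
# puts them in quotes to keep each name in a single column.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Paper.DOI", "Author Name"])
writer.writerows(pairs)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that the author names themselves contain commas, so a well-formed CSV export has to quote them. It is worth opening your exported file in a text editor to confirm that each row still has exactly two columns.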

File 2 should contain:

Author Name     Affiliation

File 2 should not contain rows with empty data. On File 2, also perform at least 3 different types of cleaning operations, either as we practiced them in class or any others you want to apply. Each cleaning operation must be applied to multiple cells at once (cleaning a single cell does not count).
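
As a rough illustration of what "cleaning multiple cells at once" and "no rows with empty data" mean, the following Python sketch applies the same rules to every row of a small sample. The function names and the affiliation values are made up for the example, and the three operations shown (trimming, collapsing repeated whitespace, dropping exact duplicates) are only one possible choice. Your own cleaning still has to be done and recorded in OpenRefine.

```python
import re


def clean_cell(value):
    """Collapse runs of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def clean_rows(rows):
    """Clean every cell, drop rows with an empty cell, drop exact duplicates."""
    seen = set()
    result = []
    for author, affiliation in rows:
        row = (clean_cell(author), clean_cell(affiliation))
        if not all(row):  # either cell is empty after cleaning
            continue
        if row in seen:   # identical row already kept
            continue
        seen.add(row)
        result.append(row)
    return result


# Hypothetical sample data; the affiliations are placeholders.
sample = [
    ("Isenberg,P. ", "Example  University"),
    ("Isenberg,P.", "Example University"),
    ("Fekete,J.D.", ""),
    ("", "Example Lab"),
]
cleaned = clean_rows(sample)
print(cleaned)
```

The first two sample rows differ only in whitespace, so after cleaning they become identical and one of them is dropped. The last two rows are removed because one of their cells is empty.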

Submitting the Assignment


WHAT - You should submit a single ZIP file called "YOUR_LASTNAME-Assignment2.zip" via email. It should contain:

  1. Two CSV files named "YOUR_LAST_NAME-Assignment2-File-#.csv" containing the cleaned data.
  2. Two JSON files named "YOUR_LAST_NAME-Assignment2-File-#.json" containing the operations you used to clean the data (Undo/Redo -> Extract). CHECK (!) that your JSON code works by testing it on a newly loaded project.
  3. A txt file called "YOUR_LAST_NAME-explanation.txt" explaining the cleaning operations you performed.

WHERE - You should email the file to petra.isenberg@inria.fr with the subject VA-Assignment2.

WHEN - Remember that Assignment 2 is due before 23:00 on November 2nd.