---
title: "Mayhem at DinoFunWorld"
author: "Petra Isenberg"
date: "October 5, 2015"
output: html_document
---

# Merging Data Files with R

## Loading Files

First we will load a file that contains the attractions, their IDs, and their coordinates in the park:

```{r}
# stringsAsFactors=TRUE keeps text columns as factors (this is no longer the default since R 4.0)
coordinates <- read.csv("ParkCoordinates.csv", stringsAsFactors=TRUE)
head(coordinates)
```

Next we will load our data from the data cleaning exercise:

```{r}
attractions <- read.csv("AttractionsOCR-txt.csv", stringsAsFactors=TRUE)
head(attractions)
```

## Merging both files

Next we need to merge both files into one that adds the coordinates to the park attractions file. But first we should compare both files:

```{r}
str(coordinates)
str(attractions)
```

We notice that `AttractionID` is not a factor in the attractions dataset. Let's fix this:

```{r}
attractions$AttractionID <- as.factor(attractions$AttractionID)
```

We can also see that there are a few more attractions in the coordinates dataset. So we need to specify `all.x=TRUE` when merging, which keeps every row of `coordinates` even if it has no match in `attractions` (see the man page for the merge command by calling `?merge` in the R console):

```{r}
library(xtable)
#you can also use the option echo=FALSE to hide this code in the output
#to hide the output use results="hide"
```

```{r sectionname, results="hide"}
#all.x=TRUE keeps the rows of coordinates that have no match in attractions
fulldata <- merge(coordinates,attractions, by.x="AttractionID",by.y="AttractionID",all.x=TRUE)
#sort the merged rows by attraction id
fulldata <- fulldata[order(fulldata$AttractionID),]
```

```{r, results="asis"}
xt <- xtable(fulldata)
print(xt,type="html")
```

## Modifying the Data

From the table above we can see that park entrances have no Park Area or Attraction Type.
By looking at our park map we can make the following modifications:

![Park Map](http://aviz.fr/wiki/uploads/TeachingVA2015/parkmap.png)

```{r}
#assign a park area to each of the three entrances
fulldata[fulldata$AttractionID=="N",]$ParkArea <- "Entry Corridor"
fulldata[fulldata$AttractionID=="E",]$ParkArea <- "Kiddie Land"
fulldata[fulldata$AttractionID=="W",]$ParkArea <- "Tundra Land"

#first we need to generate a new level for the CategoryNames column
categorynames <- c(levels(fulldata$CategoryNames),"Entry-Exit")
levels(fulldata$CategoryNames) <- categorynames

#only now can we do this without generating any errors
fulldata[fulldata$AttractionID=="N",]$CategoryNames <- "Entry-Exit"
fulldata[fulldata$AttractionID=="E",]$CategoryNames <- "Entry-Exit"
fulldata[fulldata$AttractionID=="W",]$CategoryNames <- "Entry-Exit"

#now let's check if this worked ok
tail(fulldata)
```

Also, we really don't need the `Attraction.y` column if everything checks out so far. So let's get rid of it:

```{r}
fulldata$Attraction.y <- NULL
```

## Plotting the data

To double-check what we've done, first make a plot that shows the coordinates colored by park area:

```{r}
plot(fulldata$x, fulldata$y, col=fulldata$ParkArea)
```

You can try to make another plot colored by `CategoryNames`.

## Write a new data file

```{r}
write.csv(fulldata,file="Attraction-Coordinates.csv")
```

You may want to show the final data file again here.
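
One way to do that is to read the written file back in and inspect it. The chunk below is only a minimal sketch: it uses a made-up two-row data frame (`toy`) and a temporary file instead of the real `Attraction-Coordinates.csv`, and its values are placeholders, not park data. It also passes `row.names=FALSE`, because `write.csv` otherwise stores the row names as an extra first column.

```{r}
#toy stand-in for fulldata; the values are invented for illustration
toy <- data.frame(AttractionID = c("1", "N"),
                  x = c(10, 20),
                  y = c(30, 40))

#write to a temporary file so the real output file is not touched
tmp <- tempfile(fileext = ".csv")
write.csv(toy, file = tmp, row.names = FALSE)

#read the file back and check that it contains what we wrote
check <- read.csv(tmp)
str(check)
head(check)
```

For the real data, the same check would read `"Attraction-Coordinates.csv"` instead of `tmp`. If the file was written without `row.names=FALSE`, expect an additional unnamed first column (read back as `X`) holding the row names.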