Monday, April 25, 2016

Access Your Google Location History in R

For those of us who have Google location history turned on on our Android phones, the Google Maps page at maps.google.com/locationhistory that shows our location history is quite interesting. Interesting enough that I wanted to try to pull it into R for further analysis. You can use Google Takeout to download your entire location history, or just about any other data associated with your Google account, but I was hoping to come across something a little lighter.

So, first off, location history data is not available from any of Google's API services. That leaves us with few options other than simulating a browser session in R using rvest and getting the data that way. There's a Stack Overflow post about structuring a URL to get the desired KML data that I used as my guide.
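To give a rough idea of what happens once the KML has been downloaded, here is a minimal sketch of pulling the raw values out of it with xml2 (the package rvest builds on). The KML fragment is hand-written for illustration, and its layout (a gx:Track holding paired when and gx:coord elements) is an assumption about what Google returns, not something copied from a real download or from the package's code.

```r
library(xml2)

# Hand-written stand-in for a downloaded KML file (assumed layout).
kml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
  <Document>
    <Placemark>
      <gx:Track>
        <when>2014-06-29T20:52:33.211-07:00</when>
        <gx:coord>-118.4767475 34.0597392 0</gx:coord>
        <when>2014-06-29T20:53:19.280-07:00</when>
        <gx:coord>-118.4752259 34.0587244 0</gx:coord>
      </gx:Track>
    </Placemark>
  </Document>
</kml>'

doc <- read_xml(kml_text)

# xml2 exposes the default KML namespace under the prefix "d1";
# the Google extension namespace keeps its own "gx" prefix.
ns <- xml_ns(doc)

# One row per recorded point, still as raw strings.
raw <- data.frame(
  when  = xml_text(xml_find_all(doc, ".//d1:when", ns)),
  coord = xml_text(xml_find_all(doc, ".//gx:coord", ns)),
  stringsAsFactors = FALSE
)
raw
```

The session handling (logging in and requesting the URL) is left out here, since it depends on Google's login pages.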

Ever since creating my first R package with the help of Hilary Parker's blog post on creating R packages, I've created a bunch of them. I mostly keep them to myself, but even just for personal use, a structured R package makes it easier for me to document my own code and reuse it. Anyhow, I created a small package to download Google location history called GoogleLocationHistory.

I'll show how it works with a real-life example. I flew out to California in 2014 to go to the useR! conference at UCLA. In a bid to visit all MLB ballparks, I flew out to San Diego first, to go to a Padres game. Then I drove up the coast to a rental condo to go to the conference. I made it to an Angels game and a Dodgers game on this trip too. All three teams were home during the conference! The problem was that I didn't have the electronic toll thingie in my car, and by the time I got back to my room to pay the toll online, I had no idea which toll roads I had taken. This info is required to pay the tolls online. So, I fired up Google location history and was able to figure it out that way. Creepy, but at least it served a use for me. Let's look at the data from that day…

Note that the “login” function assumes Google's two-factor authentication is turned on. This is the only case I can test, given that it's turned on for my account. The “login” function will ask for the six-digit code that Google texts to you when you attempt to log in.

library("devtools")  # this is needed to install my package from GitHub
install_github("corynissen/GoogleLocationHistory")
library("GoogleLocationHistory")

sess <- login(username="corynissen@gmail.com", password="mypassword")
df <- location_history(session=sess, date="2014-06-29")

Here's what the data looks like. The “when” and “coord” columns are the raw values returned from Google. The library attempts to parse them to create the “lon”, “lat”, and “time” columns.

tail(df)
##                              when                     coord       lon
## 372 2014-06-29T20:52:33.211-07:00 -118.4767475 34.0597392 0 -118.4767
## 373 2014-06-29T20:53:19.280-07:00 -118.4752259 34.0587244 0 -118.4752
## 374 2014-06-29T20:54:04.294-07:00 -118.4721347 34.0600617 0 -118.4721
## 375 2014-06-29T20:55:41.607-07:00 -118.4711439 34.0619441 0 -118.4711
## 376 2014-06-29T20:56:54.865-07:00  -118.471275 34.0635299 0 -118.4713
## 377 2014-06-29T20:56:54.865-07:00    -118.471017 34.06378 0 -118.4710
##          lat                time
## 372 34.05974 2014-06-29 20:52:33
## 373 34.05872 2014-06-29 20:53:19
## 374 34.06006 2014-06-29 20:54:04
## 375 34.06194 2014-06-29 20:55:41
## 376 34.06353 2014-06-29 20:56:54
## 377 34.06378 2014-06-29 20:56:54
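The parsing step can be sketched in base R along these lines. This is not the package's actual code, just one way to get from the raw columns to the parsed ones. It assumes “coord” holds longitude, latitude, and altitude separated by single spaces, and it keeps the wall-clock time while ignoring the UTC offset, which matches the “time” values printed above.

```r
# Two raw rows copied from the output above.
raw <- data.frame(
  when  = c("2014-06-29T20:52:33.211-07:00", "2014-06-29T20:53:19.280-07:00"),
  coord = c("-118.4767475 34.0597392 0", "-118.4752259 34.0587244 0"),
  stringsAsFactors = FALSE
)

# Split "lon lat altitude" into a character matrix, one row per point.
parts <- do.call(rbind, strsplit(raw$coord, " ", fixed = TRUE))
raw$lon <- as.numeric(parts[, 1])
raw$lat <- as.numeric(parts[, 2])

# Keep only "YYYY-MM-DDTHH:MM:SS" (the first 19 characters), dropping the
# fractional seconds and the UTC offset. tz = "UTC" is an assumption made
# here only so the wall-clock time is stored without any conversion.
raw$time <- as.POSIXct(substr(raw$when, 1, 19),
                       format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

raw[, c("lon", "lat", "time")]
```

If the real offset matters (for example, when comparing days across time zones), the offset at the end of “when” would need to be parsed as well rather than dropped.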

Maybe it would be cool to map the data…

library("ggmap")
maptile <- get_map(location=c(lon=((max(df$lon) + min(df$lon)) / 2),
                            lat=((max(df$lat) + min(df$lat)) / 2)), zoom=4)
mymap <- ggmap(maptile) + geom_path(data=df, aes(x=lon, y=lat), size=1.1) +
  theme_void()
mymap

[Figure: the day's location history drawn as a path on a map]

I created this to check out my own data, but if it can be useful for anybody else, that's great too. Let me know if you are able to do anything cool with it.