The Twitter API gives you longitude and latitude information, but it does not categorize tweets by country. I know how to match coordinates to country polygons using an R package. I looked for an easy way to do this in Python, but it was eating up a lot of time, so I decided to stick with R!
Using the rworldmap package to attach country names to tweet coordinates:
    library(sp)
    library(rworldmap)

    # load the country polygons that ship with rworldmap
    data(countriesCoarseLessIslands)

    hash <- read.csv('final_tweet.csv')
    names(hash) <- c('id', 'lang', 'text', 'long', 'lat')

    # omit rows with missing values from this data frame
    no_na <- na.omit(hash[, c("id", "long", "lat")])

    # make a SpatialPoints object from the regular coordinates (x = long, y = lat)
    hash_coords <- SpatialPoints(no_na[, c("long", "lat")])

    # assign coordinate/projection system to hash_coords
    # (we need it to be the same as the polygons for the spatial join)
    proj4string(hash_coords) <- proj4string(countriesCoarseLessIslands)

    # overlay points on polygons; each point gets the attributes of the polygon it falls in
    spatial_join <- over(hash_coords, countriesCoarseLessIslands)
    no_na$sov <- spatial_join$SOVEREIGNT

    # attach country to long-lat-hashtag
    # (long and lat exist in both data frames, so merge() suffixes them with .x and .y)
    country_tweet <- merge(hash, no_na, by = "id")

    # write.csv always writes a comma-separated file with a header row,
    # so sep and col.names are not needed (setting them only triggers a warning)
    write.csv(country_tweet, file = "final_tweet_country_code.csv", row.names = FALSE)
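If you are curious what the spatial join boils down to, here is a rough pure-Python sketch of a point-in-polygon test using ray casting. This is only an illustration of the idea, not a reimplementation of sp's over(): the function names and the toy rectangular "countries" are made up, and it treats longitude and latitude as flat x/y coordinates, which real country borders would not forgive.

```python
def point_in_polygon(lon, lat, polygon):
    """Return True if (lon, lat) lies inside polygon, a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point
            if lon < x_cross:
                inside = not inside
    return inside


def country_for_point(lon, lat, countries):
    """Return the name of the first polygon containing the point, or None."""
    for name, polygon in countries.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical toy "countries": plain rectangles, not real borders
toy_countries = {
    "Squareland": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Boxistan": [(20, 0), (30, 0), (30, 10), (20, 10)],
}

print(country_for_point(5, 5, toy_countries))
print(country_for_point(25, 3, toy_countries))
print(country_for_point(15, 5, toy_countries))
```

A point that falls in no polygon comes back as None here, much like the NA you get from the R join for tweets posted from the ocean or from islands missing in the coarse map.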
Bonus: cool commands to help with memory management!
In R:
    ls(all.names = TRUE)  # list all objects in the current environment, including hidden ones (names starting with a dot)
    rm("var1", "var2")    # remove variables from the environment
In Python (IPython):
    %xdel var1  # IPython magic: delete a variable
    globals()   # see global variables
    locals()    # see local variables
In the terminal:
    top    # shows the percentage of memory each process is using; press Shift+M to sort by memory usage
    df -h  # shows how much disk space is used and free on each filesystem, in human-readable units