Geocoding is the process of mapping an address or a location name to a point on Earth’s surface using latitude and longitude coordinates. Geocoding allows researchers to depict addresses on a map and perform many different kinds of analysis using geospatial data. For example, you might be interested in calculating the distance between two addresses or counting the number of particular kinds of places within a fixed radius from an address.
This post describes how to use Google’s Geocoding API to geocode addresses. We’ll also demonstrate how to use the Places API to search around a location. For a general introduction to using web APIs for research, read our recent backgrounder.
To use the Google APIs mentioned above, we’ll first need to create a Google Cloud Platform account and generate an API key. We can then convert a street address to (latitude, longitude) coordinates:
import requests

def geocode(address, api_key):
    """Return a (latitude, longitude) tuple for a street address."""
    geocode_url = 'https://maps.googleapis.com/maps/api/geocode/json'
    # Passing the query as params lets requests URL-encode the address
    # (spaces, commas and so on) instead of formatting it into the URL by hand.
    response = requests.get(
        geocode_url,
        params={'address': address, 'key': api_key}
    ).json()
    # Use the first match returned by the API.
    location = response['results'][0]['geometry']['location']
    return (location['lat'], location['lng'])
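The function above assumes that the API found at least one match. Here is a minimal sketch of a more defensive way to read the response, assuming it carries a status field alongside results, as the Places responses handled later in this post do. The helper name extract_coordinates and the sample values are hypothetical, used only for illustration:

```python
def extract_coordinates(response):
    """Return (lat, lng) from a parsed geocoding response, or None if no match.

    `response` is assumed to be the dictionary produced by calling .json()
    on the API reply.
    """
    if response.get('status') != 'OK' or not response.get('results'):
        return None
    location = response['results'][0]['geometry']['location']
    return (location['lat'], location['lng'])

# Hand-written sample shaped like an API response (illustrative values only).
sample_response = {
    'status': 'OK',
    'results': [{'geometry': {'location': {'lat': 38.9, 'lng': -77.03}}}],
}
empty_response = {'status': 'ZERO_RESULTS', 'results': []}

print(extract_coordinates(sample_response))
print(extract_coordinates(empty_response))
```

Returning None for an unmatched address lets a batch job record the failure and move on rather than stopping with a KeyError or IndexError.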
Once we have the coordinates, we can start to explore the area around this address. If we want to assess how urban it is, for example, we might want to find out how many movie theaters, restaurants or banks are located within a certain radius of the address. Other studies might be interested in knowing the average distance to the nearest hospital or gas station.
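Distances like these can be computed directly from coordinate pairs. The sketch below uses the haversine formula, which treats the Earth as a sphere and so is only an approximation. The function names and the 6,371 km mean radius are our own choices for illustration, not part of any Google API:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(point_a, point_b):
    """Great-circle distance in kilometers between two (lat, lng) tuples."""
    lat1, lng1 = (math.radians(value) for value in point_a)
    lat2, lng2 = (math.radians(value) for value in point_b)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def count_within(center, places, radius_km):
    """Count how many (lat, lng) tuples fall within radius_km of center."""
    return sum(1 for place in places if haversine_km(center, place) <= radius_km)

def nearest_distance_km(center, places):
    """Distance to the closest place, or None if the list is empty."""
    return min((haversine_km(center, place) for place in places), default=None)
```

For example, count_within(address_coords, theater_coords, 5) would count the theaters within roughly 5 kilometers of an address, where both arguments hold coordinates obtained from the geocoding step.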
The Places API offers several different facilities, or “endpoints,” for retrieving places and information about them. Because we’re interested in the types of places surrounding a certain location, we’ll use the Nearby Search endpoint. Each endpoint can take a set of parameters that customize the search in some way; for example, there are different ways to order search results. Normally, results are ranked by their prominence relative to other places in the area. Prominence can be affected by a place’s ranking in Google’s index, global popularity and other factors. If the parameter rankby is set to distance, places will be returned in order of distance from the location. The search radius can also be limited with the radius parameter, in which case results are ordered by prominence within the search area. The radius and rankby parameters are mutually exclusive.
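One way to keep the two ranking modes apart is to build the query parameters in a single place and reject invalid combinations early. This is a minimal sketch; the helper build_nearby_params and its argument names are hypothetical, and only the parameter names location, radius, rankby, keyword, type and key come from the API as described in this post:

```python
def build_nearby_params(
    location,
    api_key,
    radius=None,
    rank_by_distance=False,
    keyword=None,
    place_type=None,
):
    """Build a Nearby Search parameter dictionary from a (lat, lng) tuple."""
    if rank_by_distance and radius is not None:
        # The two options are mutually exclusive, so fail before calling the API.
        raise ValueError('Use either radius or rankby=distance, not both.')
    params = {
        'location': '{},{}'.format(*location),
        'key': api_key,
    }
    if rank_by_distance:
        params['rankby'] = 'distance'
    elif radius is not None:
        params['radius'] = radius
    if keyword is not None:
        params['keyword'] = keyword
    if place_type is not None:
        params['type'] = place_type
    return params
```

The resulting dictionary can be handed to requests.get through its params argument, which also takes care of URL-encoding the values.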
With just a search area defined, we will get back an unfiltered list of places of any kind. To narrow the places returned by a nearby search, for example to just grocery stores or gas stations, we can use the keyword parameter. Google matches these keywords against all content that it has indexed for each place, so a keyword search tends to return a broad set of results.
Another way to refine our search is to filter by one of Google’s location types. For example, we can set type=bakery to request all bakeries within the radius distance. Of course, some bakeries might not be identified as such in Google’s database and therefore would be excluded from results. For a broad keyword search such as “bakery,” we elected not to specify a type at all, instead collecting all available data and then further refining the measure by spot-checking the results.
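When results are collected with a broad keyword search, the type information can still help with spot-checking afterward. The sketch below assumes each returned place is a dictionary with a types list, and splits results into those Google has tagged with a given type and those it has not. The function name and the sample places are invented for illustration:

```python
def split_by_type(places, wanted_type):
    """Separate places tagged with wanted_type from those without the tag."""
    tagged = [p for p in places if wanted_type in p.get('types', [])]
    untagged = [p for p in places if wanted_type not in p.get('types', [])]
    return tagged, untagged

# Invented sample results, shaped like entries in a keyword search response.
sample_places = [
    {'name': 'Corner Bakery', 'types': ['bakery', 'food', 'store']},
    {'name': 'Main Street Cafe', 'types': ['cafe', 'food']},
    {'name': 'Unlabeled Shop'},
]

tagged, untagged = split_by_type(sample_places, 'bakery')
print([p['name'] for p in tagged])
print([p['name'] for p in untagged])
```

The untagged group is the one worth reviewing by hand, since it holds both genuine bakeries that lack the tag and unrelated places that merely matched the keyword.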
It’s important to remember that Google returns a maximum of 60 results per search, delivered in three “pages” of up to 20. If there are additional pages of results to retrieve, the API response includes a next_page_token value, which we send back as the pagetoken parameter to request the next page. Our function follows these tokens and joins the pages together into one list of up to 60 results. This limit matters if you care about getting every result from your search within a specific territory, rather than just the closest or most relevant ones.
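The token-following logic can be sketched on its own, separate from the network code. In this illustration, fetch_page is a hypothetical callable that takes a page token (or None for the first page) and returns a parsed response. The function name, the two-second default delay and the three-page cap are assumptions drawn from the description above:

```python
import time

def collect_all_pages(fetch_page, delay_seconds=2, max_pages=3):
    """Follow next_page_token values and join the pages into one list."""
    results = []
    page_token = None
    for page_number in range(max_pages):
        if page_number > 0:
            # A new token may need a moment before the API accepts it.
            time.sleep(delay_seconds)
        response = fetch_page(page_token)
        results.extend(response.get('results', []))
        page_token = response.get('next_page_token')
        if not page_token:
            break
    return results

# A stand-in for the API: three fake pages keyed by the token that requests them.
fake_pages = {
    None: {'results': ['a', 'b'], 'next_page_token': 't1'},
    't1': {'results': ['c'], 'next_page_token': 't2'},
    't2': {'results': ['d']},
}
all_results = collect_all_pages(fake_pages.get, delay_seconds=0)
print(all_results)
```

Passing the fetching step in as a function keeps the paging logic easy to test without spending API quota.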
You can achieve complete coverage within a territory in two ways. If you set rankby=distance, then you can presume that there are no omitted results that are closer than the 60th result. Otherwise, you can constrain your search radius until the number of returned places is under 60 and make additional queries over the area until you have complete coverage. With both strategies, it is important to reposition the location of follow-up queries as you retrieve results. The code below, implemented in Python, returns up to 60 of the most prominent places found within the specified radius of the coordinates:
import time

import requests

class GoogleSearchError(Exception):
    pass

def search_places(
    location = None,
    keywords = None,
    place_type = None,
    page_token = None,
    api_key = None,
    radius = 50000,
    tries = 10
):
    place_url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json' \
        '?key={}'.format(api_key)
    # read more about API fields here:
    # https://developers.google.com/places/web-service/search#PlaceSearchRequests
    for api_field, api_field_value in (
        # location is a coordinate tuple like (latitude, longitude)
        ('location', location if location is None else ','.join(str(l) for l in location)),
        ('keyword', keywords),
        ('radius', radius),
        ('type', place_type),
        ('pagetoken', page_token)
    ):
        if api_field_value is not None:
            place_url += '&{}={}'.format(api_field, api_field_value)
    # Requests using a page token may fail with status "INVALID_REQUEST"
    # if dispatched too quickly; tokens take a few seconds to become valid.
    # Requests may also semi-randomly fail with status "UNKNOWN_ERROR",
    # but often these work on the next try.
    # This function will retry after a short sleep until we either
    # hit the tries limit, succeed (status "OK"), or get a bad status like "DENIED".
    while tries > 0:
        response = requests.get(place_url).json()
        if response['status'] == 'UNKNOWN_ERROR' or (
            response['status'] == 'INVALID_REQUEST' and page_token is not None
        ):
            tries -= 1
            # We build in some sleep time when using a page token,
            # but sometimes it may take another try.
            print('[INFO] Request failed, will retry {} times.'.format(tries))
            time.sleep(2)
            continue
        elif response['status'] == 'ZERO_RESULTS':
            # sometimes pages are empty...
            return []
        elif 'DENIED' in response['status']:
            raise GoogleSearchError('[ERROR] API request DENIED; response: {}'.format(
                response
            ))
        elif response['status'] != 'OK':
            raise GoogleSearchError('[ERROR] API request failed; response: {}'.format(
                response
            ))
        else:
            break
    if tries == 0:
        # tries only reaches zero when every attempt failed
        raise GoogleSearchError('[ERROR] API request failed and ran out of retries.')
    results = response['results']
    if 'next_page_token' in response:
        # There is a short delay between when a next_page_token
        # is issued and when it will become valid.
        time.sleep(2)
        # The follow-up request still needs the API key along with the token.
        results += search_places(
            page_token = response['next_page_token'],
            api_key = api_key
        )
    return results
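To cover a whole territory with smaller searches, as described earlier, one option is to lay a grid of search centers over the area and merge the results. This is only a sketch under simplifying assumptions: the area is a latitude/longitude bounding box, one degree of latitude is treated as roughly 111.2 km, and every returned place carries a place_id field. The helpers grid_centers and merge_unique are hypothetical names:

```python
import math

KM_PER_DEGREE_LAT = 111.2  # rough approximation, good enough for planning queries

def grid_centers(south, west, north, east, step_km):
    """Return (lat, lng) search centers spaced about step_km apart over a box."""
    lat_step = step_km / KM_PER_DEGREE_LAT
    mid_lat = math.radians((south + north) / 2)
    lng_step = step_km / (KM_PER_DEGREE_LAT * math.cos(mid_lat))
    # Add one extra row and column so the far edges of the box are covered.
    n_rows = math.ceil((north - south) / lat_step) + 1
    n_cols = math.ceil((east - west) / lng_step) + 1
    return [
        (round(south + row * lat_step, 6), round(west + col * lng_step, 6))
        for row in range(n_rows)
        for col in range(n_cols)
    ]

def merge_unique(result_lists):
    """Join several result lists, keeping one entry per place_id."""
    unique = {}
    for results in result_lists:
        for place in results:
            unique.setdefault(place['place_id'], place)
    return list(unique.values())
```

Each center could then be passed to search_places with a radius of at least step_km divided by the square root of 2 (converted to meters), so that neighboring circles overlap and leave no gaps. If any single query still returns 60 places, that cell is too dense and should be searched again with a smaller step.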
Google Maps is not the only geospatial data provider. Another popular option is OpenStreetMap, which crowdsources location information. By using any of these services, researchers can better understand the geographic context of specific locations and examine whether that geography has broader implications.