Taking a Look at Santa Monica’s Parking API
Santa Monica Spaces is a project I started back in July 2015, with the hope of creating useful data and models as well as improving my programming, machine learning, and data analysis skills. This post will focus on the data used in the project, some of the opportunities it creates and the challenges it presents.
Back in June, the City of Santa Monica released a RESTful API that spits out real-time data for both parking lots and meters throughout the city. When it was released, I went to meetings the city council ran to introduce the APIs and go over what the data looked like. Below is a breakdown of the /meters/ route, the main data source of this project. I am not going to cover every sub-route and feature, only those that are of interest to this project:
GET /meters/ – Parking Meter Information
The base route,
/meters/, sends out mostly static information about each parking meter in Santa Monica, represented as a
metered_space object. Inside of each
metered_space are the parking meter’s
active properties. Let’s examine the fields that may need further explanation:
meter_id: A unique, persistent string identifier for an individual parking meter
active: Indicates whether or not a meter is in service/functioning. This field is rarely updated, and so checking once or twice daily for any changes is sufficient to stay up-to-date with meter statuses
area: A short description of the parking meter’s location. Can be used as a loose way of grouping meters together
The other properties are self-explanatory, but you can read more about them here.
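As a rough sketch of the "loose grouping" idea, the snippet below buckets in-service meters by their area. The sample records are made up for illustration; only the meter_id, active, and area field names come from the API description above, and treating active as a boolean is an assumption.

```python
from collections import defaultdict

# Made-up sample records. Only the field names (meter_id, active, area)
# come from the post; the values and the boolean type of "active" are assumptions.
SAMPLE_METERS = [
    {"meter_id": "A1", "active": True, "area": "Main St"},
    {"meter_id": "A2", "active": False, "area": "Main St"},
    {"meter_id": "B7", "active": True, "area": "Ocean Ave"},
    {"meter_id": "B8", "active": True, "area": "Ocean Ave"},
]


def group_active_meters_by_area(meters):
    """Return {area: [meter_id, ...]} for meters that are in service."""
    groups = defaultdict(list)
    for meter in meters:
        if meter["active"]:
            groups[meter["area"]].append(meter["meter_id"])
    return dict(groups)


if __name__ == "__main__":
    for area, ids in group_active_meters_by_area(SAMPLE_METERS).items():
        print(area, ids)
```

Because the meter list is mostly static, a grouping like this only needs to be rebuilt when the active flags are refreshed.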
GET /meters/events – Real time Meter Events Data
Going deeper, the
/meters/events/ route returns a list of
sensor_event objects, which represent parking meter events: a car either entering or leaving a parking space. These events are what I’ll be using to construct Santa Monica Spaces’ predictive models, so let’s go over all of the
sensor_event properties in detail:
event_id: The unique numeric identifier for each event
meter_id: The id of the meter that sent this event. The ids found here match the
meter_id properties returned from the base
/meters/ route
event_type: Denotes the type of this event. Can be one of two values:
"SS" stands for “session start” (i.e. a car just entered this space), and
"SE" stands for “session end” (i.e. a car just left this space)
session_id: The unique session number that contains this event. A session is defined as one start event (
"SS") and one end event (
"SE"). Therefore, exactly two events should share
a given session_id
event_time: The time this event occurred, with precision to the second. The string is an ISO 8601 UTC date/time with all non-alphanumeric characters removed (ISO 8601 calls this compact form the basic format). For example, “2007-04-05T14:30Z” becomes “20070405T1430Z”. Many date formatting libraries do not parse this form by default, so you may need to supply an explicit format string or write a small parsing function yourself
ordinal: The unique number identifying the order in which the server received this event. An event with a lower ordinal was received earlier than one with a higher ordinal. Additionally, the server emits event data sorted by
ordinal. Can be used as an argument in the
/meters/events/since/ route to limit the events returned to only those after this event was received by the server
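Since event_time needs a little help to parse, here is a minimal sketch of one way to do it with Python's standard library. The function name parse_event_time is hypothetical, and the sketch assumes the API only sends the two shapes discussed above: with seconds (16 characters) or, as in the example string, without them (14 characters).

```python
from datetime import datetime, timezone

# Format chosen by string length, so a minutes-only value is never
# misread by the seconds format (and vice versa).
_FORMATS_BY_LENGTH = {
    16: "%Y%m%dT%H%M%SZ",  # e.g. 20150701T183045Z
    14: "%Y%m%dT%H%MZ",    # e.g. 20070405T1430Z
}


def parse_event_time(value):
    """Parse a compact UTC timestamp into a timezone-aware datetime."""
    fmt = _FORMATS_BY_LENGTH.get(len(value))
    if fmt is None:
        raise ValueError(f"unrecognized event_time: {value!r}")
    # strptime returns a naive datetime; the trailing Z means UTC.
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(parse_event_time("20070405T1430Z").isoformat())
```

Picking the format by length rather than trying each format in turn avoids a subtle trap: strptime accepts one-digit minutes and seconds, so a seconds format can silently match a minutes-only string with the wrong values.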
Without any additional arguments,
/meters/events/ returns all parking events emitted in Santa Monica from the past 5 minutes. You can use the sub-route
/meters/events/since/ in order to modify how many events are sent from the server. By using
/meters/events/since/:datetime, you can pass in a UTC date string (formatted as described above), and the API will return all of the events that have occurred since that time. Additionally, you can call
/meters/events/since/:ordinal to return all events that have occurred after the event with the specified ordinal number (inclusive of that event).
For both of these sub-routes, the API will not serve any events that occurred more than three hours before the request. In other words, you can only get three hours of historic event data without storing it yourself.
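To make the two since sub-routes concrete, here is a small sketch that builds the request URLs. BASE_URL is a placeholder, not the real endpoint, and the helper names are hypothetical. The sketch also assumes the API accepts timestamps with seconds, and it clamps the requested time to the three-hour window, since asking for anything older would not return more data.

```python
from datetime import datetime, timedelta, timezone

BASE_URL = "https://example.invalid/api"  # placeholder, NOT the real endpoint
HISTORY_WINDOW = timedelta(hours=3)       # oldest data the API will serve


def format_event_time(moment):
    """Format a timezone-aware datetime in the API's compact UTC style."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def since_time_url(moment, now=None):
    """URL for events since `moment`, clamped to the three-hour window.

    Both `moment` and `now` must be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    earliest = now - HISTORY_WINDOW
    return f"{BASE_URL}/meters/events/since/{format_event_time(max(moment, earliest))}"


def since_ordinal_url(ordinal):
    """URL for events starting from the given ordinal."""
    return f"{BASE_URL}/meters/events/since/{int(ordinal)}"


if __name__ == "__main__":
    print(since_time_url(datetime.now(timezone.utc) - timedelta(minutes=30)))
    print(since_ordinal_url(12345))
```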
Looking at the data available from the API, a couple of interesting things stand out that will be of use when designing code:
- Minimizing data transfer: You can reduce the amount of repeat data you receive by keeping track of the latest
ordinal you’ve seen and using
/meters/events/since/:ordinal to request only the events from that point on
- Implicit data structure – Sessions: By tying together the start event (
"SS") and end event (
"SE") that share a
session_id, you can create a representation of a “session”. A “session” represents a period of time during which a parking meter is occupied
- Geo Data: We’ll have to make some sort of heatmap or other visualization on a map of Santa Monica; this data is begging for it.
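The first two ideas above can be sketched in a few lines. The sample events are made up, using only the sensor_event field names described earlier. Keying sessions on the (meter_id, session_id) pair is a cautious assumption in case session numbers are only unique per meter, and build_sessions and latest_ordinal are hypothetical helper names.

```python
# Made-up events using the sensor_event fields described in this post.
SAMPLE_EVENTS = [
    {"event_id": 1, "meter_id": "A1", "event_type": "SS", "session_id": 10,
     "event_time": "20150701T170000Z", "ordinal": 100},
    {"event_id": 2, "meter_id": "B7", "event_type": "SS", "session_id": 11,
     "event_time": "20150701T170500Z", "ordinal": 101},
    {"event_id": 3, "meter_id": "A1", "event_type": "SE", "session_id": 10,
     "event_time": "20150701T174500Z", "ordinal": 102},
]


def build_sessions(events):
    """Pair "SS" and "SE" events into {(meter_id, session_id): {start, end}}.

    A session whose end event has not arrived yet keeps end=None.
    """
    sessions = {}
    for event in sorted(events, key=lambda e: e["ordinal"]):
        key = (event["meter_id"], event["session_id"])
        session = sessions.setdefault(key, {"start": None, "end": None})
        if event["event_type"] == "SS":
            session["start"] = event["event_time"]
        elif event["event_type"] == "SE":
            session["end"] = event["event_time"]
    return sessions


def latest_ordinal(events, previous=None):
    """Highest ordinal seen so far, to pass to the since/:ordinal route."""
    ordinals = [e["ordinal"] for e in events]
    if previous is not None:
        ordinals.append(previous)
    return max(ordinals) if ordinals else None


if __name__ == "__main__":
    print(build_sessions(SAMPLE_EVENTS))
    print(latest_ordinal(SAMPLE_EVENTS))
```

Sessions with end=None are either still in progress or missing their end event, so they are worth keeping separate from completed sessions.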
Additionally, there appear to be several challenges this data presents, and they will need to be overcome in order to make the best use of it:
- Noisy data: There is potentially going to be a lot of noise in the data in the form of spurious events. Because an event is emitted any time a car drives over or leaves a parking space, it’s possible for a single car to trigger multiple events while attempting to park, perhaps even at the same parking meter.
- Unbalanced Data: Ideally, we would receive the same amount of data from each parking meter at any given time, but that is not the case. The events we receive are sporadic, and some parking meters have a lot more events than others. This imbalance will cause issues if we try to do time series predictions.
- Not the Best Data: The event data we have is useful, but what we really want is information about a parking meter’s availability at any given time. That is, “was this parking meter occupied or open at this time?”
How are we going to fix these problems? It turns out that the solution lies in the meter sessions. In the next post, I’ll walk us through a visualization of unbalanced data, and how we can use sessions to solve it (and how that will alleviate other problems as well).