Santa Monica Spaces: The Initial Data

Taking a Look at Santa Monica’s Parking API

Santa Monica Spaces is a project I started back in July 2015, with the hope of creating useful data and models as well as improving my programming, machine learning, and data analysis skills. This post will focus on the data used in the project, some of the opportunities it creates and the challenges it presents.


Back in June, the City of Santa Monica released a RESTful API that spits out real-time data for both parking lots and meters throughout the city. When it was released, I went to meetings run by the city council to introduce the APIs and go over what the data looked like. Below is a breakdown of the /meters/ route, the main data source of this project. I won't cover every sub-route and feature, only those that are of interest to this project:

GET /meters/ – Parking Meter Information

The base route, /meters/, sends out mostly static information about each parking meter in Santa Monica, represented as a metered_space object. Each metered_space contains the parking meter’s area, latitude, longitude, meter_id, street_address, and active properties. Let’s examine the fields that may need further explanation:

  • meter_id: A unique, persistent string identifier for an individual parking meter
  • active: Indicates whether or not a meter is in service/functioning. This field is rarely updated, and so checking once or twice daily for any changes is sufficient to stay up-to-date with meter statuses
  • area: A short description of the parking meter’s location. Can be used as a loose way of grouping meters together

The other properties are self-explanatory, but you can read more about them here.
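As a small illustration of using area as a loose grouping key, here is a minimal sketch in Python. It assumes the /meters/ response has already been decoded into a list of dictionaries keyed by the property names above; the sample values and the helper name active_meters_by_area are made up for the example and are not part of the API.

```python
from collections import defaultdict


def active_meters_by_area(meters):
    """Group the meter_ids of in-service meters by their area description."""
    groups = defaultdict(list)
    for meter in meters:
        if meter["active"]:  # skip meters that are out of service
            groups[meter["area"]].append(meter["meter_id"])
    return dict(groups)


# Made-up sample records, shaped like the metered_space fields described above
sample_meters = [
    {"meter_id": "A1", "area": "Area One", "active": True},
    {"meter_id": "A2", "area": "Area One", "active": False},
    {"meter_id": "B1", "area": "Area Two", "active": True},
]

print(active_meters_by_area(sample_meters))
```

Because active changes so rarely, a grouping like this only needs to be rebuilt once or twice a day.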

GET /meters/events/ – Real-Time Meter Events Data

Going deeper, the /meters/events/ route returns a list of sensor_event objects, which represent parking meter events: a car either entering or leaving a parking space. These events are what I’ll be using to construct Santa Monica Spaces’ predictive models, so let’s go over all of the sensor_event properties in detail:

  • event_id: The unique numeric identifier for each event
  • meter_id: The id of the meter that sent this event. The ids found here connect to meter_id properties returned from the base /meters/ route
  • event_type: Denotes the type of this event. Can be one of "SS" or "SE". "SS" stands for “session start” (i.e. a car just entered this space), and "SE" stands for “session end” (i.e. a car just left this space)
  • session_id: The unique session number that contains this event. A session is defined as one start event ("SS") and one end event ("SE"). Therefore, exactly two events should share a session_id
  • event_time: The time this event occurred, with precision to the second. The string is an ISO 8601 UTC date/time with all non-alphanumeric characters removed (ISO 8601 calls this compact form its “basic” format). For example, “2007-04-05T14:30Z” becomes “20070405T1430Z”. Many date libraries do not parse this form by default, so you may need to supply an explicit format string or write a small parsing function yourself
  • ordinal: The unique number identifying the order in which the server received this event. An event with a lower ordinal was received earlier than those with a higher ordinal. Additionally, the server emits event data sorted by ordinal. Can be used as an argument in the /meters/events/since/ route to limit the events returned to only those after this event was received by the server
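Since event_time needs a bit of custom handling, here is a minimal parsing sketch in Python using only the standard library. The function name parse_event_time is my own, and accepting both the second-precision and minute-precision forms is an assumption based on the description and example above. The format is picked by string length rather than by trial and error, because strptime can otherwise misread a minute-precision string as one with seconds.

```python
from datetime import datetime, timezone

# Map the length of the compact string to the matching strptime format.
# 16 chars: 20070405T143000Z (with seconds); 14 chars: 20070405T1430Z.
_FORMATS = {
    16: "%Y%m%dT%H%M%SZ",
    14: "%Y%m%dT%H%MZ",
}


def parse_event_time(raw):
    """Turn a compact UTC event_time string into a timezone-aware datetime."""
    fmt = _FORMATS.get(len(raw))
    if fmt is None:
        raise ValueError(f"unrecognized event_time: {raw!r}")
    # The trailing Z means UTC, so attach that timezone explicitly
    return datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)


print(parse_event_time("20070405T1430Z").isoformat())
```

Working with timezone-aware datetime objects from the start makes later comparisons and durations much less error-prone than comparing raw strings.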

Without any additional arguments, /meters/events/ returns all parking events emitted in Santa Monica from the past 5 minutes. The sub-route /meters/events/since/ lets you change how far back the returned events go. With /meters/events/since/:datetime, you pass in a UTC date string (formatted as described above) and get back all of the events that have occurred since that time. With /meters/events/since/:ordinal, you get back all events that have occurred after the event with the specified ordinal number (inclusive of that event).

For both of these sub-routes, the API will not serve any events that occurred more than three hours before the request; that is, you can only get three hours of historical event data without storing it yourself.
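To show how the ordinal route can cut down on repeat data, here is a rough Python sketch of the bookkeeping side of a polling loop. It deliberately leaves out the HTTP request and host name, and it assumes the response has been decoded into a list of dictionaries with an integer "ordinal" key. The helper names events_path and new_events are hypothetical.

```python
def events_path(last_ordinal=None):
    """Build the route to request, given the last ordinal seen (if any)."""
    if last_ordinal is None:
        return "/meters/events/"  # default: events from the past 5 minutes
    return f"/meters/events/since/{last_ordinal}"


def new_events(events, last_ordinal=None):
    """Drop events already seen and return (fresh_events, updated_ordinal).

    The since/:ordinal route is inclusive of the requested event, so the
    event we already have comes back again and is filtered out here.
    """
    fresh = [
        e for e in events
        if last_ordinal is None or e["ordinal"] > last_ordinal
    ]
    if fresh:
        last_ordinal = max(e["ordinal"] for e in fresh)
    return fresh, last_ordinal


# Made-up response to a request for events_path(5)
response = [{"ordinal": 5}, {"ordinal": 6}, {"ordinal": 7}]
fresh, last_seen = new_events(response, last_ordinal=5)
print(len(fresh), events_path(last_seen))
```

Keep the three-hour limit in mind: if the poller is down for longer than that, the events in the gap can no longer be fetched.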

Interesting Features

Looking at the data available from the API, a couple of interesting things stand out that will be of use when designing code:

  • Minimizing data transfer: You can reduce the amount of repeat data you receive by keeping track of the latest ordinal you’ve seen and using /meters/events/since/:ordinal
  • Implicit data structure – Sessions: By tying together the start event ("SS") and end event ("SE") that share a session_id, you can create a representation of a “session”. A “session” represents a period of time during which a parking meter is occupied
  • Geo Data: We’ll have to make some sort of heatmap or other visualization on a map of Santa Monica. This data is begging for it.
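The implicit session structure can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: it assumes events are dictionaries keyed by the sensor_event property names, and the function name build_sessions and the shape of the resulting session records are my own choices.

```python
from collections import defaultdict


def build_sessions(events):
    """Pair each "SS" event with the "SE" event sharing its session_id."""
    by_session = defaultdict(dict)
    for event in events:
        by_session[event["session_id"]][event["event_type"]] = event

    sessions = []
    for session_id, pair in by_session.items():
        start, end = pair.get("SS"), pair.get("SE")
        if start is None:
            continue  # an end with no start: we can't tell when it began
        sessions.append({
            "session_id": session_id,
            "meter_id": start["meter_id"],
            "start": start["event_time"],
            # None means the end event has not been seen (yet)
            "end": end["event_time"] if end else None,
        })
    return sessions


# Made-up events for two sessions, one of which is still open
sample_events = [
    {"session_id": 1, "meter_id": "A1", "event_type": "SS", "event_time": "20150801T170000Z"},
    {"session_id": 2, "meter_id": "B2", "event_type": "SS", "event_time": "20150801T170500Z"},
    {"session_id": 1, "meter_id": "A1", "event_type": "SE", "event_time": "20150801T173000Z"},
]

for session in build_sessions(sample_events):
    print(session)
```

Leaving end as None for open sessions keeps "currently occupied" meters in the result instead of silently dropping them.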

Additionally, there appear to be several challenges this data presents, and they will need to be overcome in order to make the best use of it:

  • Noisy data: The event stream is likely to contain a lot of noise. Because an event is emitted any time a car drives over or leaves a parking space, a single car can trigger multiple events while attempting to park, perhaps even at the same parking meter.
  • Unbalanced data: Ideally, we would receive the same amount of data from each parking meter at any given time, but that is not the case. The events we receive are sporadic, and some parking meters have a lot more events than others. This imbalance will cause issues if we try to do time series predictions
  • Not the best data: The event data we have is useful, but what we really want is information about a parking meter’s availability at any given time. That is, “was this parking meter occupied or open at this time?”
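A quick way to see the imbalance for yourself is to count events per meter. The snippet below is a minimal Python sketch that assumes events are dictionaries with a "meter_id" key; the sample data is made up and the helper name events_per_meter is hypothetical.

```python
from collections import Counter


def events_per_meter(events):
    """Count how many events each meter_id contributed."""
    return Counter(event["meter_id"] for event in events)


# Made-up events: one busy meter and one quiet one
sample = [
    {"meter_id": "A1"},
    {"meter_id": "A1"},
    {"meter_id": "A1"},
    {"meter_id": "B2"},
]

counts = events_per_meter(sample)
print(counts.most_common())  # [('A1', 3), ('B2', 1)]
```

On real data, a wide spread between the busiest and quietest meters is a sign that raw event counts should not be fed directly into a time series model.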

How are we going to fix these problems? It turns out that the solution lies in the meter sessions. In the next post, I’ll walk us through a visualization of unbalanced data, and how we can use sessions to solve it (and how that will alleviate other problems as well).

Up next: Balancing the data and visualizing sessions


Santa Monica Spaces: Introduction

Predictive Modeling and Better Historical Data for the Beach City

Santa Monica Spaces is a project I started back in July 2015, with the hope of creating useful data and models, as well as improving my programming, machine learning, and data analysis skills. Over the next series of posts, I’ll introduce you to the project, talk about the goals and challenges involved, and catch you up to where I am now.

What is this thing?

This project is all about parking meter availability. Santa Monica Spaces aims to provide useful analysis and services by transforming data from the City of Santa Monica Parking API. Once complete, those accessing Santa Monica Spaces should be able to create historical visualizations in a real-time interface, export a number of useful datasets in a variety of formats, and make use of a predictive modeling feature to estimate the percentage availability of parking meters in a given region.

Why are you doing this thing?

Currently, the data available from the API is not in a format particularly suited for analysis, visualizations, or modeling. Additionally, the data is imbalanced, which can cause a number of headaches when trying to do time-series predictive modeling. Finally, the availability of historical parking meter data is rather low, and by storing records myself, I hope to have a large enough dataset to perform robust analysis and modeling.

What are the goals of this thing?

  • Create software that automatically transforms imbalanced event data into balanced meter-availability data
  • Maintain a secure, redundant database of historical meter data
  • Add subroutines to software to create different types of data, such as overall percentage availability
  • Implement a front-end interface for visualizations and data exports
  • Train a neural network (or other machine learning structure) to predict parking meter availability

What technologies are you using?

So far, I’ve used the following languages and software:

  • Java
  • JavaScript (Node/Express)
  • Python/IPython Notebook, pandas, scikit-learn
  • awk
  • Octave

Is there an open source repository?

Soon! Parts of the project will definitely be put on my GitHub, but I need to double-check and separate out the sensitive parts of my code (probably not the best idea to give out secret keys). Portions of my datasets will be available to test out code and get a better sense of everything.

Next time: looking at the starting data


My Computer is Using 19 GB of Virtual Memory


And all of its RAM

This happens a lot.

Some of it can be attributed to the various Chrome tabs I have open at any given time (Google Chrome Helpers out the wazoo). The rest of the blame falls on the work I do on-the-daily: data science and animation.

Who am I?

I’m Sam Abrahams: math nerd, programmer, and digital-media dude. I graduated from the University of Richmond in 2014 with a B.S. in Mathematical Economics (more statistics, less abstract algebra). After moving out to the west coast, I began seeking out analyst jobs all over Los Angeles. However, as I was interviewing for those jobs, I was given the opportunity to do a short 2D animation for Illumination Entertainment, best known for these yellow guys. They liked my work, and so I became a contracted artist with them. Since then, I’ve been doing animation for my job, while machine learning and programming have become the focus of my studies and personal projects.


It’s been a hell of a year for me, and I have a lot of exciting things now going on and planned for the future. I want this blog to be a repository for people to see what I’m working on and thinking about in the fields of machine learning, programming, and animation. Occasionally, I will try to write detailed tutorials for certain tasks I’ve become proficient at, and at the very least I will share thoughts on the latest hot things.

One post down. Let’s do this!
