Predictive Modeling and Better Historical Data for the Beach City
Santa Monica Spaces is a project I started back in July 2015, with the hope of creating useful data and models, as well as improving my programming, machine learning, and data analysis skills. Over the next series of posts, I’ll introduce you to the project, talk about the goals and challenges involved, and catch you up to where I am now.
What is this thing?
This project is all about parking meter availability. Santa Monica Spaces aims to provide useful analysis and services by transforming data from the City of Santa Monica Parking API. Once complete, those accessing Santa Monica Spaces should be able to create historical visualizations in a real-time interface, export a number of useful datasets in a variety of formats, and make use of a predictive modeling feature to estimate the percentage availability of parking meters in a given region.
Why are you doing this thing?
Currently, the data available from the API is not in a format well suited to analysis, visualization, or modeling. The data is also imbalanced, which causes a number of headaches for time-series predictive modeling. Finally, very little historical parking meter data is available, and by storing records myself, I hope to build a dataset large enough for robust analysis and modeling.
What are the goals of this thing?
- Create software that automatically transforms imbalanced event data into balanced meter-availability data
- Maintain a secure, redundant database of historical meter data
- Add subroutines to the software to derive other types of data, such as overall percentage availability
- Implement a front-end interface for visualizations and data exports
- Train a neural network (or other machine learning structure) to predict parking meter availability
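To make the first and third goals more concrete, here is a minimal pandas sketch of what "balancing" could look like. It is only an illustration under assumptions, not the project's actual code: the column names (`meter_id`, `event_time`, `available`), the 5-minute grid, and both function names are hypothetical, and the real API's event format may differ.

```python
import pandas as pd


def balance_events(events: pd.DataFrame, freq: str = "5min") -> pd.DataFrame:
    """Turn irregular per-meter events into a regular time grid.

    Assumed (hypothetical) input columns:
      meter_id   - identifier of the meter
      event_time - timestamp of the event
      available  - 1 if the meter became free, 0 if it became occupied
    Returns one row per grid timestamp and one column per meter.
    """
    events = events.sort_values("event_time", kind="stable")
    # One column per meter; if a meter reports twice at the same instant,
    # keep the later record.
    wide = events.pivot_table(
        index="event_time", columns="meter_id", values="available", aggfunc="last"
    )
    # Regular grid covering the whole observation window.
    grid = pd.date_range(
        wide.index.min().floor(freq), wide.index.max().ceil(freq), freq=freq
    )
    # Carry each meter's last known state forward, then keep only grid points.
    # Grid points before a meter's first event stay NaN (state unknown).
    return wide.reindex(wide.index.union(grid)).ffill().reindex(grid)


def percent_available(balanced: pd.DataFrame) -> pd.Series:
    """Percentage of meters with a known state that are free at each grid point."""
    return balanced.mean(axis=1) * 100


if __name__ == "__main__":
    # Tiny made-up sample, for illustration only.
    sample = pd.DataFrame(
        {
            "meter_id": ["A", "B", "B", "A"],
            "event_time": pd.to_datetime(
                [
                    "2015-07-01 09:01",
                    "2015-07-01 09:03",
                    "2015-07-01 09:07",
                    "2015-07-01 09:12",
                ]
            ),
            "available": [0, 1, 0, 1],
        }
    )
    print(percent_available(balance_events(sample)))
```

Forward-filling treats a meter as unchanged until its next event, which is one simple choice among several; meters with no event yet are left as unknown rather than guessed.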
What technologies are you using?
So far, I’ve used the following languages and software:
- Python/IPython Notebook, pandas, scikit-learn
Is there an open source repository?
Soon! Parts of the project will definitely be put on my GitHub, but I need to double-check the code and separate out the sensitive parts first (it's probably not the best idea to give out secret keys). Portions of my datasets will also be available so you can test out the code and get a better sense of everything.