“Approximate Street Address” Doesn’t Do It Justice
Whew! It’s been a while since the last update. Don’t worry: I’ve been hard at work learning TensorFlow (and I’ve even contributed to its documentation a touch), and I’ll have a fairly large post later this week. In the meantime, I thought I’d share something I’ve discovered about one of the Santa Monica Parking API’s fields that I had previously shrugged off as unhelpful.
I was looking through some of my parking data and decided to print some information for all parking meters, ordered by meter_id, when I noticed something interesting:
Multiple meters were given the same street_address field. On further inspection, I also noticed that address numbers in the list either ended in 0 or 1. I couldn’t think of a better thing to do than plot some of them on a map and see what I came up with.
First, I picked two groups of meters that had similar addresses. In this example, “00 Pico Blvd” and “01 Pico Blvd”.
Here’s the “01 Pico Blvd” coordinates mapped:
And then with “00 Pico Blvd” added in:
The street_address is a label for their block! I tested this out with several groups of meters, and found the block-by-block grouping consistent. That makes me comfortable to say this:
street_address Groups Parking Meters Together by Block
There’s your TLDR. Two reasons I’m sharing this today:
As of writing, that information is not conveyed in the API
It’s going to save a huge amount of effort when people inevitably want to group these meters together block-by-block
Before realizing this, I was thinking of various ways to use a combination of meter_id and GPS coordinates to group these together without doing it manually, but street_address provides a very natural way to group them! Hooray for the data being even better than first thought!
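To make the grouping concrete, here’s a minimal sketch of grouping meters by street_address. The field names (meter_id, street_address) come from the API as described above, but the sample records themselves are made up for illustration:

```python
from collections import defaultdict

# Illustrative sample records; real data comes from the Santa Monica
# Parking API, but these meter IDs and addresses are invented.
meters = [
    {"meter_id": "PICO1001", "street_address": "00 Pico Blvd"},
    {"meter_id": "PICO1002", "street_address": "00 Pico Blvd"},
    {"meter_id": "PICO1101", "street_address": "01 Pico Blvd"},
    {"meter_id": "PICO1102", "street_address": "01 Pico Blvd"},
]

# Group meters block-by-block: street_address is the block label
blocks = defaultdict(list)
for meter in meters:
    blocks[meter["street_address"]].append(meter["meter_id"])
```

Each key in `blocks` is one city block, and its value is the list of meters on that block — no GPS clustering required.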
What is TensorFlow? From its introduction, it is “an open source library for numerical computation using data flow graphs”!
What is “an open source library for numerical computation using data flow graphs”?
Sounds like a mouthful, but “data flow graphs” are just a more encompassing term for the kind of modeling neural networks use. And the library is described that way for a reason: TensorFlow is designed not only to provide flexible, highly optimized neural networks, but to be able to perform any sort of computation that is organized with a similar graph-like structure.
More on data flow graphs
These graphs are composed of two primary components, nodes and edges.
Nodes are the squares, circles, or ellipses you see in diagrams of these graphs. They represent any sort of mathematical operation or function. In a neural network, these are your activation functions (like a sigmoid function).
Edges are the connections between the nodes. As you can see, they are directional, in that data flows from the output of one node and into the input of the next node (or several nodes) through these edges. Edges represent the “tensors”, or multi-dimensional arrays, which contain the weights for each of the outputs from the previous node to the next.
Compare that with a typical neural network model, and you can see how a neural network is just a specialized version of a data flow graph. Back to TensorFlow!
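To see how little machinery a data flow graph actually needs, here’s a hand-rolled sketch in plain Python (not TensorFlow itself): each node holds an operation, and the edges carry values from one node’s output into the next node’s input. The `Node` class and the example graph are my own illustration, not part of any library:

```python
import math

def sigmoid(x):
    # A classic activation function, as mentioned above
    return 1.0 / (1.0 + math.exp(-x))

class Node:
    def __init__(self, op, inputs=()):
        self.op = op          # the mathematical operation this node performs
        self.inputs = inputs  # incoming edges: the nodes feeding into this one

    def evaluate(self):
        # Pull values along the incoming edges, then apply this node's op
        return self.op(*(node.evaluate() for node in self.inputs))

# A tiny graph: an input and a weight flow into a multiply node,
# whose output flows into a sigmoid node
x = Node(lambda: 2.0)
w = Node(lambda: 0.5)
weighted = Node(lambda a, b: a * b, inputs=(x, w))
out = Node(sigmoid, inputs=(weighted,))

print(out.evaluate())  # sigmoid(2.0 * 0.5) ≈ 0.731
```

A single neuron is exactly this shape (weighted inputs feeding an activation), which is why a neural network is just a specialized data flow graph.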
So what exactly is there to get excited about with TensorFlow? There are a jillion machine learning libraries out there, so how does this one stick out from the crowd (other than being created by Google)? Well, a fair amount, actually. Here are some of the things I’m most excited about:
Easier Transition from Research to Production: Something that was always troublesome in machine learning, especially neural networks, was taking a model crafted in research and applying it to a real production setting. Much research is done using Python, R, or MATLAB (with accompanying libraries), which allows for faster iterations through the design and testing phases. Previously, that code would hardly be touched once the model moved to production, as it needed to be reimplemented in a faster language, such as C++ or Java. Because of the way TensorFlow is designed, we should be able to take what we have and bring it directly to production with minimal, if any, code changes.
Flexibility: This is both a great thing and something to keep in mind. TensorFlow is not a neural network library: it is a data flow graph library. This makes it capable of handling much more nuanced and hand-modeled graphs, but it will require more finagling. While it doesn’t appear too difficult to create a simple neural network now, I expect that there will be some higher-level libraries built on top of TensorFlow to make it extremely easy.
Automatic CPU/GPU Integration: This might be the most exciting one for me. GPUs, or graphics processing units, have enabled much faster learning (especially for neural networks), and taking advantage of them is crucial to having the power to create robust models. The problem, however, is that most machine learning libraries out there don’t have GPU support, and those that do are either hard to use or much less flexible. For example, scikit-learn, one of the most popular libraries for machine learning, while extremely useful for testing out ideas, has no plans for GPU support in the near future. TensorFlow promises to bring both flexibility and power by taking advantage of all of your computing resources.
I’ll be digging into this more over the next few weeks! Check out Google Research’s blog post if you’re interested in reading more about it.