Motivation
I recently moved within SF since I had plans to move in with my girlfriend, and I wanted a change of scenery. Migrating from a giant 4-person apartment right off a main street in Dogpatch to a quiet, peaceful 1-bedroom in Pac Heights was a journey in and of itself. As a part of the move, I needed to figure out what to do with my car. While in Dogpatch, I'd become really comfortable with street parking. The neighborhood was super car-friendly and easy to park in, as long as you kept up with the street cleaning schedules in the area.
The second time I went to my new apartment, I spent around 20 minutes circling the neighborhood looking for parking. I realize now that I wasn't all that familiar with the area, but still, that's a long time compared to the < 1 minute it would take me 99% of the time in Dogpatch. Things felt even worse when I realized that I needed a new parking permit, and the process could only be started a couple of weeks out. I put in the request and had to wait a couple of days before the permit arrived in the mail.
That left me with only a couple of days where I needed to street park without a permit (mostly just for loading and unloading). Still, I really did not want to get a ticket, so I decided to do a bit of data digging to figure out which streets were more forgiving to park on.
The Ideas and Journey
The OG Plan
I had a general idea of how I wanted to do this. I found that the citation data is all public on DataSF, so I thought it would be as easy as taking the data and just putting it into some visualization. My general idea was the following:
- Grab the data via the export / direct link to the file
- Query it to find interesting trends
- Visualize portions to get a better high-level view (using the geometry column)
By the end, every step of this plan had to be adapted.
Grabbing Data
For the most part, grabbing the data went as planned. This was a one-time analysis, so I decided it would be fine to just use one snapshot of the data. The issue was that the data itself was pretty large. My first solution here was simple — filter the data down to just the citations that occurred in the past month. This worked well enough, since I only really needed to see recent data anyway. Looking at citation data from 2 years ago wouldn't be as telling as the data from just 2 months ago, in my opinion.
The data was still pretty large, so at this point, I decided to look into the various Violation Descriptions that existed in the dataset. Guided by the descriptions of the various traffic citation reasons, I settled on these 3:
- MTR OUT DT — probably a citation for when the meter is out
- PARKING PROHIB — probably a citation for when you're parked where you're not supposed to be
- RES/OT — probably a citation for when you've parked in a residential area for too long
I decided to skip out on the street cleaning tickets, since those, to me, were just noise. I already knew that citation enforcement would be plentiful at the street cleaning times, and I was interested in finding out when patrols might be running beyond just then.
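For anyone curious, the filtering itself was nothing fancy. Here's a rough sketch of the idea in pandas; the file name and column names (Citation Issued DateTime, Violation Description) are assumptions based on the export I worked with, so double-check them against your own download:

```python
import pandas as pd

# File and column names below are assumptions from the export I used;
# check the headers in your own DataSF download.
df = pd.read_csv("SFMTA_Parking_Citations.csv")

# Keep only the most recent month of citations
df["Citation Issued DateTime"] = pd.to_datetime(df["Citation Issued DateTime"])
cutoff = df["Citation Issued DateTime"].max() - pd.DateOffset(months=1)
recent = df[df["Citation Issued DateTime"] >= cutoff]

# Keep only the three violation types I cared about
# (street cleaning tickets get dropped here as a side effect)
violations = ["MTR OUT DT", "PARKING PROHIB", "RES/OT"]
recent = recent[recent["Violation Description"].isin(violations)]

recent.to_csv("recent_citations.csv", index=False)
```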
Massaging Data
At first glance at the dataset, it looked like I'd have nicely formatted geodata, just like the street cleaning schedule dataset.
Wrong! The data that I thought had geometry information actually didn't. The geometry column exists in the citation dataset, but it looks like at some point in time, SFMTA decided to stop populating that column. Instead, they started using the Citation_Location field to describe the general location of each citation.
Now, all of a sudden, I had a shiny new problem on my hands. I needed to figure out a free way to convert the street addresses, which could be cross-streets, actual addresses, etc., into mappable geodata. This is right when I started my foray into working with map data.
Turns out, this is not a new problem. It goes by the name of geocoding, and services like Google Maps offer it at a price. However, there's one free service out there called OpenStreetMap (OSM), which allows for very infrequent geocoding use. That would not fit my use case, since I had a big batch of data that needed geocoding.
I dug deeper and found out that OSM uses a tool/server called Nominatim that can be self-hosted. Great! So now, if I could find a map to load into my local instance of Nominatim, I'd be completely fine. The only issue was that (at the time of writing) the planet.osm file was around 1977 GB. That's a little too much for my M1 MacBook Air.
The thing is, I didn't need the full planet. All I needed was San Francisco. That's when I found this super cool site called bbbike. As a TLDR, it's basically a super simple site where you can request small, portioned OSM "excerpts." These excerpts are full OSM maps, but just for a specific region. When scoped down to just SF, the exported data ended up being a couple of MB. Super manageable.
From here on out, massaging the data was easy. It was a matter of spinning up the Nominatim service, loading in the data from bbbike, and writing a small script (sketched below) to query each of the Citation_Location values and get a geometry object back. The script takes a while to run (a couple of minutes), but it outputs what I need (more on this later).
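The script boils down to hitting Nominatim's standard /search endpoint once per location. A minimal sketch, assuming the local instance is listening on port 8080 and reusing the file and column names from earlier (both assumptions on my part):

```python
import pandas as pd
import requests

NOMINATIM_URL = "http://localhost:8080/search"  # local, self-hosted instance


def geocode(location: str):
    """Ask the local Nominatim instance for coordinates of a citation location."""
    params = {"q": f"{location}, San Francisco", "format": "json", "limit": 1}
    resp = requests.get(NOMINATIM_URL, params=params, timeout=10)
    resp.raise_for_status()
    results = resp.json()
    if not results:
        return None, None  # some addresses just never resolve
    return float(results[0]["lat"]), float(results[0]["lon"])


df = pd.read_csv("recent_citations.csv")
coords = pd.DataFrame(
    [geocode(loc) for loc in df["Citation_Location"]],
    columns=["lat", "lon"],
    index=df.index,
)
out = pd.concat([df, coords], axis=1)
out.dropna(subset=["lat", "lon"]).to_csv("geocoded_citations.csv", index=False)
```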
Visualizing Data
Visualizing was really cool. I had recently used kepler.gl, Uber's open-source data viz tool, which a friend from the company introduced me to while we were working together on a hackathon project. I thought it was visually appealing and wanted to make use of it. Kepler is great at maintaining a straightforward UI without hindering the end user's customizability (too much).
All I had to do was take the massaged data, toss it into kepler, configure the visualization a bit, and export it. Here's what I did:
- Fill Color for nodes represented the time of day at which a ticket was given
- Stroke Color for nodes represented the type of violation
- A gradient was set up for the fill color so that more variation could be seen
From there, kepler allows you to export your entire map as an embeddable HTML file. I grabbed that and put it up on my site (although there's currently something wrong with the 3.1 alpha release of kepler that causes exports to break).
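For completeness: I configured everything by clicking around in the kepler.gl UI and used its export button, but the same flow can also be scripted. A minimal sketch using the keplergl Python package and the geocoded CSV from earlier (the fill/stroke color and gradient settings aren't reproduced here, since I built those interactively):

```python
import pandas as pd
from keplergl import KeplerGl

# Geocoded citations from the earlier script; kepler picks up lat/lon columns automatically
df = pd.read_csv("geocoded_citations.csv")

citation_map = KeplerGl(height=600)
citation_map.add_data(data=df, name="citations")

# Export the whole map as a standalone, embeddable HTML file
citation_map.save_to_html(file_name="citations_map.html")
```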
Results and Random Notes
At the end of the day, everything worked! I was able to find a street I felt more comfortable parking on while waiting for my permit to come in, even though it was a short-lived need.
I think it's really interesting to see how the locations differ across violation descriptions. For example, with the MTR OUT DT violation, you can clearly see where the busiest parking meters are (both for use and for citations). It's most obvious in the Marina, where the east-west stretches of Chestnut and Union show up clearly, along with, notably, the north-south stretch of Fillmore St that connects the two. Similarly interesting patterns show up for the other violation types.
It was cool to see the times at which citations get issued. I specifically noticed that in Dogpatch, citations were generally given out in the afternoons and evenings, and I wonder if there's a correlation with Warriors games. On the flip side, my new neighborhood usually gets citations in the mornings, meaning that if you need to move your car, you should probably do so when you wake up.
Not everything is roses, however. Nominatim did a great job, but data and processes can both be dirty. Some addresses never resolved to a real spot, and you may find some "outliers" (aka incorrect nodes) on the map. Since this is an informal analysis, I decided to just ignore them, as most of the data seemed reasonable at first glance.