As I was pulling the information about the agencies, routes and stops from the 511 API, I thought about how I actually want to process my data. Do I want to have to call my APIs every time someone uses Rideminder? Nah. I currently have it set up that way, but now that I have more than 2 weeks to work on it, I’m thinking I’ll save the ‘saveable’ bits, like the stops and the different routes.
So I started figuring out the relationships between agencies, routes and stops, which was the start of my data model.
- Agencies have many Routes, Routes have one Agency
- Routes have many Stops, Stops have many Routes
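As a minimal sketch of those two relationships, here is one way they could be laid out in a relational database using Python's built-in sqlite3 module. The table names, column names and sample rows are all made up for illustration; the many-to-many side (routes and stops) goes through an association table.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
SCHEMA = """
CREATE TABLE agencies (
    agency_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE routes (
    route_id  INTEGER PRIMARY KEY,
    agency_id INTEGER NOT NULL REFERENCES agencies(agency_id),
    name      TEXT NOT NULL
);
CREATE TABLE stops (
    stop_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Association table: one row per (route, stop) pair.
CREATE TABLE route_stops (
    route_id INTEGER NOT NULL REFERENCES routes(route_id),
    stop_id  INTEGER NOT NULL REFERENCES stops(stop_id),
    PRIMARY KEY (route_id, stop_id)
);
"""


def build_demo_db():
    """Create an in-memory database with one agency, two routes, one shared stop."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO agencies (agency_id, name) VALUES (1, 'Agency A')")
    conn.executemany(
        "INSERT INTO routes (route_id, agency_id, name) VALUES (?, ?, ?)",
        [(1, 1, "Route X"), (2, 1, "Route Y")],
    )
    conn.execute("INSERT INTO stops (stop_id, name) VALUES (1, 'Shared Stop')")
    conn.executemany(
        "INSERT INTO route_stops (route_id, stop_id) VALUES (?, ?)",
        [(1, 1), (2, 1)],
    )
    return conn


def routes_for_stop(conn, stop_id):
    """Return the names of every route that serves the given stop."""
    rows = conn.execute(
        "SELECT r.name FROM routes r "
        "JOIN route_stops rs ON rs.route_id = r.route_id "
        "WHERE rs.stop_id = ? ORDER BY r.name",
        (stop_id,),
    )
    return [name for (name,) in rows]
```

Calling `routes_for_stop(build_demo_db(), 1)` walks the association table from the stop side; the same table can be joined from the route side to list a route's stops.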
I started by editing the code that I had already written to get the data from 511, to see how I could use dictionaries to carry the information from ‘Agencies’ to ‘Agencies and their Routes’ to ‘Agencies and their Routes and their Stops.’ That’s when I realized that I’d have duplicate stops, so I decided to also have another dictionary that combines all the stops with their stop codes by agency, and made each agency’s stops into a set (to get rid of the duplicates). During the process, I also checked that if Route X and Route Y use the same stop, the stop codes are the same (trust, but verify).
And they were! This was fantastic: there wasn’t even one that didn’t match up (which bothered me to the point where I tried to break it a few more times, just to make sure). Now I have a dictionary keyed by route name, where each value is another dictionary holding that route’s direction, route_code, agency and a list of stops. It’s beautiful!
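The dedupe-and-verify step can be sketched roughly like this. The sample data is invented, and the exact layout (a list of `(stop name, stop code)` tuples under a `"stops"` key) is an assumption about the shape, not the real 511 output.

```python
from collections import defaultdict

# Made-up sample data in roughly the shape described above.
routes = {
    "Route X": {
        "direction": "Inbound",
        "route_code": "X",
        "agency": "Agency A",
        "stops": [("Main St and 1st St", "1001"), ("Main St and 2nd St", "1002")],
    },
    "Route Y": {
        "direction": "Outbound",
        "route_code": "Y",
        "agency": "Agency A",
        "stops": [("Main St and 2nd St", "1002"), ("Oak St and 3rd St", "1003")],
    },
}


def unique_stops_by_agency(routes):
    """Collect each agency's (name, code) pairs into a set, dropping duplicates."""
    stops = defaultdict(set)
    for info in routes.values():
        stops[info["agency"]].update(info["stops"])
    return dict(stops)


def conflicting_stop_codes(routes):
    """Return stops whose name maps to more than one code within an agency."""
    codes = defaultdict(set)
    for info in routes.values():
        for name, code in info["stops"]:
            codes[(info["agency"], name)].add(code)
    return {key: found for key, found in codes.items() if len(found) > 1}
```

An empty result from `conflicting_stop_codes(routes)` is the "they all matched" case; anything left in the dictionary is a stop that two routes disagree about.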
Now I need to get the stops, but wait! I have a many-to-many relationship. So I had to make another function to get all the unique stops: I looped over all the keys in my stops_routes_agencies_info dictionary and made a new dictionary whose keys are the stop name and stop code, and whose values are another dictionary of lat, lon!
Well, the lats and lons were empty because 511 doesn’t store that information. So I had to look up different APIs:
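A minimal sketch of that loop, assuming each route entry keeps its stops as `(name, code)` tuples under a `"stops"` key (the sample data and that key name are assumptions). The coordinates start out as `None` placeholders to be filled in later.

```python
def collect_unique_stops(stops_routes_agencies_info):
    """Build {(stop_name, stop_code): {"lat": ..., "lon": ...}} with one entry per stop."""
    unique_stops = {}
    for route_name in stops_routes_agencies_info:
        for name, code in stops_routes_agencies_info[route_name]["stops"]:
            # setdefault keeps the first entry, so a stop shared by
            # several routes only shows up once.
            unique_stops.setdefault((name, code), {"lat": None, "lon": None})
    return unique_stops


# Invented sample: two routes that share one stop.
sample_info = {
    "Route X": {"stops": [("Main St and 1st St", "1001"), ("Main St and 2nd St", "1002")]},
    "Route Y": {"stops": [("Main St and 2nd St", "1002"), ("Oak St and 3rd St", "1003")]},
}
unique_stops = collect_unique_stops(sample_info)
```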
- MUNI: Nextbus
  - I had to go through each route and get all the stops with their lats and lons, so I set up a unique_muni_stops_lat_lon dictionary that has the stop name as the key and a dictionary of the lat, lon and stop_id as the value. I originally had the key as the stop_id, but lo and behold, 511 and Nextbus don’t use the same code/id!
- BART: BART API
  - I was able to go by the stop name, but BART abbreviates its station names for the API call. So I just made a dictionary where the key is the name that I had in 511 and the value is BART’s abbreviated name.
- Caltrain: CSV from Caltrain
  - There wasn’t an API, so I downloaded their CSV and got the information that way. This was by far the “easiest” one.
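For the CSV route, something along these lines would do with Python's built-in csv module. The column names (`stop_name`, `stop_lat`, `stop_lon`) and the two sample rows are assumptions for illustration; the real file's headers may differ.

```python
import csv
import io

# Invented sample standing in for the downloaded file.
SAMPLE_CSV = """stop_name,stop_lat,stop_lon
Example Station A,37.7700,-122.3900
Example Station B,37.4400,-122.1600
"""


def load_stops(csv_file):
    """Read stop rows into {stop_name: {"lat": float, "lon": float}}."""
    stops = {}
    for row in csv.DictReader(csv_file):
        stops[row["stop_name"]] = {
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
    return stops


caltrain_stops = load_stops(io.StringIO(SAMPLE_CSV))
```

With a real file on disk, `load_stops(open(path, newline=""))` would take the place of the `io.StringIO` wrapper.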
It took a bit to figure out how to make the calls and get the information I needed, in the form I needed it. I had to parse XML for MUNI and BART. Thank goodness for the ElementTree module. Each source required a different way of handling its information, but overall I came out with a nice dictionary of ALL the unique stops (3673 stops!).
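To give a feel for the XML side, here is a small sketch using `xml.etree.ElementTree`. The sample document, including the `title`, `lat`, `lon` and `stopId` attribute names, is an assumption about what a stop element might look like, not a copy of a real response.

```python
import xml.etree.ElementTree as ET

# Invented sample response; element and attribute names are assumptions.
SAMPLE_XML = """
<body>
  <route tag="X" title="Route X">
    <stop tag="1001" title="Main St &amp; 1st St" lat="37.7800" lon="-122.4100" stopId="11001"/>
    <stop tag="1002" title="Main St &amp; 2nd St" lat="37.7810" lon="-122.4120" stopId="11002"/>
  </route>
</body>
""".strip()


def parse_stops(xml_text):
    """Return {stop title: {"lat", "lon", "stop_id"}} for every <stop> element."""
    root = ET.fromstring(xml_text)
    stops = {}
    for stop in root.iter("stop"):
        stops[stop.get("title")] = {
            "lat": float(stop.get("lat")),
            "lon": float(stop.get("lon")),
            "stop_id": stop.get("stopId"),
        }
    return stops


muni_stops = parse_stops(SAMPLE_XML)
```

ElementTree decodes `&amp;` while parsing, so the keys come out with a plain "&" in them.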
Now that I had these pieces, I had to figure out whether I could match up all the data and see what I was missing. This required some troubleshooting.
I ran my different dictionaries against each other to find how many stops did not match (between the stops and the routes’ stops). I named this dictionary ‘sad’. The first time I went through it I had 3552 sad stops that didn’t match. So I played around and figured out that the MUNI stop ids/codes are not the same as 511’s, so I changed that and my sad went down to 3508. Better, but not enough. It turns out that 511 uses “and” while Nextbus uses “&” in its stop names. So I used .replace() on my Nextbus data and bam! Sad went down to 348! At this point, that’s a little under 10% of the stops (348 out of 3673). So I decided to put it on my todo list for the next version.
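The matching pass could look something like the sketch below. The stop names are invented, the function names are hypothetical, and `sad` here maps each unmatched name to `None` just to keep it a dictionary.

```python
def normalize(name):
    """Rewrite '&' as 'and' so both sources spell intersections the same way."""
    return name.replace("&", "and")


def find_sad_stops(route_stop_names, located_stops, fix_names=True):
    """Return the route stops that have no match among the located stops."""
    if fix_names:
        located = {normalize(name) for name in located_stops}
    else:
        located = set(located_stops)
    return {name: None for name in route_stop_names if name not in located}


# Invented sample: names as the routes spell them vs. as the lat/lon source does.
route_stop_names = ["Main St and 1st St", "Main St and 2nd St", "Oak St and 3rd St"]
located_stops = {
    "Main St & 1st St": {"lat": 37.78, "lon": -122.41},
    "Main St & 2nd St": {"lat": 37.781, "lon": -122.412},
}

sad_before = find_sad_stops(route_stop_names, located_stops, fix_names=False)
sad = find_sad_stops(route_stop_names, located_stops)
```

Comparing `len(sad_before)` with `len(sad)` shows how much a single normalization rule buys; whatever is left in `sad` needs a different fix.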
Now it was a matter of finishing my data model. I had started with my basic idea from the photo above, but my APIs gave me different information, and after playing with the data I could see which data points I needed. So I had to tweak it a bit and I ended up with this:
I was able to add all my babies to my database:
I am super excited to have this part done. Now I can work on figuring out how to track my user’s transit vehicle without the vehicle id (like before). My general idea is: