After looking up the most common for-hire vehicles used as Ubers, Lyfts, and Vias in Chicago, I immediately wanted to scroll to the other end of the spectrum and find the oddball cars used to move passengers from A to B in the Windy City. But while doing so, I noticed something peculiar about the data set: It’s sloppy as hell.
As I wrote yesterday, the Camry is the most popular car registered for FHV service (duh). And the vast majority of those Camrys appear to be registered properly. But some of them are not. For example, here are 15 “Acura Camrys”:
Followed by a bunch of Chevy, Honda, and Ford Camrys:
There are also a number of Hyundai, Jaguar, Land Rover, Lexus, Mazda, Lincoln, Nissan, Pontiac, and Scion “Camrys.”
This same pattern holds true for pretty much every popular vehicle in the data set. It seems a bunch of people randomly (or, I suppose, mistakenly, although who the hell could think a Camry is a Land Rover?) select a vehicle’s make and/or model when entering it into this data set. Here are the various “Altimas,” for example:
It goes on like this, for pretty much every make and model.
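For anyone who wants to hunt for these mismatches themselves, here is a minimal sketch of one way to do it. It is not how we searched the data. The field names (`make`, `model`), the reference table, and the sample rows are all hypothetical and made up for illustration; the real data set may name and format things differently.

```python
from collections import Counter

# Hypothetical reference table: model name -> the make that actually builds it.
# A real check would need a far longer list than this.
EXPECTED_MAKE = {
    "CAMRY": "TOYOTA",
    "ALTIMA": "NISSAN",
}

# Made-up sample rows for illustration only; not taken from the real data set.
SAMPLE_RECORDS = [
    {"make": "Toyota", "model": "Camry"},
    {"make": "Acura", "model": "Camry"},
    {"make": "Land Rover", "model": "Camry"},
    {"make": "Nissan", "model": "Altima"},
    {"make": "Honda", "model": "Altima"},
    {"make": "Honda", "model": "Civic"},
]


def find_mismatches(records, expected_make=EXPECTED_MAKE):
    """Count (make, model) pairs whose make disagrees with the reference table."""
    mismatches = Counter()
    for row in records:
        make = row["make"].strip().upper()
        model = row["model"].strip().upper()
        expected = expected_make.get(model)
        # Models missing from the reference table are skipped, not flagged.
        if expected is not None and make != expected:
            mismatches[(make, model)] += 1
    return mismatches


if __name__ == "__main__":
    for (make, model), count in sorted(find_mismatches(SAMPLE_RECORDS).items()):
        print(f"{make} {model}: {count} record(s)")
```

A check like this only catches models that appear in the reference table, so it would miss cars that don't exist at all.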
It gets worse. There are some vehicles in the data set that have reportedly completed trips in Chicago that have no right to be in the United States at all.
- Toyota ceased manufacturing the Carina in 2001 and never sold it in the U.S., but that didn’t stop the little engine that could from completing 1,163 ride-hail trips in Chicago. Every Carina recorded as working in Chicago was listed as a post-2001 model. That just made me think, actually, and I did a search.
- A 2010 BAW Lu Pa completed 90 trips despite, to the best of Google’s knowledge, not being a car that exists. It’s possible this is a cascading series of typos for “VW Lupo,” a subcompact hatchback never sold in the United States, but I think there’s one better. BAW is a Chinese manufacturer; we think they’re referring to the BAW Luba, a Land Cruiser clone sold in Russia under the name “Land King.”
- Two Proton Perdanas apparently roamed Chicago’s streets for fares, an amazing feat considering the Malaysian model is built for right-hand drive only and only one left-hand-drive model has ever been exported (as a gift to Turkish president Recep Erdoğan).
We found dozens of examples of this, but with more than 4.8 million records, we probably missed some. (This data set goes back to 2015 and if a vehicle is reported for multiple months, it gets multiple records, one for each month. That’s why there are so many records.) Please feel free to dive in and poke around yourself.
When asked about this peculiar noise in the massive data set, the City of Chicago said they post the data as they receive it from the three licensed Transportation Network Providers (TNPs) in Chicago: Uber, Lyft, and Via. The three ride-hail companies compile the data in accordance with the TNP reporting manual, send it over, and then it’s uploaded to the open data portal untouched. Which means the ride-hail companies themselves have got to be providing the sloppy data.
Unfortunately, there’s no way to know for sure which of them is responsible for the sloppy data. The data set only reports if a vehicle is registered with multiple TNPs (unsurprisingly, none of the error vehicles are) but does not indicate which TNP provided the data for that specific vehicle.
Jalopnik sent inquiries to Uber, Lyft, and Via about this issue. Lyft and Uber responded the same way: spokespeople pointed out that they’re not the only ones who provide data and otherwise pointed Jalopnik to their companies’ respective (but very similar) vehicle requirements: 15 years old or newer, four doors, no damage, and so on.
Yet, there are records for 62,345 cars completing trips despite having a model year of 2004 or older (again, a car working multiple months will get multiple records, so the number of actual cars is considerably lower, but still much higher than zero).
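To make the records-versus-cars distinction concrete, here is a rough sketch of how one might count the too-old records and bound the number of actual cars. It is an illustration under assumptions, not our method: the field names (`month`, `year`) and the sample rows are hypothetical, and it takes at face value that each vehicle gets at most one record per month.

```python
from collections import Counter

CUTOFF_YEAR = 2004  # the threshold used above: model year 2004 or older

# Made-up sample rows for illustration only; field names are hypothetical.
SAMPLE_RECORDS = [
    {"month": "2019-01", "make": "Toyota", "model": "Camry", "year": 2003},
    {"month": "2019-02", "make": "Toyota", "model": "Camry", "year": 2003},
    {"month": "2019-02", "make": "Honda", "model": "Accord", "year": 2001},
    {"month": "2019-02", "make": "Nissan", "model": "Altima", "year": 2015},
    {"month": "2019-03", "make": "Toyota", "model": "Camry", "year": 2003},
]


def old_vehicle_records_by_month(records, cutoff_year=CUTOFF_YEAR):
    """Count records per reporting month whose model year is at or below the cutoff."""
    counts = Counter()
    for row in records:
        if row["year"] <= cutoff_year:
            counts[row["month"]] += 1
    return counts


if __name__ == "__main__":
    by_month = old_vehicle_records_by_month(SAMPLE_RECORDS)
    total_records = sum(by_month.values())
    # Assuming one record per vehicle per month, the busiest month gives a
    # lower bound on distinct old cars; the record total is an upper bound.
    busiest_month = max(by_month.values(), default=0)
    print(f"Old-vehicle records: {total_records}")
    print(f"Distinct old cars: at least {busiest_month}, at most {total_records}")
```

Without a per-vehicle identifier, that range is about as precise as this approach gets.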
Drivers are also required to upload vehicle registration and inspection documents, which presumably would not list the car as being the wrong make/model combination.
For its part, a Via spokesperson told Jalopnik that their database does not have these make/model combinations and their new driver registration flow doesn’t allow for make/model mismatches. Unlike Lyft and Uber, the company has an approved list of vehicles that must be 2010 models or newer with leather or fabric seats. They also require the same documents as the other ride-hail giants.
For now, we’re left to guess how this happens. But it’s not exactly reassuring that the companies responsible for reporting driver information, conducting background checks, and taking other vital measures to protect passenger safety are submitting such sloppy data.
If anything, this only further supports the need for open data and rigorous oversight of these companies so the public can keep them honest. Maybe this is just sloppy data practice but innocent at root. Or perhaps it’s indicative of a larger problem with vehicle and driver registration. There’s only one way to know, and it’s not a veil of secrecy.