Over 12,000 Tesla Model S and Model X vehicles built between 2012 and 2018 have had failures of their Media Control Units, the computer system that runs the big center-stack touchscreen and controls all kinds of things, from rearview camera displays to some HVAC functions to even the little clicking sound when the turn indicators are on. These units have been failing at an alarming rate, making cars almost undrivable and costing thousands of dollars to repair. The NHTSA has started an engineering analysis, and it seems that it’s all because of a questionable design.
The issue is quite well-known in the Tesla community and has been covered by other media outlets like our pals over at Ars Technica. Fundamentally, this is what the issue seems to be: in the MCU there is something called an 8GB eMMC NAND flash memory module.
This is essentially the same sort of flash memory as you may have as the main storage device in your computer or as a little USB flash drive. These units are fast and work extremely well, but they have a limited number of times they can have data written to them.
In most of the contexts where we interact with flash memory, like with USB thumb drives, we’re never likely to come anywhere near the write cycle limit of the device. But for embedded systems where data is being written and read automatically, over and over, this limitation becomes a much greater issue.
Figuring out the lifespans of such memory chips is a big deal, and seems to be at the root of what Tesla misjudged.
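To give a feel for how such a lifespan estimate works, here is a rough back-of-the-envelope sketch. Only the 8GB capacity comes from the reporting; the program/erase cycle rating, the daily write volume, and the write amplification factor are made-up illustrative numbers, not Tesla’s or the chip maker’s figures.

```python
def estimated_lifetime_years(capacity_gb, pe_cycles, daily_host_writes_gb,
                             write_amplification):
    """Rough flash endurance estimate, assuming ideal wear leveling.

    capacity_gb          -- usable flash capacity (8 GB per the NHTSA report)
    pe_cycles            -- rated program/erase cycles per cell (ASSUMED)
    daily_host_writes_gb -- data the software writes per day (ASSUMED)
    write_amplification  -- physical bytes written per logical byte (ASSUMED)
    """
    # Total data the chip can absorb before its cells wear out.
    total_writable_gb = capacity_gb * pe_cycles
    # What the flash actually sees each day, after controller overhead.
    physical_daily_gb = daily_host_writes_gb * write_amplification
    return total_writable_gb / physical_daily_gb / 365.0


# Hypothetical scenario A: modest logging.
modest = estimated_lifetime_years(8, 3000, 1.5, 4)
# Hypothetical scenario B: same chip, heavier logging.
heavy = estimated_lifetime_years(8, 3000, 4.0, 4)

print(f"modest logging: {modest:.1f} years")
print(f"heavy logging:  {heavy:.1f} years")
```

The point of the sketch is only that the estimate is very sensitive to how much gets written: under these assumed numbers, a bit less than tripling the daily write volume cuts the projected life from roughly a decade to about four years.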
The chip in the MCU has log files written to it on a regular basis, and Tesla’s original estimates expected the unit to last between 11 and 12 years, which the company considered to be a reasonable life for the car.
That hasn’t proved to be the case for many cars. As the NHTSA report states (emphasis mine):
On June 22, 2020, the Office of Defects Investigations (ODI) opened Preliminary Evaluation PE20-010 to investigate incidents of media control unit (MCU) failures resulting in loss of rearview camera in model year (MY) 2012-2015 Tesla Model S vehicles equipped with the NVIDIA Tegra 3 processor with an integrated 8GB eMMC NAND flash memory device. EMMC NAND flash devices have a finite lifespan based upon the number of program/erase (P/E) cycles. The subject MCU allegedly fails prematurely due to memory wear-out of the eMMC NAND flash. Tesla used the same MCU with the Tegra 3 processor in approximately 159 thousand 2012-2018 Model S and 2016-2018 Model X vehicles built by Tesla through early-2018. In response to ODI’s Information Request (IR) for PE20-010, Tesla provided ODI with 2,399 complaints and field reports, 7,777 warranty claims, and 4,746 non-warranty claims related to MCU replacements. The data show failure rates over 30 percent in certain build months and accelerating failure trends after 3 to 4 years-in-service.
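For a sense of scale, the figures quoted in the report can be tallied directly. This is just arithmetic on the numbers above; treating "approximately 159 thousand" as exactly 159,000 is an assumption, and the complaints and field reports are left out of the sum because the report does not say whether they overlap with the claims.

```python
# Figures as quoted from the NHTSA report.
complaints_and_field_reports = 2_399
warranty_claims = 7_777
non_warranty_claims = 4_746
vehicles = 159_000  # "approximately 159 thousand" (rounded, an assumption)

# Claims tied to MCU replacements, warranty and non-warranty combined.
replacement_claims = warranty_claims + non_warranty_claims
claim_share = replacement_claims / vehicles

print(f"replacement claims: {replacement_claims:,}")
print(f"share of subject vehicles: {claim_share:.1%}")
```

That fleet-wide share is an average across all build months, so it sits well below the "over 30 percent" the report cites for certain build months.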
The likelihood of failure seems to be correlated with how much the car has been driven since that’s when logs get written to the flash memory, though this is not always the case with the failures, which can vary pretty wildly.
Here’s more description from the NHTSA report:
According to Tesla, for subject vehicles equipped with the NVIDIA Tegra 3 processor with an integrated 8GB eMMC NAND flash memory device, the eMMC NAND cell hardware can fail when reaching lifetime wear, for which the eMMC controller has no available blocks to recover.
With this failure mode, the only recovery available is a replacement of the eMMC device, achieved by physical part replacement of either the MCU assembly or visual control module subcomponent. Tesla provided the effects of MCU failure on vehicle function which result in loss of rearview/backup camera, loss of HVAC (defogging) setting controls (if the HVAC status was OFF status prior to failure.)
There is also an impact on the advanced driver assistance support (ADAS), Autopilot system, and turn signal functionality due to the possible loss of audible chimes, driver sensing, and alerts associated with these vehicle functions.
There are precedents for addressing defects that result in loss of either backup camera, defogging, or turn signal functions under safety recalls. Tesla has implemented certain Over-The-Air or OTA updates to subject vehicles to mitigate the effects of MCU failure.
These updates include firmware changes to reduce memory usage of the subject memory card, improve eMMC error correction and storage management strategies, changing the control logic for turn signal activation, and defaulting the HVAC system to Auto (71.6F) for drives after MCU failure to address windshield defogging. Tesla indicated that the MCU failures are likely to continue to occur in subject vehicles as vehicles continue.
Tesla has only repaired these by swapping the entire MCU, which can cost between $2,000 and $5,000. The basic issues are shown well here in this video:
In there, you can see that the memory chip is not removable. This could be because permanently mounted chips are generally a bit better at dealing with vibration and harsher environments, but it’s not really clear, as there’s another flash memory unit used to store the log files Tesla service uses, and that one is a removable SD card:
To Tesla’s credit, earlier this month (November 9, three days before the NHTSA started their engineering analysis and about five months after the NHTSA opened their investigation) it finally announced a warranty adjustment program for people who have had cars affected by this issue.
Refunds could also be available to anyone who paid to have the unit replaced prior, so if that’s anyone reading this who enjoys using money to exchange for goods or services, you should probably look into that.
Here’s what I can’t stop thinking about, though: why did this have to happen? Tesla knows this flash memory should be thought of as a consumable, since it has a limited number of write cycles, and the company knew it would be writing a lot of data to it as the car drove. So if you know that, why design such a part to be integrated into an expensive component?
They didn’t do this for the Gateway Log memory module, that removable SD card; why wouldn’t that have worked for this? Or why couldn’t some sort of volatile buffer, like dynamic RAM, have been used, with its contents written to the flash at regular, known intervals, allowing for much more control over the life of the chip?
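To make the buffering idea concrete, here is a minimal sketch of what I have in mind. It is not how Tesla’s software works; the class name, the parameters, and the `flush_fn` callback that stands in for the actual flash write are all hypothetical.

```python
import time


class BufferedLogWriter:
    """Collect log lines in RAM and write them to flash in batches.

    flush_fn is a stand-in for whatever actually writes to flash
    (HYPOTHETICAL); it is called with one string per batch.
    """

    def __init__(self, flush_fn, flush_interval_s=60.0,
                 max_buffer_bytes=64 * 1024, clock=time.monotonic):
        self._flush_fn = flush_fn
        self._flush_interval_s = flush_interval_s
        self._max_buffer_bytes = max_buffer_bytes
        self._clock = clock
        self._buffer = []
        self._buffered_bytes = 0
        self._last_flush = clock()

    def write(self, line):
        """Queue a line in RAM; flush only when the buffer is full or stale."""
        self._buffer.append(line)
        self._buffered_bytes += len(line.encode("utf-8"))
        too_big = self._buffered_bytes >= self._max_buffer_bytes
        too_old = self._clock() - self._last_flush >= self._flush_interval_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Write everything buffered so far as a single flash write."""
        if self._buffer:
            self._flush_fn("".join(self._buffer))
            self._buffer.clear()
            self._buffered_bytes = 0
        self._last_flush = self._clock()
```

With something like this, a hundred log lines arriving inside one interval turn into a single flash write instead of a hundred. The obvious trade-off is that whatever is still sitting in RAM is lost if power drops before the next flush, which may be part of why a design would write logs straight to flash.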
I’ve reached out to Tesla to ask these questions, but, of course, the chances of them writing me back are roughly the same as the chances of me finding a Krugerrand under my tongue.
I know the engineers at Tesla are smarter than me, and I know many of you in the comments are smarter than me, too. I’m hoping a smarter person will explain why a consumable component like that, one that could directly affect the use of the car significantly in case of failure, would be made so expensive and difficult to replace.
There’s gotta be a reason, right?
At least Tesla seems to be helping with the warranty claims for this. It’s a little late, but it’s very welcome.