Brittle tests
The article discusses what constitutes a brittle test in the context of design systems and web components. It explores how tests can fail unexpectedly when implementation details change, rather than when the actual functionality does.
It’s that time again to dive back into a discussion I had at work a while ago and turn the debate loose on the internet. This article comes directly from a conversation my partner-in-crime Tech Lead and I had about the best way to support our design system consumers when they test apps built with our design system web components. Shocking no one, we had differing opinions on what constitutes a brittle test, though we both agreed we didn’t want our consumers writing them.
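To make the idea of a test that fails when implementation details change (rather than functionality) a bit more concrete, here is a minimal, hypothetical sketch in TypeScript. It is not code from our design system. To keep it runnable without a browser or jsdom, the `RenderedNode` tree is an assumed stand-in for a web component’s shadow DOM, and `renderButtonV1`, `renderButtonV2`, `visibleText`, `brittleTest`, and `resilientTest` are illustrative names only, not part of any testing library.

```typescript
// Hypothetical, DOM-free stand-in for a web component's shadow DOM,
// so the example runs in plain Node without a browser or jsdom.
interface RenderedNode {
  tag: string;
  className?: string;
  text?: string;
  children?: RenderedNode[];
}

// Hypothetical v1 of a button component: the label lives in <span class="label">.
function renderButtonV1(label: string): RenderedNode {
  return { tag: "button", children: [{ tag: "span", className: "label", text: label }] };
}

// Hypothetical v2: an internal refactor changes the markup, not the visible label.
function renderButtonV2(label: string): RenderedNode {
  return { tag: "button", children: [{ tag: "div", className: "button__text", text: label }] };
}

// Concatenates all text in the tree, i.e. roughly what a user would read.
function visibleText(node: RenderedNode): string {
  const own = node.text ?? "";
  const nested = (node.children ?? []).map(visibleText).join("");
  return own + nested;
}

// Couples the test to implementation details: the tag and class of the label element.
function brittleTest(render: (label: string) => RenderedNode): boolean {
  const first = render("Save").children?.[0];
  return (
    first !== undefined &&
    first.tag === "span" &&
    first.className === "label" &&
    first.text === "Save"
  );
}

// Checks only the outcome: the text rendered for the user.
function resilientTest(render: (label: string) => RenderedNode): boolean {
  return visibleText(render("Save")) === "Save";
}

console.log(brittleTest(renderButtonV1), resilientTest(renderButtonV1)); // true true
console.log(brittleTest(renderButtonV2), resilientTest(renderButtonV2)); // false true
```

Both render functions produce the same visible label, so after the refactor only the test that reaches into the internal markup fails.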
So let’s get to the bottom of what brittle tests are, shall we? Spoiler: I still don’t know. After you skim this article, let’s continue the discussion over on Bluesky.
I think the simplest definition of a brittle test is one that fails when you don’t want it to, or when you don’t expect it to. We’ve all seen flaky tests that depend on third-party systems or APIs, and sometimes those systems are down right when we’re trying to run our tests and push releases to production. It’s why there are whole companies devoted to mocking test data, and whole testing strategies designed to mitigate test failures caused by integrating disparate systems.
But in my design system, we don’t really have any third-party dependencies or services, so the type of test we picture is pretty standard: devs pull our design system components into their applications, run unit tests, and expect their applications to behave properly with and around our web components. The fact that our design system is made of web components rather than framework components is particularly relevant here.
So let me explain the perspective that my coworker and I each had. My coworker’s idea of a brittle test is one that needs constant updating whenever implementation details change in the application. For him, a test that isn’t brittle checks only the desired results, such as the proper text rendered to the screen, without any knowledge of the particulars of how the text actually got rendered to the s