October 7, 2013 Washington Post Article by Tom Lee
This Reuters article about Healthcare.gov has been getting some attention. Alas, it’s not very good, focusing on client-side optimizations that are probably unrelated to the federal health care Web site’s early woes. Healthcare.gov’s problems are almost certainly occurring at a deeper level in the system, making it very difficult, if not impossible, for an outsider to gauge how serious those problems are.
To explain, let’s do one of those analogy things. Say that Kathleen is planning a birthday party for herself.
There are a bunch of tasks associated with the party that need to be done. For instance, guests have to be told where and when the party is and whether to bring gifts. This is a pretty easy task to manage: Kathleen prints up a bunch of flyers with the relevant information and asks some friends to hand them out.
This task can be done well or poorly, of course. Maybe she foolishly printed bits of information on different pieces of paper instead of on a single flyer. Maybe she only asked one friend to hand them out and he’s a flake. These could become real issues if more people than Kathleen anticipated want to attend the party.
These are easy problems to solve, though. Printing more flyers is simple. You can hire people to hand out the flyers if your friends aren’t reliable. There’s no real need for these distributors to coordinate.
Some tasks require Kathleen herself, though. Receiving happy birthday wishes, for instance: There could be a huge number of guests, but there’s only one Kathleen. If she doesn’t plan for this properly, she could wind up being too busy receiving congratulations nonstop to enjoy the party. Perhaps her guests will have to waste their time queued up waiting for her, too.
Many Web application optimization problems can be categorized in a similar way. Some processes can be run in parallel, without central coordination. These processes might be implemented wastefully or unprofessionally, but you can usually fix them by throwing more resources at the problem. Cloud-hosting architectures often make this trivially easy.
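To make the first category concrete, here is a minimal sketch in Python. It assumes a made-up task, `hand_out_flyer`, that stands in for any request needing no shared state (serving a static file, say). The function names and the sleep-based delay are illustrative assumptions, not anything taken from Healthcare.gov.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def hand_out_flyer(guest: str) -> str:
    """Hypothetical independent task: touches no shared state."""
    time.sleep(0.01)  # stand-in for a little I/O per request
    return f"flyer delivered to {guest}"


def deliver_all(guests, workers):
    """Run every task on a pool of `workers` threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(hand_out_flyer, guests))


def timed_delivery(guests, workers):
    """Return (results, elapsed_seconds) for one run."""
    start = time.perf_counter()
    results = deliver_all(guests, workers)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    guests = [f"guest-{i}" for i in range(40)]
    _, one_worker = timed_delivery(guests, workers=1)
    _, eight_workers = timed_delivery(guests, workers=8)
    print(f"1 worker:  {one_worker:.2f}s")
    print(f"8 workers: {eight_workers:.2f}s")
```

Because the tasks never need to talk to each other, adding workers shortens the total time without any change to the task itself. That is the property that makes "throw more resources at it" a workable fix.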
Other problems require coordination or centralization. That can cause bottlenecks, and they can be quite severe. You can respond by rewriting, redesigning, tuning or, yes, throwing more resources at the affected systems. Sometimes this works and sometimes it doesn’t; it requires time and expertise, though, not just a credit card and an Amazon account. Sometimes your only real option is to design around these problems: Queue the expensive tasks for later execution, or accept a loss of synchronization across your system.
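The "queue it for later" option can be sketched the same way. The example below assumes a single hypothetical coordinator, `DeferredGreeter`, that plays the part of the one Kathleen: callers hand off work and return at once, while a lone worker thread gets through the backlog at its own pace. The class, its methods and the timings are all invented for illustration.

```python
import queue
import threading
import time


class DeferredGreeter:
    """Hypothetical single coordinator: only one worker may do this job."""

    def __init__(self):
        self._tasks = queue.Queue()
        self.processed = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, guest):
        """Accept the request immediately; the real work happens later."""
        self._tasks.put(guest)

    def _run(self):
        while True:
            guest = self._tasks.get()
            if guest is None:  # sentinel: shut the worker down
                self._tasks.task_done()
                break
            time.sleep(0.005)  # stand-in for the expensive, serialized step
            self.processed.append(guest)
            self._tasks.task_done()

    def drain(self):
        """Block until everything queued so far has been handled."""
        self._tasks.join()

    def close(self):
        self._tasks.put(None)
        self._worker.join()


if __name__ == "__main__":
    greeter = DeferredGreeter()
    for i in range(20):
        greeter.submit(f"guest-{i}")  # returns without waiting
    greeter.drain()
    print(len(greeter.processed), "guests greeted")
    greeter.close()
```

The trade-off is the one described above: nobody waits in line at submission time, but for a while the system's view of who has been greeted lags behind who has asked.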
The Reuters article spends a lot of time on how many static resources are loaded into the browser by Healthcare.gov. Sometimes there are good reasons for loading a bunch of that stuff and sometimes there aren’t. The fact that there’s usually room for improvement, as any Web optimization tool will tell you, means that it’s pretty simple to critique virtually any site. That doesn’t make the bugs and glitches critical problems, however.
Besides, the symptoms that usually show up with this class of problems are different from the ones afflicting Healthcare.gov. And many of the Healthcare.gov assets in question are served through the Akamai Content Delivery Network, which is probably the best-known brand name when it comes to making sure your servers can handle a gigantic number of static asset requests.
Parts of Healthcare.gov are down right now, presumably for technical maintenance. Hopefully that work improves the system’s throughput. Traffic is likely to even out after the initial crush of applicants, which should also help. Before long, I suspect that the site will work just fine.
It’s unfortunate that Healthcare.gov hasn’t made a great first impression. But it still has time to get things right. Once it does, there’ll be lessons to be drawn. But they’re probably not going to be ones you can generate automatically from a browser plugin.