The CrowdStrike fail and next global IT meltdown already in the making

By Finance News On Jul 21, 2024


When computer screens went blue worldwide on Friday, flights were grounded, hotel check-ins became impossible, and freight deliveries were brought to a standstill. Businesses resorted to pen and paper. Initial suspicions landed on some sort of cyberterrorist attack. The reality, however, was much more mundane: a botched software update from the cybersecurity company CrowdStrike.

“In this case, it was a content update,” said Nick Hyatt, director of threat intelligence at security firm Blackpoint Cyber.

And because CrowdStrike has such a broad base of customers, it was the content update felt around the world.

“One mistake has had catastrophic results. This is a great example of how closely tied to IT our modern society is — from coffee shops to hospitals to airports, a mistake like this has massive ramifications,” Hyatt said.

In this case, the content update was tied to CrowdStrike's Falcon monitoring software. Falcon, Hyatt says, hooks deeply into endpoints (here, laptops, desktops, and servers) to monitor for malware and other malicious behavior. Falcon updates itself automatically to account for new threats.

“Buggy code was rolled out via the auto-update feature, and, well, here we are,” Hyatt said. Auto-update capability is standard in many software applications, and isn’t unique to CrowdStrike. “It’s just that due to what CrowdStrike does, the fallout here is catastrophic,” Hyatt added.

Photo caption: Blue screen of death errors are seen on computer screens during the global communications outage caused by CrowdStrike, which provides cybersecurity services to US technology company Microsoft, on July 19, 2024, in Ankara, Turkey. (Harun Ozalp | Anadolu | Getty Images)

Even though CrowdStrike quickly identified the problem, and many systems were back up and running within hours, the global cascade of damage isn’t easily reversed for organizations with complex systems.

“We think three to five days before things are resolved,” said Eric O’Neill, a former FBI counterterrorism and counterintelligence operative and cybersecurity expert. “This is a bunch of downtime for organizations.”

It did not help, O’Neill said, that the outage happened on a summer Friday, with many offices empty and IT staff who could help resolve the issue in short supply.

Software updates should be rolled out incrementally

One lesson from the global IT outage, O’Neill said, is that CrowdStrike’s update should have been rolled out incrementally.

“What CrowdStrike was doing was rolling out its updates to everyone at once. That is not the best idea. Send it to one group and test it. There are levels of quality control it should go through,” O’Neill said.

“It should have been tested in sandboxes, in many environments before it went out,” said Peter Avery, vice president of security and compliance at Visual Edge IT.
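To make the idea of an incremental rollout concrete, here is a minimal sketch in Python. It is an illustration only, not CrowdStrike's actual process: the stage sizes, the failure threshold, and the names `STAGES`, `bucket`, `staged_rollout`, `apply_update`, and `is_healthy` are all hypothetical. The update goes to a small slice of machines first, and the rollout stops if too many of them report problems.

```python
import hashlib

# Hypothetical rollout stages: cumulative fraction of machines updated.
STAGES = [0.01, 0.10, 0.50, 1.00]


def bucket(host_id: str) -> float:
    """Map a host ID to a stable value in [0, 1) using a hash."""
    digest = hashlib.sha256(host_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def hosts_for_stage(hosts, fraction):
    """Return the hosts whose bucket falls below the stage's fraction."""
    return [h for h in hosts if bucket(h) < fraction]


def staged_rollout(hosts, apply_update, is_healthy, max_failure_rate=0.01):
    """Update hosts stage by stage, halting if a stage looks unhealthy.

    apply_update and is_healthy are caller-supplied (assumed) callbacks.
    Returns the list of hosts that received the update.
    """
    updated = []
    done = set()
    for fraction in STAGES:
        # Only touch hosts that earlier stages have not already updated.
        batch = [h for h in hosts_for_stage(hosts, fraction) if h not in done]
        for host in batch:
            apply_update(host)
            done.add(host)
            updated.append(host)
        failures = sum(1 for h in batch if not is_healthy(h))
        if batch and failures / len(batch) > max_failure_rate:
            break  # halt: later stages never receive the update
    return updated
```

With a faulty update, a health check that fails on the first small group stops the rollout there, so most machines are never touched. With a good update, every stage passes and all hosts are eventually updated.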

He expects that more safeguards will be needed to keep this type of failure from happening again.

“You need the right checks and balances in companies. It could…