CrowdStrike Outage Explained by Keith Barker CCIE

Hello and welcome. My name is Keith Barker, and I'd like to give you a high-level overview of what you need to know about the CrowdStrike incident that happened in July 2024. These are the three main questions I'd like to cover with you right now: first of all, what happened; secondly, why did it happen; and third, and fairly important to people who are still cleaning up after this, how do we resolve it?

So let's begin with question number one: what exactly happened to over 8 million Windows computers, and are you impacted? First of all, the result looks like this. This is a representation of the blue screen of death or, as its friends call it, BSOD, the acronym for blue screen of death. And for the question of "are you impacted," the answer is yes. Even if you personally didn't have a blue screen of death on your Windows computer, it's very likely there were interruptions to other services that you were using or were going to use: things like businesses and airlines and hospitals and various critical services around the world. So I have friends who missed flights, and I had children that missed doctor's appointments, all because those systems were down due to their blue screen of death. So again, whether this happened to you personally or you were impacted by somebody else's computer systems having blue screens of death, this impacted hundreds of millions of people all over the planet in one way or another.

So when something like that happens, one of the questions might be, "Well, how in the world did this occur?" So let's tackle that next. A great way to understand why this happened would be to use the analogy of a castle. So let's imagine we have a castle right here. This would be the perimeter, and then within the castle there are other secure areas. It's very likely that's where we keep the king or the royalty, whoever it happens to be, in the most secure area. So we'll call this area zero, the most secure area, and the outer perimeter here we'll go ahead and call
that area one. So we can think of this area zero here in the castle, the innermost part of it, as the most secure area. Now, if somebody needs access to the king, or to the staff of the king, or to the royalty, what can happen is a request can be made for that access. Maybe we need to get a decision from the king, or some other request needs to be made. That request is made, then inside this most secure area the decisions are made, and then the results are handed back. The concept is, if something negative happens out here in area one (I'm going to go ahead and draw an X to represent something negative happening), hopefully that's not going to impact area zero, because there's extra security to get to area zero. The goal is to not let those negative events impact area zero, the most secure area where the king and the treasures are all kept or, in the case of a queen, where the queen and all the treasures are kept.

So how does this story of the castle, with a most secure area and a less secure area, apply to why this blue screen of death happened as part of the CrowdStrike incident in July of 2024? Here's how it applies. I'm going to draw the same diagram again, except this time I'm going to call this ring one. Think of it like area one for the castle, the outside perimeter, but in a computer system it's referred to as ring one. And then the operating system and the most critical functions are going to be running in a separate and more secure area called ring zero. Just like our analogy of the castle with area one and area zero, inside of a computer system we have these similar areas, except they're referred to as rings: the most secure area here is ring zero, and the less secure area, or the outside perimeter here, is referred to as ring one. Here at ring zero, the operating system is handling core functions for the operating system itself. So when we have other applications, such as Microsoft Office applications or other programs or browsers that we're running, they're
running in ring one, which is also referred to as user mode. So these little red boxes I'm going to refer to as applications; think of them like user applications that are running in ring one. Now, for those applications to work, they need resources. For example, they need to get to memory, or they may need to write to the disk, or they may need to write to the network or make requests from the network. So when those apps need resources, they're interacting with and making requests to the ring zero components. Right here at ring zero we have things like memory management and access to the hardware and all the super secure services that the operating system is in charge of, and that would also include things such as drivers. So when an application needs resources, it's making those requests, and then hopefully those requests are being granted back to the applications from the operating system.

Another common term for ring zero is kernel mode, and I'll match that with the same color here. Just think of kernel mode as the most secure area of the operating system, the one that gets direct access to the hardware and resources. Again, that's where the operating system runs, and drivers run in ring zero, or in kernel mode. And ring one, where most applications run, is referred to as user mode. In parentheses I'll just jot down ring zero next to kernel mode, and next to user mode I'll go ahead and put ring one, just as a reminder.

Now, what's the benefit of having these two different modes, kernel mode at ring zero and user mode at ring one? Well, the benefit is that if we have an application that goes sideways, it has a problem or an issue, we want just that application to die by itself and not take down the entire system. That's normally the case with applications that are run in user mode. For example, let's imagine we're running our favorite network-based game on our computer. It's running in user mode, in ring one, and it crashes. The intent is for just that application to crash because of its
problem, and not to take down the entire operating system and give us a blue screen of death.

So let me clean this up just a teeny bit, and let's take a look at what the Falcon application is and why that software from CrowdStrike caused the blue screen of death. CrowdStrike makes some software that runs on a Windows computer and helps to identify and prevent malware. So think of it like an anti-malware program on steroids. It's really, really efficient, until it brings down the computer. But for the moment, let's just go ahead and label their product, which is called Falcon. Again, think of Falcon like antivirus or anti-malware software. As far as the impact goes, the blue screen of death happened to individuals and computer systems that were running the Falcon service from CrowdStrike.

Now, the reason that the Falcon software caused the blue screen of death was that it had two things working at the same time. Number one, the Falcon software is running in kernel mode, here at ring zero. So if there's a problem with the software and it effectively ruins the kernel, that's going to cause the blue screen of death instead of the computer trying to continue. If there are problems at ring zero with access to memory, for example two different programs writing into the same memory space or walking on top of each other, the operating system is designed to halt instead of continuing, because continuing could lead to further data corruption. So their code, the Falcon code, is being run as a device driver here at ring zero, in kernel mode, and there are probably some great reasons why they're doing that. One would be that they want more access, and direct access, to make sure that their anti-malware and antivirus software is working correctly and is able to really catch everything that's trying to happen. And it's not like they just walked up to Microsoft and said, "Hey, can we do this?" They went through a process called WHQL, which is an acronym for
Windows Hardware Quality Labs, and they were certified, which effectively means that Microsoft tested and worked with their software, validated it, and said, "Yep, you're good to go, we approve of this."

And here's the rub. Even though the Falcon software was certified through WHQL (let's go ahead and imagine this is Falcon right here, so it's been certified, it's been signed by Microsoft that it's good to go), the underlying components of Falcon periodically get updates, and that's where the whole thing went south. So let's imagine the Falcon software itself has some subordinate files; think of them like files that the Falcon software uses as part of its operation. Let's go ahead and put file one and file two and file three. If they need to update some components of the Falcon driver, effectively what they can do is update those files, and then they have an updated component as part of the Falcon software, and that's deployed through dynamic updates to clients that are using this system from CrowdStrike called Falcon.

Well, even though the Falcon software was certified by Microsoft, in the event that they do an update and they have a corrupt or incorrect file that they're using as part of the Falcon software, because it's running in kernel mode at ring zero, a problem with one of those supporting files that's being used by Falcon could, and did, cause a problem. That happened as part of the update on July 19th, with these files that were being used by the Falcon driver, which again is their code running at ring zero in kernel mode. The file can be identified as C-, then five zeros, then 291, then a dash, and then some extension, which won't matter too much, because it's this update that caused the problem.

So to answer the question "why did this happen": customer systems that were using CrowdStrike's Falcon software got the dynamic update, the update had an incorrect file, and as a result the incorrect file caused the software to fail, and because it was at ring zero, or kernel
mode, that caused the blue screen of death.

So next, let's turn our attention to how it is being resolved. Now, they can push out a new update, and they have since the actual incident happened. But if a computer boots up to the blue screen of death, it's not going to be able to continue with any other kind of update, because at that point it's halted. The process for correcting this, if you're sitting at the computer that has a blue screen of death, would be to reboot the computer into what's known as safe mode. Then, in safe mode, you drill down into the file structure, find any of the updated files from Falcon with this 00000291 in the name, delete that file or any files with that 291 update, and then go ahead and reboot.

That works great most of the time, except for the fact that a lot of servers don't have a GUI connected to them. A GUI is a graphical user interface, and a lot of servers don't have a screen attached to them all the time. As a result, if you have hundreds or thousands of servers that are running in a data center, and they don't have screens that someone can just walk up to and work with, it may take a little more time or scripting to actually make that change.

Another complication comes in if we're using a disk encryption security feature called BitLocker. If a system is using BitLocker, there are some additional steps, and, depending on whether you have the keys or not, additional steps beyond that to get fully recovered. Just be aware there is some additional work to do if you're currently using BitLocker. Also, to aid the recovery process, Microsoft updated their Microsoft recovery tool on July 22nd, and that recovery tool provides two repair options to help IT admins expedite the repair process. One of those options is booting from WinPE to facilitate the repair, and the other option is doing the recovery from safe mode.

And I guess the final thing we should chat about is how this could be avoided, and the answer is QA. So even though
the Falcon driver, or the Falcon code itself, which was running at ring zero, is certified via WHQL from Microsoft, the updates that they were doing obviously were not tested thoroughly enough to prevent the blue screen of death. So the two solutions would be: one, better QA on the updates for the Falcon software; or two, they could run the Falcon software not in ring zero in kernel mode, but rather as an application in user mode, which might degrade some of its efficiency in identifying malware. But at least if it crashed, it would only crash itself, and because it's not a ring zero application, it wouldn't cause the entire system to crash.

So thanks for joining me in this video, as we've addressed three elements: number one, what happened; number two, why it happened; and third, how it's being resolved. Until next time, I'm Keith Barker, and stay safe.
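
The manual cleanup step described in the video (finding the Falcon update files with 00000291 in the name and deleting them) lends itself to scripting when many machines are involved. What follows is only a minimal Python sketch of that idea, not CrowdStrike's or Microsoft's tooling. The function names and the `dry_run` flag are hypothetical, the pattern `C-00000291*` is an assumption based on the file name mentioned in the video, and the folder that holds the files is not named in the video, so it has to be supplied by the caller. It would also need to run from an environment that can reach the disk, such as safe mode or WinPE.

```python
import tempfile
from pathlib import Path

# Assumption: pattern derived from the file name read out in the video
# ("C-", five zeros, "291", a dash, then some extension).
BAD_UPDATE_PATTERN = "C-00000291*"


def find_matching_files(driver_dir: Path, pattern: str = BAD_UPDATE_PATTERN) -> list[Path]:
    """Return the regular files in driver_dir whose names match the pattern."""
    return sorted(p for p in driver_dir.glob(pattern) if p.is_file())


def remove_matching_files(driver_dir: Path, dry_run: bool = True) -> list[Path]:
    """List the matching files and delete them unless dry_run is True."""
    matches = find_matching_files(driver_dir)
    for path in matches:
        if not dry_run:
            path.unlink()  # delete the faulty update file
    return matches


if __name__ == "__main__":
    # Demonstration against a throwaway directory, not a real system folder.
    with tempfile.TemporaryDirectory() as tmp:
        demo_dir = Path(tmp)
        (demo_dir / "C-00000291-00000000-00000001.sys").write_text("faulty")
        (demo_dir / "C-00000290-00000000-00000001.sys").write_text("fine")

        for path in remove_matching_files(demo_dir, dry_run=True):
            print("would delete:", path.name)

        remove_matching_files(demo_dir, dry_run=False)
        print("remaining:", [p.name for p in demo_dir.iterdir()])
```

Defaulting to a dry run is a deliberate choice here: on a machine that is already in a fragile state, it is safer to see what would be deleted before deleting anything. The BitLocker steps mentioned in the video are outside the scope of this sketch.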
