CrowdStrike Update: Latest News, Lessons Learned from a Retired Microsoft Engineer

Published: Jul 23, 2024 Duration: 00:17:25 Category: Science & Technology

Hey, I'm Dave. Welcome to my shop. I'm Dave Plummer, a retired Microsoft software engineer, starting on Windows back in the early 1990s, and today I'm going to update you on all the latest Falcon news, as well as some wanton speculation and even conspiracy theories on the CrowdStrike Falcon IT outage. If you watched my last video, then you already know the specific technical details of what precisely went wrong, so I'll only briefly update them here with some new info. Once we've done that, I'll update you on the latest conspiracy theories as well as consider what broader lessons can be learned from the whole debacle.

The recent CrowdStrike IT outage was caused by a faulty sensor configuration update in their Falcon cybersecurity platform. Here are the key technical details. The update involved a configuration file known as Channel File 291, designed to target newly observed malicious named pipes used in common command-and-control frameworks. The update appears to have been malformed. It then triggered a logic error in the CrowdStrike kernel driver that resulted in system crashes and the infamous Blue Screen of Death on impacted Windows systems. Approximately 8.5 million devices worldwide were impacted, causing significant disruptions across various industries, including banks, airlines, and businesses. Even 911 service was disrupted in some areas. CrowdStrike quickly identified the issue and deployed a "fix" within a few hours. They issued detailed technical guidance for affected customers, including mitigation steps and tools to identify impacted hosts.

Now, I used air quotes around the word "fix" because in this case the fix only fixes the update and prevents more machines from being brought down. For the 8 million or so machines that already took the update, it does nothing at all to fix them. That's going to be up to the system administrators, office managers, and nerdy uncles around the world to fix, because each and every machine will require that a human manually boot the machine into safe mode. From
there, you have to find the corrupted Channel 291 update file in the CrowdStrike folder, delete it, and reboot. And so that's where we're at: a whole lot of techs standing around with their disk in their hand, waiting to safe-boot 8 million blue-screened Windows machines.

It doesn't look very good for Microsoft, which is ironic, because it's primarily a CrowdStrike issue and not something specific to Windows itself. If you don't believe me, consider that on April 19th this year, CrowdStrike issued a flawed update that impacted customers running Debian Linux. The update caused those systems to crash and prevented them from rebooting normally. The issue was acknowledged by CrowdStrike the next day, but it took weeks to determine the exact cause and implement a fix. Another similar issue occurred a month later, on May 13th, this time affecting Rocky Linux. These servers experienced freezes after upgrading to Rocky Linux 9.4. This problem was linked to a Linux sensor operating in user mode combined with specific 6.x kernel versions.

Curiously absent from the list, though, is the Mac. Like a lot of folks, you might just assume that's because it's yet one more piece of software that doesn't even run on the Mac, but you'd be wrong. CrowdStrike does provide security solutions for macOS through its Falcon Plus platform. The Falcon sensor for macOS does not install kernel extensions, especially with the release of macOS Big Sur and later versions, where Apple deprecated the use of kernel extensions entirely. Instead, CrowdStrike has rearchitected around the framework provided by Apple known as System Extensions.

And while I generally hold Microsoft blameless in how CrowdStrike's mistakes manifested on their platform this time around, it all comes down to the fact that a kernel driver is involved at all. As I explained in the last video, a kernel driver has very intimate access to the system's innermost workings. At a cost, however: it brings with it the fact that if anything goes wrong with the kernel driver, the system must
blue screen to prevent further damage to the user's settings, files, security, and so on. CrowdStrike engages in the risky business of delivering kernel code to the critical path of millions of machines not because they are careless YOLO cowboys, or even in spite of that. They do it because it's the only way on Windows to get the low-level system access to do the security voodoo that they do.

You see, code gets to walk on the wild side in the kernel usually for one of two reasons: either for performance reasons, or because it needs access to information or other kernel goings-on that it simply cannot get from user mode. Back in the day when my beard was still dark red, as in the days of Windows NT 3.1, not even the video driver ran in kernel mode. It essentially ran entirely in user mode, and when it needed to access the hardware, that would be done by a proxy thread in the kernel on behalf of the video driver, and the parameters and results would be validated and marshaled back and forth between those threads. The problem is that with a Gen 4 x16 GPU connection, that's a metric crap-ton of data to marshal, and it'd be a lot faster if the driver just had direct access to the hardware. And so, for performance reasons, video drivers got moved into kernel space. But the key point is that it was not a necessity; it was a performance decision made at the cost of potentially reduced reliability.

Over time, the decision has been made the other way, in favor of stability, too. The original printer subsystem for Windows used a kernel-mode driver model for printers, and while I would never dare to question the wisdom of printer designers, I'm not sure I want some intern at Brother writing my kernel code. And so, with a little wailing and gnashing of teeth, the printer driver model was moved to user mode to make Windows far more robust.

When it comes to something like CrowdStrike, the Falcon sensor is in kernel mode presumably because it needs to do things that can't be done from user mode, and to me that's where
Microsoft could be responsible, because on the Windows platform, to the best of my knowledge, some of the CrowdStrike security functionality requires deep integration with the operating system that can currently only be achieved on the kernel side. That's not to say that Microsoft hasn't tried. There's WDAC, or the Windows Defender Application Control API. There's also Windows Defender Device Guard. Together they provide mechanisms for controlling application execution and ensuring that only trusted code runs on a system. They also offer various APIs for antivirus and endpoint protection solutions to interact with the operating system. And I don't know to what extent CrowdStrike does active network filtering, but the Windows Filtering Platform, or WFP, allows applications to interact with the network stack without requiring kernel-level code.

The irony of all this is that at one point Microsoft actually tried to do the right thing. Behind the scenes, sources indicate that Microsoft had been working on a solution that could have potentially prevented such disasters. The tech giant had developed an advanced API designed specifically for security applications like CrowdStrike's. This API promised deeper integration with the Windows operating system, offering enhanced stability, performance, and security. It was a proactive measure aimed at mitigating the risks associated with low-level system interactions, which are often fraught with complexities and potential vulnerabilities.

However, as Microsoft prepared to roll out this game-changing API, they encountered an unexpected obstacle. Regulatory bodies tasked with ensuring fair competition in the tech industry scrutinized the new API. The regulators in the European Union argued that providing such a powerful tool exclusively to certain applications could give Microsoft an unfair advantage, potentially stifling competition from smaller security firms that wouldn't have the same access. Now, despite Microsoft's assurances that the API
would enhance security for all users, the regulators stood firm. They feared that integrating this API could create a dependency on Microsoft's ecosystem, effectively locking out competitors who couldn't leverage the same level of access to the Windows core. Consequently, the API was deemed anti-competitive and its implementation was prohibited. So allocating blame to Microsoft for inaction on an API is actually pretty unfair.

Microsoft is also in a very different position than Apple. Apple is somehow afforded the luxury of being able to do things like break an entire driver model in a new update that requires everything to be rewritten. Conversely, backwards compatibility is so deeply ingrained among Microsoft developers that it simply may not be an option. On my Mac, I've got a Universal Audio Apollo Twin X Thunderbolt sound device, and it requires that you disable all of Apple's driver signing and kernel extension security, and for weeks the machine would pink-screen and reboot until they eventually got their driver more sorted.

Microsoft needs to support and export whatever functionality is needed as an official API so that security providers can build their products without putting the entire operating system at risk. Not because it's the right thing to do, but because the harsh reality is that they've got tens of millions of machines serving in mission-critical roles, like 911 service, that do run kernel-mode code. Those organizations deserve a system that doesn't need to run third-party kernel code to safely do its job, and only Microsoft can fix that, but only if the regulators would let them.

Now, I'm certainly not going to throw Satya under the bus for not throwing CrowdStrike and the EU under it first, but I question the communication and messaging that's coming from the top. The decision to not publicly note that this isn't a failure in Windows itself has led to the widespread misconception amongst my friends and relatives that it was a Windows update that went horribly wrong. I think it'd
be instructive to take a quick look at another PR nightmare that also wasn't the company's fault: Tylenol's crisis back in the 1980s. Now, that might sound like a long time ago, but keep in mind I'm almost 56 now. Damn. I'm sorry.

Anyway, Johnson & Johnson faced a crisis that would become a defining moment in corporate crisis management. In September 1982, seven people in the Chicago area died after ingesting Tylenol capsules that had been laced with cyanide. This event triggered widespread panic and could have easily destroyed the trust and credibility of the Tylenol brand entirely, to say the least. James Burke, the CEO of Johnson & Johnson at the time, spearheaded a response that would set a new standard for corporate crisis management. His approach was characterized by transparency, decisiveness, and a focus on consumer safety.

As soon as the tampering was discovered, Burke ordered a nationwide recall of Tylenol products, totaling around 31 million bottles and costing the company over $100 million. This decisive action underscored Johnson & Johnson's commitment to consumer safety over their short-term financial losses. Burke made it a priority to maintain open lines of communication with the public, the media, and regulatory agencies. He ensured that the company was forthright about the risks and the steps being taken to address the situation. This transparency helped to build trust with the public during a time of fear and uncertainty.

In the aftermath of the crisis, Johnson & Johnson introduced tamper-evident packaging, which became an industry standard. This move not only addressed immediate safety concerns but also restored consumer confidence in the product. The company also launched a major public relations campaign to educate the public about new safety measures and reassure them about the product's safety. Burke's leadership during the Tylenol crisis was widely praised for its ethical focus. He adhered to the company's Credo, which emphasized the importance of the company's
responsibility to its consumers, employees, and community. This ethical foundation guided all of Johnson & Johnson's decisions during the crisis. The swift and responsible actions taken by Burke and his team not only helped Tylenol to recover from the crisis but also strengthened the brand's reputation. Tylenol regained its market share within a year, and the company's handling of the crisis became a case study in business schools around the world. James Burke's masterful handling of the Tylenol crisis showcased the power of ethical leadership and set a new benchmark for crisis management. By putting consumer safety first and maintaining transparent communication, Burke was able to navigate one of the most challenging crises in the company's history and emerge stronger.

Of course, the Tylenol crisis and the CrowdStrike outage are very different events, but I think both Microsoft and CrowdStrike would be wise to learn from James Burke's example, and maybe it's time for a tamper-proof kernel. All this would require that the EU reweigh the greater public good in terms of critical infrastructure over competition in the security API business.

And speaking of trust, what about code signing? What went wrong here that a fully signed driver was able to bork 10 million Windows machines? Remember that Microsoft fully tested and vetted and approved and signed the CrowdStrike driver in the WHQL lab, and the driver didn't change; just the channel update file did. The channel files are used as input to the driver, and we subsequently learned that the Channel 291 update file was made up entirely of zeros. And then, when the driver ingested that update file, it choked, and because it was in kernel mode, its only choice was to then turn blue and die.

That also means that all of the Trusted Platform Modules and Secure Boots in the world wouldn't have saved you. The driver was already fully trusted, so even if you were running locked down to signed bits only, the driver never changed. Data files like channel updates aren't
signed, as far as I know, so a digital signature wouldn't have helped. And even if they were signed, an all-zero signed channel file would still likely have crashed the signed driver. So in this case, trusted computing was of little help.

Since there have been very few specific technical details made public so far, it's time to get a little further into the weeds with some speculation before moving on to some outright conspiracy theories. My speculation begins with my assessment of what went wrong inside the CrowdStrike driver. In the last episode we saw how the driver was access-violating and crashing the system. But why? What caused it? The best assessment I can come up with is that the code is dereferencing a null pointer plus an offset into a data structure that it is expecting to find in memory. Why their base pointer for the structure is null is harder to say, but it's almost certainly tied to the fact that the channel update file was all zeros.

A few folks have written to ask me why such code can't just be placed in a try/except block so that if it access-violates, operation can continue. And the answer is: you can, in theory. And since the exception will be triggered on the attempt to write to illegal memory, and not merely after the fact, memory itself is protected and preserved. That means that as long as the code with the exception handler can return gracefully, and the callers upstream can in turn cope with the error being returned back to them, all is well. But I didn't want to give you the impression that you can just wrap suspect code in a try/except block and eat the exceptions. There's a bit more to it than that.

I think the real failure here is on the part of the CrowdStrike driver, in its lack of properly vetting its input. They're not great about teaching it in college, but one of the first things you learn as a real developer is never to trust user input. And if you're a device driver and your input is a dynamically downloaded channel update file, you can't just implicitly trust it. Even
if the channel files were signed by himself, the code needs to sanity-check the contents. Let's say you're writing a little app to read in a bitmap file and display it on the screen using the graphics card. When you read that file into memory and pass it to the draw-bitmap API, the first thing that the API is going to do is to check the bitmap structure and header and make sure that it's all valid. And if you pass that bitmap off to DirectX to render it with the GPU, you can rest assured that the kernel side of the driver is going to carefully inspect the bitmap for validity in every possible sense before attempting to draw it. And CrowdStrike? Man, not so much. Looks like their code just kind of raw-dogged it and hoped for the best. But it is in life as it is in software: you can be lucky sometimes, but if you come to rely on luck, it will eventually run out, and CrowdStrike's appears to have run out when that channel file full of zeros brought down what must be a fairly fragile section of their code.

Following the CrowdStrike outage, various conspiracy theories have emerged on Twitter and Reddit. One popular theory posits that the outage was a deliberate cyberattack signaling the onset of World War III, with some connecting it to warnings from the World Economic Forum about potential global cyber threats. Another theory suggests that the outage was orchestrated by political figures to influence geopolitical events, although there is no evidence supporting any of these claims. As for me, I try to never attribute to malice that which can be sufficiently explained by incompetence.

It's not as simple as one programmer's error, either, though. When I was at Microsoft, I only wrote the odd bit of kernel code, but the culture among the kernel guys was pretty hardcore. The quality bar was extremely high, as was the level of scrutiny that your code would receive from the kernel team if you wandered onto their turf and checked something into their source control. Even so, I'm not going to just condemn the
programmer, especially based on the limited information that we have on the actual bug. But regardless of how egregious the bug is or isn't, there should be several procedural, test, and review layers that would prevent this bug, or any bug, from having the impact that this one had.

There are a lot more lessons to consider here, from whether or not seemingly the entire world's infrastructure should be dependent on a single vendor, to whether critical systems like 911 need to be on an N-1 or an N-2 update schedule, and what that all means. And heaven help you if you are running BitLocker on the affected machine. But all that will have to wait for a future episode.

So if you found today's episode to be any combination of entertaining or informative, please remember that I'm mostly in this for the subs and likes, and I'd be honored if you'd consider subscribing to my channel and leaving a like on the video. If you're already subscribed, thank you. Please consider sending this video to a friend if you think it's covered the subject well. And please do check out the free sample of my new book on Amazon, The Non-Visible Part of the Autism Spectrum. It's intended for folks that don't have ASD but who suspect they might have a few characteristics that put them somewhere on the spectrum. It's everything I know now about living a successful life on the spectrum that I wish I'd known long ago. Check it out at the link in the video description. In the meantime, and between time, hope to see you next time right here in Dave's Garage.
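
Editor's sketch: the episode's advice to never trust a downloaded channel file, and to make sure callers can cope with the error when validation fails, can be illustrated with a small Python example. Everything about the file layout here is invented for illustration (the `CHNL` magic value, the version field, the fixed 8-byte records, and the names `parse_channel_file` and `load_or_keep_previous` are all hypothetical); it is not CrowdStrike's format or code, and a real kernel driver would be written in C with far more checks. The point is only the shape of the technique: validate the header and the lengths before touching the contents, and fall back to the last known-good data instead of crashing.

```python
import struct

# Hypothetical layout, invented for this sketch (NOT CrowdStrike's real format):
#   4-byte magic b"CHNL", 2-byte version, 2-byte record count (little-endian),
#   followed by `count` fixed-size 8-byte records.
HEADER = struct.Struct("<4sHH")
MAGIC = b"CHNL"
RECORD_SIZE = 8
SUPPORTED_VERSIONS = {1}


class InvalidChannelFile(ValueError):
    """Raised when untrusted input fails a sanity check."""


def parse_channel_file(data: bytes) -> list[bytes]:
    """Validate an untrusted blob and return its records, or raise."""
    if len(data) < HEADER.size:
        raise InvalidChannelFile("file is shorter than the header")

    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        # An all-zero file is rejected here: its magic is four NUL bytes.
        raise InvalidChannelFile("bad magic value")
    if version not in SUPPORTED_VERSIONS:
        raise InvalidChannelFile(f"unsupported version {version}")

    # Never index past the end: the declared count must match the real size.
    expected_size = HEADER.size + count * RECORD_SIZE
    if len(data) != expected_size:
        raise InvalidChannelFile("length does not match record count")

    return [
        data[offset:offset + RECORD_SIZE]
        for offset in range(HEADER.size, expected_size, RECORD_SIZE)
    ]


def load_or_keep_previous(data: bytes, previous: list[bytes]) -> list[bytes]:
    """Caller-side handling: keep the last known-good records on bad input."""
    try:
        return parse_channel_file(data)
    except InvalidChannelFile:
        return previous
```

With this layout, a file made up entirely of zeros never reaches the record-parsing step; it fails the magic check and the caller simply keeps whatever it had loaded before.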
