Goroutines: Under the Hood | Vicki Niu | Go Systems Conf SF 2020

Published: Dec 13, 2020 Duration: 00:22:10 Category: Science & Technology

Hi everyone, my name is Vicki, and I'm really excited to be with you here today at the Go Systems conference, talking about goroutines and really going under the hood. You know, I feel like during this pandemic some people have been making sourdough or starting book clubs, and I have been watching reality TV and reading the Go runtime source, so I'm very excited to share some fruits of that with you all today.

We're all Go programmers here, and I think one thing that unites us is that we really love Go's concurrency primitives. Speaking just for myself, I know one of the things that really draws engineers and engineering teams to the Go programming language is that there are really great abstractions and native support for goroutines and channels, things that make writing really performant concurrent programs efficient and easy. And of course, one of the real beauties of this is that Go exposes such great interfaces that we usually don't have to worry about what is happening under the hood. We can spawn goroutines and use channels and know that we're creating lightweight, efficient processes that are able to communicate with each other, and we normally don't worry so much about how that happens, because everything runs spick and span.

But every now and then you might get to wondering: what is a goroutine? What is really happening when you call go func and kick off this process? When I asked this question of myself, the answer that immediately popped into my mind, one I'd heard a lot before, is that goroutines are lightweight threads. I think this can be a really useful way to think about goroutines from the perspective of a programmer and a user, but today we're going to delve a little bit deeper to unpack what that really means and what goroutines really are.

One thing about this goroutines-as-lightweight-threads framing is that it makes us think about the lightweight goroutine versus the operating system thread. This comparison makes sense in some ways, because other languages directly use operating system threads as their mode of concurrency. But when we look at what goroutines are and how they're implemented, we see that goroutines leverage operating system threads. The way I've come to think about it is that the Go runtime provides a really great interface to our operating system: it's able to leverage things like OS threads while providing an even more lightweight and powerful concurrency primitive.

Before we delve too deep into any of the details, we'll run through some fast facts about goroutines, just to get us started. One thing to note is that goroutines are started and managed by the Go runtime. You'll often hear that goroutines are like user-space threads: because the Go runtime manages them, we don't actually interface directly with the operating system. That's really nice, because it also means goroutines can be hardware independent. We can have different implementations of goroutines on different architectures but expose the same interface to the programmer, meaning our programs can run just as concurrently and easily on a variety of different machines. The other thing that's really great about goroutines is that they're really lightweight, with really low overhead: goroutines start off with just two kilobytes of memory, but they have growable stacks, which allow them to grow and shrink according to the goroutine's needs.
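To ground all of this, here's the kind of programmer-facing code we're talking about, a minimal sketch of spawning goroutines with go func and communicating over a channel:

```go
package main

import "fmt"

func main() {
	results := make(chan int)

	// Spawn three goroutines; the Go runtime, not the OS, decides
	// when and where each one runs.
	for i := 1; i <= 3; i++ {
		go func(n int) {
			results <- n * n // send a result back over the channel
		}(i)
	}

	// Each receive blocks this goroutine until some sender is ready.
	for i := 0; i < 3; i++ {
		fmt.Println(<-results)
	}
}
```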
The other thing that's really great is that we have really seamless and cheap context switching, thanks to the Go scheduler. What the Go scheduler really works to do is maximize efficiency: we want to be sure that, as much as possible, we're using all of our operating system threads to actively execute goroutines, scheduling off and on any goroutines that are in a blocked state. The other thing we really love about goroutines, the other half of the power duo, is that we have native inter-goroutine communication with channels, in a way that's memory safe and also works really well with the scheduler, allowing us to block and unblock goroutines on channel sends and receives. In general, all of this makes it usually trivial to run even millions of goroutines on a given machine. Usually we don't even have to worry about the details as Go programmers, and we can just go forth, spin off our goroutines, and be on our merry way.

But of course you're thinking: Vicki, we're now five minutes in and you still haven't told us what a goroutine is. So now we'll get to the fun part, where we get to dive in a little bit to the Go runtime. Within the Go runtime source code, a goroutine is usually referred to by the type g, and there's not so much to it. A goroutine has a stack. A goroutine also keeps track of its M, the operating system thread that it is currently running on. We also have a buffer where we can save some state for the goroutine, for when we need to schedule it off or back onto an OS thread and resume execution. And then we have some notion of status: a goroutine can be ready to run, it can be actively running, or it can be waiting, that is, blocked. We also have a few fields that help us keep track of the ways channels interact with our goroutine, namely whether there are channels pointing into our goroutine's stack, or whether we're about to park and wait on a channel send or receive, in which case we want to be sure we don't do any unsafe stack shrinking.

With that, you're probably wondering a little bit about how the goroutine stack's growing and shrinking really works, and what this notion of stack copying is. Like we mentioned before, all goroutines are initially allocated two kilobytes of memory. This number has changed a little over time, but it's generally supposed to allow most functions to run without any need to grow the stack, while also remaining pretty lightweight. Each Go function is actually given a small preamble, and what that preamble does is call into the runtime to see if there's enough memory to execute. If we're out of memory, it calls the routine morestack. What morestack does is go to the heap, allocate a new memory segment double the size of what the goroutine currently has, copy the entire goroutine stack over onto this new, doubled memory segment, and free the old memory. Then we restart execution, and hopefully we have enough memory; if not, we make another call to morestack, which doubles the memory again and copies. Effectively, this means our goroutine stacks are infinitely growable: however much memory is needed will just be allocated first and then copied over, with this allocate-then-copy strategy. This also means that stack shrinking is really efficient, because once our goroutine releases resources that it no longer needs, those are just freed back onto the heap for the system to use as necessary.
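In the runtime source, that abridged struct looks roughly like the sketch below, adapted and heavily simplified from src/runtime/runtime2.go; the referenced types (stack, gobuf, m) are themselves runtime-internal, and the real struct has many more fields:

```go
// Abridged sketch of the runtime's goroutine descriptor.
type g struct {
	stack        stack  // the goroutine's stack: [stack.lo, stack.hi)
	m            *m     // the OS thread (M) this goroutine is running on
	sched        gobuf  // saved state (SP, PC, ...) for resuming execution
	atomicstatus uint32 // status: _Grunnable, _Grunning, _Gwaiting, ...

	// Channel bookkeeping, used to avoid unsafe stack shrinking:
	activeStackChans bool  // channels point into this goroutine's stack
	parkingOnChan    uint8 // about to park on a channel send or receive
}
```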
So now we can understand a little bit better what's happening in our goroutine struct, and we see that it's not really so complicated: our goroutines are mainly here to keep track of the stack we need to run, to know what thread they're running on, and to hold a little bit of other state. So then we might think: what is the real magic of goroutines? A lot of it is about when a goroutine runs. We know that our systems have fixed resources, but we're able to spawn far more goroutines than that, so a really key part of maintaining efficiency and performance in our Go programs is the Go scheduler.

Within the Go scheduler, and in the Go runtime, there are three key players, or key concepts. We have G, who we've already seen, which is our goroutine. We also have M, alluded to a little bit earlier, which is the operating system thread. And then we have this new player, P, which is a logical processor. One way we can think about these three players is that the goroutine contains the code to execute, the operating system thread is where we execute it, and the logical processor P gives us the rights and resources to execute that code on the OS thread.

Another important facet of the scheduler is that it's what's called an M:N scheduler. This means that we have M goroutines that we're attempting to schedule onto N operating system threads. The scheduler is responsible for essentially multiplexing these goroutines onto threads, which also means we can have an architecture that is independent of the number of threads we have available: we're free to move goroutines around onto these various resources in whatever way the runtime thinks is most efficient.

To go back to our key player G, the goroutine: as we mentioned before, goroutines can usually be thought of as being in one of three states. A goroutine can be blocked, it can be runnable (ready to execute), or it can be actively running. Given these three states, we can think about the scheduler's job as trying to run goroutines as efficiently as possible: wherever we can, we want the goroutines on our N operating system threads to always be in that third state, actively executing.

So now we have our three players, and we can walk through a couple of examples of what happens in the scheduler when we're creating and running goroutines. In the most basic case, we are just running our function main when we start up our Go program, and main runs on the main execution thread. So here we have an operating system thread, we have a logical processor P, and we have a goroutine G running our main function. Then the runtime creates all the processors we need, and the number of processors created is equal to the number of logical cores on your machine, which you can see in the Go environment as the variable GOMAXPROCS.

Now let's say that main spawns a goroutine. What happens is that the goroutine goes and wakes up an idle processor; this processor then creates an operating system thread on which to run goroutines, schedules the goroutine onto the thread, and begins to execute. Now let's say that our goroutine has finished executing: our thread and our processor are idle, so they go back into holding, waiting to be used to schedule and run further goroutines.
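As a quick aside, you can check both of those numbers on your own machine with a minimal sketch using the standard runtime package:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// By default the runtime creates one P (logical processor) per logical core.
	fmt.Println("logical cores:", runtime.NumCPU())
	// GOMAXPROCS(0) queries the current setting without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```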
With this example, you might be wondering what we really need this processor P for. It seems like we're just mapping goroutines onto threads, and there's no real need for this other player. But we really need P to keep track of which goroutines to run in the case where there's more work to be done than there are resources on which to do it. Let's say this is the state of our world: we have two operating system threads, two processors, and two goroutines happily executing. Now our main function spawns a new goroutine, and we need to find a place for it to live. What happens is that we queue this goroutine onto the local processor, adding it to what's called the local run queue. One of the really important functions these processors serve is that they maintain, in their local run queues, an idea of which goroutines have yet to be run. Then, when G1 finishes executing and the processor is available again, it looks in its local run queue for a G to run and schedules that onto the OS thread to begin executing. So Ps are really important because they know which goroutines are available for an operating system thread to run.

So now we've talked a little bit about how we schedule and run goroutines that are all ready to go, but a really important part of the scheduler is dealing with blocked goroutines. Since we want to maximize execution time on operating system threads, one of the scheduler's really important jobs is to move blocked goroutines off of operating system thread resources and bring them back on once they're ready. We're going to walk through three of the main cases in which this happens. There are many things that can block goroutines, but some of the most common are system calls, network I/O, and channel operations.

To first talk about blocking system calls, let's say this is our world: we have one operating system thread that is happily executing its goroutine G, and this goroutine G makes a blocking system call, which invokes the runtime scheduler. Now G is in a blocked state, and what happens is that our processor P0 releases the operating system thread that is executing G's blocking system call. P0 then needs some OS thread to continue executing the goroutines in its run queue, so it goes ahead, wakes up the thread M1, and continues execution. Later, when G0 unblocks, M0, our operating system thread, needs to find a processor so G0 can continue running; it wakes up an idle processor P and continues to execute. In the case where our thread can't find a P, where no processors are available, the thread goes to sleep, and our goroutine is added to what's called the GRQ, the global run queue, where processors can look if their local run queues don't have any goroutines.

What's really neat in the way the scheduler handles blocking system calls is that it lets the blocked goroutine hang on to the operating system thread that's executing the system call, while swapping it off of the processor so that resource is available to run other goroutines. Of course, you might be thinking that this is a lot of work to be doing, all this switching on and off. While it's much more efficient than context switching OS threads, it's still not zero overhead, and since some system calls are quick, the Go runtime actually does a slight optimization where it only does this context switching for system calls that it thinks will be quite expensive.
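Here's a minimal sketch of that handoff, assuming a Unix-like system with a sleep binary; the two-second child process just stands in for an expensive blocking call. Even with a single P, the rest of the program keeps making progress:

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // one P: progress below depends on the handoff

	go func() {
		// Run blocks in a wait system call until the child exits. The
		// scheduler moves the P off this thread, so other goroutines
		// keep executing in the meantime.
		exec.Command("sleep", "2").Run() // assumes a Unix-like `sleep`
		fmt.Println("blocking system call finished")
	}()

	for i := 0; i < 4; i++ {
		time.Sleep(500 * time.Millisecond)
		fmt.Println("main goroutine still running:", i)
	}
}
```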
So now we can think about how the Go runtime handles network calls. Network calls are, of course, a kind of system call, but importantly they are asynchronous. If we think about how we commonly handle network I/O as Go programmers, the net package provides a default of spawning a goroutine to handle each incoming connection, and within Go we can interact with these network requests through a blocking interface, which makes our lives as Go developers really easy. The way this blocking interface is exposed is thanks to the work of what's called the netpoller. The netpoller's job is to schedule goroutines that are waiting on these asynchronous system calls, which is what provides that blocking interface in Go.

To talk a little bit more about what that means: let's say we have our one happy operating system thread running a goroutine, with a couple more scheduled. Now G0 wants to make a network call, and of course it goes into a blocked state while waiting on that network I/O. This is where we introduce the netpoller. The netpoller has its own operating system thread, and it handles these events from goroutines that want to do network I/O: it interfaces with the operating system you're running on to poll the appropriate network sockets, and it reschedules goroutines when their network resources are available, when they're unblocked.

To dive in a little bit to how this works: netpoll is essentially an interface that is implemented for the various architectures Go runs on. There are a couple of different functions in the interface, but the most important one for our purposes is the netpoll function. What it does is poll the network and return a list of goroutines that are now ready for execution. Then, in other parts of the runtime, we regularly call this netpoll function and inject the list of ready goroutines back onto our processors to be scheduled. So G0 is moved to the netpoller when it wants to make a network call, and the netpoller begins regularly polling the file descriptor for the network resource G0 is trying to access. This frees up our processor and OS thread to schedule G1 and begin executing. Later, when the network call is complete, the netpoller adds G0 to the list of executable goroutines, and G0 gets moved back onto the local run queue, where it can be scheduled for execution on our OS thread.

So this is really neat: we don't need to spawn new operating system threads or new processors to handle these network calls, and it exposes a really nice interface for us as programmers, where we can think of network I/O as blocking within our goroutines while the netpoller handles the asynchronous part.
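That goroutine-per-connection default looks something like this minimal echo server, a sketch assuming TCP on localhost:8080. Every read and write here looks blocking to us, while the netpoller does the asynchronous work underneath:

```go
package main

import (
	"io"
	"log"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", "localhost:8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept() // looks blocking; backed by the netpoller
		if err != nil {
			log.Fatal(err)
		}
		// One goroutine per connection is cheap: a goroutine parked on
		// network I/O doesn't hold on to an OS thread.
		go func(c net.Conn) {
			defer c.Close()
			io.Copy(c, c) // echo bytes back until the client hangs up
		}(conn)
	}
}
```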
So now we'll talk a little bit about blocking on channels, one of our other favorite concurrency primitives. Channels are great because, since they are also built in Go, they have really good insight into what's happening in a goroutine, which allows them to schedule goroutines really efficiently. So again, here is our world: we have an operating system thread running one of our goroutines, with a couple in the queue. Now G0 wants to do a send on a full channel, and this is blocking, since our channel is currently full. Within the channel struct we have this notion of a receive queue, which is actually just a list of goroutines that are blocked waiting on a channel receive to happen. So in this case we add G0 to the channel's receive queue, and the scheduler, because G0 is blocked, removes it from M0 and begins to execute the next goroutine. Now let's say that this goroutine, G1, is our savior: it does a receive on this channel. When that channel receive happens, the channel looks into its receive queue, sees G0, and then we can make G0 runnable and add it back to our local run queue. All of this happens really seamlessly, by leveraging the channel struct and calling into the scheduler when channel resources become available and unblock our goroutines.

So we've walked through some examples of how we schedule different work, and we'll add on one bonus feature of the scheduler, which is how it implements work stealing between threads. We might be in a case where we have two threads that both have somewhat healthy queues and are executing, and we might get to a point where one of our threads has finished executing all of its goroutines faster than the other, but there's still work left to be done. In this case, of course, we want to redistribute work so that all of our resources are constantly executing. What happens here is that P0 tries to steal work from another P. We only have one other processor, so P0 goes ahead, looks in that P's local run queue, steals half of the work for itself, and begins to execute. This ensures that threads don't remain idle while there's still work to do, and it effectively redistributes load between our processors and our operating system threads.

So now we are goroutine experts, a little bit. We know that the goroutine stack is lightweight and growable. We know that the runtime scheduler uses M:N scheduling to allow for really efficient use of our operating system threads. And we know in a little more detail how we handle long operating system calls, networking syscalls, and channel blocking, all really seamlessly in the scheduler. Hopefully we've come away with a little more understanding of the convenient and really nice blocking interfaces that are exposed for us to use as Go developers. I'm so glad you all stuck with me through this adventure into the Go runtime and understanding how goroutines work. Hopefully you've come away with a little more knowledge; I know I've come away with a lot more appreciation. Thank you all so much for your time.
