A look into Go Scheduler’s Design

Go is a relatively recent language designed with concurrency first in mind (from introduction to Go byRob Pike). Its standard library is state of the art code written by coders who worked on Unix. My focus in my grad studies, from general to specific was Distributed Computing -> Concurrent Algorithms -> Shared Data Structures. I also created a novel wait-free queue. So I’m interested in checking how concurrency is done in Go. In this post I want to talk about the Go Scheduler; I will discuss Go Channels in another post. My main focus is to review their design document with an academic point of view.

You can find Go’s Scheduler Design Proposal here. The author has presented their work in these slides (video). These are two good articles that explain the main ideas in a simple way (1,2). I’m not going to repeat them here but rather discuss the aspects I find interesting in the design.

The first thing that interested me was the absence of safety (ex. linearizability) or progress (ex. wait-freedom) requirements in the design specifications. The primary concern is performance, specifically handling 1 million concurrent goroutines. I believe this implies a desire for the API to be minimal and effectively manage I/O-bound routines. Somehow to me, it looks more like properties than requirements.

The author initially implements a lock on the global queue in their M:N Scheduler (Slide 18) to manage concurrent queue operations, which seems reasonable. However, later (Slide 33), they discuss the scalability¹ issues arising from having a lock on the global queue due to high contention on its lock, prompting the need for an alternative solution. On Slide 35, they assert that lock-free queues are not a suitable solution in this scenario. Vyukov then employs an analogy to illustrate their rationale: using locks is akin to scheduling shifts for developers to use a single computer, whereas lock-free appears as though everyone can access the computer simultaneously. They propose that the optimal solution is to provide a computer for each developer, relating it back to the original problem. This argument employs a straw man fallacy. I completely disagree with this reasoning. If they demonstrated empirically that using locks yields better performance than lock-free algorithms, I could only concede to their claim. However, they do not explicitly present such evidence to the audience. Instead, they attempt to mitigate contention by creating a local queue for each goroutine (more on fine-grained locks).

In slide 46, he proposes an idea of work stealing. This is similar to the well-known helping mechanism in concurrent algorithms literature. I would say it is close to the way Herlihy constructs his Universal Construction.

He proposes to use the number 61 to decide whether to run from the global queue or the local queue, vaguely claiming that this decision is superior to using 64 because 61, being a prime number, breaks repetition patterns. I don’t quite understand how he supports this claim.

In slide 60, he notes that the scheduler should be fair, a requirement not initially specified in the design. However, his definition of fairness aligns informally with lock-freedom, one of the weakest and most renowned progress properties in the literature. I’m not fond of the idea of renaming a well-known property that has been extensively studied. Nonetheless, since this is not a technical paper, it seems acceptable. Furthermore, he does not demonstrate that the scheduler is lock-free, which it is not. From my understanding, his aim is simply to prevent some goroutines from starving.

In conclusion, despite the shortcomings in their documentation (perhaps I did not find their best document), Go effectively handles numerous concurrent goroutines, which is impressive. It performs admirably in practice, yet there remains a significant disparity between the ideal algorithm as conceived in my mind and Go’s current implementation.

It’s not clear what they mean by “scalable,” I believe they refer to worst-case performance as the number of processes increases. ↩︎