This article is about how we keep time in our computers. First, let’s talk about the physical clocks in the world. There is no completely accurate physical clock; one clock runs slightly faster, and another clock runs slightly slower. This error can become quite significant over time. Even a clock with a 1 ppm skew (1 part per million) is going to have ~86ms drift per day and ~32s error per year. Most computer clocks are correct within 50 ppm. Therefore, we can’t entirely rely on computer clocks. Fortunately, there are other types of clocks known as atomic clocks that are precise enough for the time that humans live on planet Earth. The only drawback is that they are expensive.
One solution is to sync our computer clocks with precise global atomic clocks. This can be done using GPS satellites. Each satellite carries an atomic clock and broadcasts the current time and location. Receivers can compute their location and current time by measuring the speed of light delay between satellites and the receiver. Data centers usually have accurate antennas on their roofs to perform this time calculation. If we have some servers in sync with atomic clocks around the world, then we can then synchronize other computers with those servers. A protocol named Network Time Protocol (NTP) id for this clock synchronization over network, which is widely used by most operating systems. You can view and configure your NTP server in your OS settings. Your computer frequently queries the NTP server for the time and updates its local clock with the NTP server’s response. Statistical methods come into play to reduce random errors. If the clock skew is small, your computer slightly adjusts its clock speed. If the skew is larger, the computer will reset its time to match the estimated server time. Note that clock skew can be negative or positive, which means clock can jump forward or backward after an update. In the past, older operating systems used to panic when the estimated clock skew exceeded a certain threshold.
The use of time in distributed systems is to determine the order in which operations occurred and to reason about that order. Consider a distributed system with three nodes that send messages without containing timestamps. In the execution below, we can observe that Node C sees message m2 sooner than m1, even though m1 actually occurred before m2. How can we ensure that Node C understands that m1 occurred before m2? One solution is to include creation timestamps for both m1 and m2. However, a problem arises: what if the difference between T(m2) and T(m1) is smaller than the potential clock errors? To address this inconsistency, we turn to Logical Clocks instead of Physical Clocks. Logical clocks can help us determine the temporal order of operations.
Ordering of events (sending/receiving messages, local execution steps) is defined as:
We say event a has happened before event b (a → b) iff:
- a and b occurred in the same node and a occurred before b in the node’s local execution order; or
- a is the sending of some message m, and b is the receipt of that same message m(assuming sent messages are unique); or
- there exists an event c such that a → c and c → b.
“Happens before” is a partial order — it is possible that neither a → b nor b → a In that case, an and b are concurrent (written a || b). When a → b, a might have caused b; when a || b: a cannot have caused b. We will look at two types of logical clocks, Lamport Clocks and Vector Clocks.
Lamport clocks are very basic. Each node has a counter that shows the time. Whenever an event happens The node increases the counter. The node also sends its local time with messages and update its time with the messages it receives.
Lamport Clock
on initialization:
t := 0
on any event occurring at local node:
t := t + 1
on request to send message m:
t := t + 1; send (t, m)
on receipt of (t', m):
t := max(t, t') + 1
(pass m to the application)
If a → b then L(a) → L(b) (why?), but if L(a) → L(b) it does not result a → b (also why?). Furthermore it is possible that L(a) = L(b) for different events an and b. However we can resolve that by giving an ordering to the nodes. If L of two operations in two different nodes are equal then the operation in the node with lower order has occurred sooner.
We can generalize the ideas of lamppost clocks o vector clocks. Each node instead of storing its own time, stores the time for all the other nodes and sends all of the times in each message. The idea is very similar to shared snapshot objects.
However, the logical clocks in that paper were scalars, not vectors. The generalization to vector time was developed several times, apparently independently, by different authors in the early 1980’s
from wikipedia page of vector clocks
Vector clock
on initializing Ni:
T := [0, 0, . . . , 0]
on any event at Ni:
T[i] := T[i] + 1
on request to send message m at Ni:
T[i] := T[i] + 1; send (T, m)
on receiving (T', m) at Ni:
T[j] := max(T[j], T'[j]) for every j in {1..n}
T[i] := T[i] + 1
(pass m to the application)
Vector clocks define the following order on vector timestamps:
- T = T’ ⇔ T[i] = T'[i] for all i ∈ {1, . . . , n}
- T ≤ T’ ⇔ T[i] ≤ T’ [i] for all i ∈ {1, . . . , n}
- T < T’ ⇔ T ≤ T’ and T ≠ T’
- T || T’ ⇔ not ( T ≤ T’ or T’ ≤ T )
This gives us a partial order that is consistent with causaility:
- (V (a) < V (b)) ⇔ (a → b)
- (V (a) = V (b)) ⇔ (a = b)
- (V (a) || V (b)) ⇔ (a || b)
We have come a long way to compute the time. Finally we can use vector clocks to reason about causality of events.
Read more:
Chapter 8 of Martin Kleppmann’s ‘Designing Data-Intensive Applications’
The Trouble with Distributed Systems
Examples are from CS1380: Distributed Systems course from Brown University.