Fischer-Lynch-Paterson impossibility

Failures

So far, so good—we have a distributed consensus algorithm that works in the absence of failure (the by-name algorithm). But failure is very tricky, and, perhaps shockingly, even a single stop failure makes the consensus problem impossible to solve. This is the famous Fischer-Lynch-Paterson (“FLP”) impossibility result (link).

Let’s discuss the proof. It’s interesting because the result is shocking, and because the proof highlights some of the weird kinds of operations and transformations that practical distributed systems designers must consider.

(By stop failure, we mean a server simply stops sending messages, like it crashed.)

Solid and striped configurations

The impossibility proof starts with any consensus algorithm that works in the absence of failure. We know there is such an algorithm; we wrote one. And any algorithm that works works for at most one failure must also work when there are none!

We then show that, for any such algorithm, there exists a state where, if a specific server were to fail, the rest of the system could not reliably reach consensus—even all the other servers working together cannot tell which decision is allowed.

Consider $$\mathscr{T}$$, the set of all valid traces and trace prefixes of the failure-free algorithm. $$\mathscr{T}$$ includes all possible initial states. Divide $$\mathscr{T}$$ into three classes:

$$\RD{\mathscr{R}}$$, the solid red traces. Solid red traces can only be extended to red decisions: if $$T \in \RD{\mathscr{R}}$$ and $$T \cdot \sigma \in \mathscr{T}$$, then any message to Nancy in $$T \cdot \sigma$$ is red. Red traces have unambiguously decided for the red answer, although no server in a particular red trace may have realized that yet.
$$\BU{\mathscr{B}}$$, the solid blue traces. Solid blue traces can only be extended to blue decisions.
$$\PU{\mathscr{S}}$$, the striped traces. $$\PU{\mathscr{S}} = \mathscr{T} - \RD{\mathscr{R}} - \BU{\mathscr{B}}$$; it includes any trace that can be extended to either a red or a blue decision.

Do these trace classes all exist? We know $$\RD{\mathscr{R}}$$ is non-empty; any trace that initializes all servers to red must eventually decide red, and so must be in $$\RD{\mathscr{R}}$$. Furthermore, consensus requires that all servers decide the same color; so if $$T$$ contains any red message to Nancy, then any extension of $$T$$ is in $$\RD{\mathscr{R}}$$. Similarly, $$\BU{\mathscr{B}}$$ must be non-empty because the all-blue initialization must eventually decide blue. Does any interesting striped trace exist? Absolutely: the empty trace $$[\,]$$, which can be extended to either the all-red initialization or to the all-blue initialization.

The solid–striped boundary

Consider a trace $$T \in \mathscr{T}$$ where:

$$T$$ is striped.
Every extension $$T \cdot \textit{op}$$ is solid (either solid red or solid blue).

Some $$T$$ must exist because our algorithm terminates: every non-failing server must eventually decide. (If every striped trace had at least one striped extension, then the algorithm could avoid terminating by always choosing the striped extension.) And since $$T$$ is striped, it must have at least one extension of each color. So say $$T \cdot \RD{\textit{op}_R}$$ is solid red and $$T \cdot \BU{\textit{op}_B}$$ is solid blue.

Interestingly, $$\RD{\textit{op}_R}$$ and $$\BU{\textit{op}_B}$$ must be operations on the same server! If they were on different servers, then the reordering power afforded by the asynchronous communication model and the deterministic computation model would let us obtain a contradiction. To see why, consider $$T \cdot \RD{\textit{op}_R} \cdot \BU{\textit{op}_B}$$, and assume $$\RD{\textit{op}_R}$$ and $$\BU{\textit{op}_B}$$ are on different servers. $$\BU{\textit{op}_B}$$ cannot be receiving the message sent by $$\RD{\textit{op}_R}$$, because we know that $$T \cdot \BU{\textit{op}_B}$$ is valid. Thus, we can reorder the trace, obtaining $$T \cdot \BU{\textit{op}_B} \cdot \RD{\textit{op}_R}$$, to get a new trace that is both valid and indistinguishable from the original. And there’s the contradiction: the reordered trace must be both solid red (it is indistinguishable from $$T \cdot \RD{\textit{op}_R} \cdot \BU{\textit{op}_B}$$) and solid blue (it is an extension of $$T \cdot \BU{\textit{op}_B}$$).

How can a single trace be extended with different valid operations on the same deterministic server? Simple: they must be RECV operations that either fail differently or receive different messages.

A single failure makes progress impossible

We’ve now found a valid failure-free trace where the message received by a single “deciding” server determines whether the future execution is solid red or solid blue.

What happens if, immediately after receiving that message, the deciding server fails?

The algorithm is doomed. In the asynchronous computation model, other servers cannot distinguish between failure and arbitrary network delay. We, the algorithm analysts, know that the network will eventually deliver any message that was sent, but we don’t know when; so as algorithm developers, we can’t assume that failure has happened, even after 10,000,000 receive attempts. There is no way to reliably terminate the execution in a valid decision. If the algorithm continues and a non-deciding server decides red, there’s a chance that the deciding server has received the message that turned the execution solid blue, making the red decision erroneous. To avoid a bad decision, the algorithm can just not decide; but this also violates the problem statement.

This doesn’t prove that an algorithm can never cope with a single failure in this computational model, but it does prove that no algorithm can cope with every possible single failure in this computational model. To solve the consensus problem, we need to change the model!

References

Impossibility of Distributed Consensus with One Faulty Process. Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. JACM 32(2), April 1985.

A Hundred Impossibility Proofs for Distributed Computing. Nancy A. Lynch. Proc. PODC 1989.