Monday, September 28, 2009

Paxos

Paxos is the most efficient distributed consensus algorithm for asynchronous systems with non-Byzantine faults. There is no deterministic algorithm for always achieving consensus successfully in an asynchronous distributed system, but Paxos comes pretty close to doing so in reality for non-Byzantine faults.

The best reference on
Paxos is a paper by Butler Lampson titled "How to build a highly available system using consensus".

Paxos' creator Leslie Lamport has written two papers, "The Part-time Parliament" which is the main detailed paper, and "paxos made simple" which presents the same thing simplified. However, Lampson's paper provides a very good context for Paxos: Why is consensus needed when designing an HA service ? What makes Paxosa really good consensus algorithm for asynchronous unreliable systems ? How to use Paxos in a _real_ distributed replicated (HA) service. Google's chubby lock service is a real life application of Paxos for solving a real problem and gives interesting perspectives on where consensus fits in an HA service implementation.

For instance, Lampson shows how an HA service need not use consensus for each and every consensus decision. Instead, it can use consensus once to elect a lease-based primary or grant a lock which can then be used. Chubby also uses
Paxos to provide a lock service rather than let clients use Paxos for every consensus decision. This helps performance since the lock owner or primary can pretty much do what it wants without having to consult other hosts. Another real world benefit is when modifying existing non-HA code to be HA. Its easier to modify existing code to convert it to a distributed system using locks, than make it use consensus for every majority decision.

Lampson's paper and the paper on chubby lock service claim that
Paxos consensus algorithm and the consensus algorithm used in Viewstamped replication which was published around the same time are equivalent. Viewstamped replication was used by Ori and Liskov to build a replication file system where the primary election happens through consensus. "Viewstamping" is used to make sure that even after a new primary is elected (and view changed), events pending with the older primary are well known to predate those seen by the new primary. The concept is similar to "round number" in Paxos where each view is like a round number and viewstamps are monotonically increasing like Paxos round numbers.

There was an MIT project by Barbara Liskov in 1990 called Argus which tried to create a programming model for writing applications which would then be automatically made highly available (HA) using viewstamped replication based system. Argus staff (Sanjay Ghemawat) joined Google, so mapreduce is probably inspired to some degree by Argus. Mapreduce does not have Argus style fault tolerance, it has basic fault tolerance implemented by the master.

Safety property of Consensus:

All non-failed processes agree on a value v that is proposed by at least one process (v must be proposed by some process, otherwise the system can e.g., trivially agree on value 0 all the time).

The basic idea behind Paxos:

Consensus is easy if there are no faults. For example, two phase commit: one process decides the outcome and asks all processes to prepare. If all processes agree, it issues a commit and job is done. If leader fails, the outcome is not clear. Another example: majority-based consensus: there is no leader. All processes choose a value and broadcast. If a majority agrees on the same value, there is consensus. But if there is no majority, or if some processes crash so that nobody knows if consensus was reached, this algorithm fails.

Paxox's basic idea comes from this non-fault-tolerant majority consensus algorithm. To make it fault-tolerant,
Paxos repeats majority consensus until there is consensus. However, the challenge comes from the asynchronous nature of the problem due to which its difficult to know when the current voting round has failed to reach consensus and hence a new round needs to be started. In practice, some process will start a new round after waiting for a certain time if consensus has not reached till then (Paxos is protected from multiple processes starting a round since all round numbers are ordered, so one of them will take precedence over the other thus effectively killing that other round). But the decision about the perceived failure of the previous round can be false as its made by some process based on messages it receives from other processes (after consensus is reached, typically the result is broadcasted for everyone to know though this is not needed for correctness). So if some messages are lost, its decision can be incorrect. Thus there could be a processe A which has witnessed that there has been a consensus on some value but process B doesn't know it. So B start s another round possibly using a different value. So if B's new request to A is lost, A will wrongly stick to its older consensus value while the new consensus could be on a different value. So for Paxos to work, each new round should use a value for which there has possibly been a consensus in a previous round. The tricky part of Paxos is to ensure this property.

Paxos ensures this by having the leader process of a round ask other processes for their past values and decide for which value there could possibly have been a consensus. Two key details that make the algorithm work are:
  1. Each process can only be participating in one round at a time.
  2. Round numbers are monotonically increasing so knowing which round is "previous" is trivial.

References:
  • Butler W. Lampson, How to Build a Highly Available System Using Consensus. 10th International Workshop on Distributed Algorithms (WDAG 1996)
  • Leslie Lamport, The Part-time Parliament, ACM Transactions on Computer Systems, May 1998
  • Leslie Lamport, Paxos made simple,http://research.microsoft.com/users/lamport/pubs/paxos-simple.pdf

Future Reading:
  • Jean-Philippe Martin, Lorenzo Alvisi, Fast Byzantine Paxos,http://www.cs.utexas.edu/users/lorenzo/papers/fab.pdf

6 comments:

Anonymous said...

Hello !.
You may , perhaps curious to know how one can manage to receive high yields .
There is no need to invest much at first. You may begin to get income with as small sum of money as 20-100 dollars.

AimTrust is what you need
The company incorporates an offshore structure with advanced asset management technologies in production and delivery of pipes for oil and gas.

Its head office is in Panama with structures around the world.
Do you want to become an affluent person?
That`s your choice That`s what you wish in the long run!

I feel good, I began to take up income with the help of this company,
and I invite you to do the same. If it gets down to select a proper companion utilizes your savings in a right way - that`s the AimTrust!.
I make 2G daily, and my first deposit was 1 grand only!
It`s easy to get involved , just click this link http://haliwosuc.freecities.com/katihi.html
and go! Let`s take our chance together to feel the smell of real money

Anonymous said...

Hi!
You may probably be very curious to know how one can make real money on investments.
There is no initial capital needed.
You may begin to get income with a sum that usually goes
on daily food, that's 20-100 dollars.
I have been participating in one company's work for several years,
and I'm ready to share my secrets at my blog.

Please visit blog and send me private message to get the info.

P.S. I make 1000-2000 per daily now.

http://theinvestblog.com [url=http://theinvestblog.com]Online Investment Blog[/url]

Anonymous said...

I really like when people are expressing their opinion and thought. So I like the way you are writing

Anonymous said...

One of my friends already told me about this place and I do not regret that I found this article.

Anonymous said...

It isn't hard at all to start making money online in the undercover world of [URL=http://www.www.blackhatmoneymaker.com]blackhat seo[/URL], Don’t feel silly if you don't know what blackhat is. Blackhat marketing uses alternative or misunderstood ways to produce an income online.

Anonymous said...

Message42, http://www.arlo.net/massacree/ viagra online in uk, qemo5, http://www.arlo.net/fccgb/ order viagra no prescriptions, hdqk2, http://www.arlo.net/fccgb/notes/ buy generic viagra, pslb5, http://www.arlo.net/bytes/ order cheap viagra, obkw1, http://www.arlo.net/live/ viagra sale online