SHE256 2018 : Elaine Shi – Thundertoken : A Fast and High-Throughput Cryptocurrency

– [Announcer] I am so excited to welcome Elaine
Shi from Cornell University, and the co-founder of IC3. She is the co-inventor of Thunder’s core consensus
protocol, and co-author of the first peer reviewed publication on Bitcoin, and the first
peer reviewed publication on smart contracts. Professor Shi will be talking about Thunder. – [Prof. Shi] Thank you so much for the introduction! I’m very happy to be here, and today I’m going
to be talking about Thunder Token, a very fast, and high-throughput blockchain that
our team is building. You can think of it as a much faster version
of Ethereum, we are going to be EVM compatible. Okay, so 2017 is said to be the year of the
ICO, and 2018 is the year of the dApp, right? There are many exciting dApps. There’s games on blockchains like CryptoKitty,
there’s sharing economy like putting Uber and Airbnb on top of blockchains, there are
many others, like prediction markets, decentralized exchanges, social networks, so on so forth. So all of this is very exciting, but there
is, a truth that everyone knows, but no one likes admitting, that is, you know, none of
these apps will actually take off to a large scale on top of today’s blockchains. They’re just too slow for the purpose of all
these dApps. For instance this chart shows Ethereum transaction
fees over time. If you look at it carefully, in December last
year, December last year was a particularly exciting time for our community, partly because
CryptoKitties also came about in December, right? And, you know, if you compare CryptoKitties
with traditional games, like social network games, mobile games, it’s not like they have
a lot of volume, but this single game alone was sufficient to overload the entire Ethereum. So back in December, Ethereum transaction
fees were like one million USD per day, and Ethereum was like, 90% utilized back then. So this is a big problem. Okay. Now, scalability as I said, I believe is the
most exciting, and a very challenging problem for our community today. In our company Thunder Token, we believe that
we have the best technology to overcome this challenge. And even with a single chart, we can achieve
100 more x, 100x more throughput than today’s blockchains, and we can confirm transactions
in less than a second. So now, in today’s talk, I’m going to focus
on describing the core consensus protocol, that enables such a fast cryptocurrency, it’s
called Thunderella. This is joint work with Rafael Pass. Of course it takes a lot more than just the
consensus protocol to build a new cryptocurrency, like for instance, we have to have the right
incentive mechanisms, we have to have the right government policy, but in the interest
of time, today I’m only going to focus on the consensus protocol. Okay, now if you guys have any questions,
please fell free to interrupt me. So to explain consensus we first have to understand
the, what problem we are trying to solve. So the problem we are trying to solve here,
is called state-machine replication, it’s also called a linearly ordered log, or consensus. In the rest of the the talk all these terms
roughly mean the same thing. Okay, so what is state-machine replication? Imagine we have a set of servers, in this
case we have Google Wallet servers. Now the job of Google Wallet, of course, is
to establish a linearly ordered log of transactions, and the transaction log will grow over time. And there are two important properties that
we care about, namely consistency and liveness. Consistency says that all the honest nodes
must agree on the log. Now the network may have delay, right? So it could be that your log is a little faster
than mine, that’s okay, but it must be the case that your log is a prefix of mine, or
mine of yours. So now the second property is liveness. Liveness says that whenever an honest client
submits a transaction, very soon the transaction will appear in every honest servers log, right? It’s not like we want forever for our coffee. So at first sight, this seems like a very
boring problem. What is so difficult about building a linearly
ordered log? Indeed, if all the servers are honest, and
they correctly follow the protocol, this indeed would have been a trivial problem, but what’s
non-trivial about it, is when some of these servers can be corrupt. For example, some of these servers can have
malware, and the malware can be exploited by the adversary, and these corrupt servers
can deviate arbitrarily from the honest protocol. And even in these situations we want to make
sure that the remaining set of honest servers, still satisfy these security properties. So that’s why this is a very challenging problem. As it turns out state-machine replication
is not a new problem, it has been around for 30 years, and the distributed systems community,
like, have been working on this for a long time, and in fact, if you look at all these
companies in the Silicon Valley, everyone actually deploys some state-machine replication
protocol, to replicate their computing infrastructure. So for instance, Apache Zookeeper, is an opensource
implementation of Paxos, right? Paxos is a state-machine replication protocol,
and many of these companies deploy Apache Zookeeper. Okay. Traditionally, when we talk about consensus,
the kind of scenario that we have in our mind, is exactly what I talked about. There’s a single company like Google or Facebook,
and they have, let’s say, a dozen nodes, and these nodes are connected with fast local
area network. Okay. So like I said, consensus is not a new problem,
but what is new here is the large-scale. With cryptocurrency, what is amazing, is that
we are now having consensus on the internet-scale, involving thousands to millions of nodes. And as it turns out, consensus on a large-scale,
is very much a different beast. It is much, much more challenging than the
classical setting and this perhaps explains why all these ICOs are there. Everyone’s, you know, rushing to create and
invent their own consensus protocol. But at the end of the day this doesn’t make
any sense, right just like crypto, consensus is very tricky as well. And we know that, you know, you shouldn’t
be rolling your own crypto implementation. So why is is the case that everyone feels
the need to, like, invent or roll their own consensus implementation? Okay. In the rest of the talk, I’m going to try
to answer these to questions: number one, like I said, why, even after 30 years of research
and development, there still isn’t a dream large-scale consensus protocol? And number two, I’m going to explain what
I believe to be the dream large-scale consensus. Okay, so let’s begin with number one. To understand why there still isn’t a dream
protocol, we need to look at the consensus landscape. So roughly speaking there are two broad classes
of approaches, and I’m giong to call them the classical approach, and the blockchain
approach. So classical approach, typical examples are
like PBFT and Paxos. These are the kinds of protocols that Google
and Facebook deploy. And so what do we know about classical consensus? When the network has sufficient bandwidth,
these protocols are very fast, because they confirm transactions, in a normal case, in
constant number of round trips. On the other hand, these protocols are notoriously
complex, and this makes these protocols very difficult to implement, to do security audit
for, and to maintain. And also, you may have heard that these protocols,
they don’t scale very well to a large number of nodes. Okay, so then there is blockchains. Blockchain is, you know, the more sexy approach,
right? And the first blockchain, of course, is Nakamoto,
which is the protocol that underlines Bitcoin. Okay, so what is amazing about blockchain
is that, as it turns out, they’re not just a empirical success, blockchain style consensus
is actually a mathematical breakthrough, because now that we understand the mathematics behind
blockchains, we know that blockchains mathematics kind of completely departs from classical. Okay. So what are the advantages of blockchain style
consensus? First, they are very simple, right? The protocol rule is that everyone looks for
the longest chain and they try to extend the longest chain by mining the next block. These protocols have been empirically proven
to be robust, on a large scale, and we actually now, can also articulate some of these robustness
properties, mathematically. Okay, but the biggest drawback of blockchain
style consensus is that they are very slow, right? As you’ll know bitcoin takes 10 minutes per
block, and if you want to have sufficient confidence, you should wait for six blocks
which is like an hour, right? You have to wait for an hour for your coffee. Another complaint about blockchains is they
waste a lot of energy, right? People have to solve computational puzzles,
but for the purpose of this talk, I’m going to ask you to take a leap of faith and believe
that the energy waste problem is already solved. Like for instance, there has been quite a
few works that show how to instead rely of proof-of-stake assumptions, to build blockchain
style consensus and meanwhile we can precisely emulate the mathematics behind Nakamoto’s
blockchain. So in the rest of the talk, whenever I mention
blockchains, it can be proof-of-work, it can be proof-of-stake. Okay, so of course this is landscape, and
again the question we are trying to solve is: you know, what is the dream large-scale
consensus protocol? And obviously classical, blockchains, neither
of them will address our needs. Okay, so I’m going to try to answer this question
in the remainder of the talk. The plan is the following: I’m going to start
with classical, I will explain how classical consensus works, and in the middle I will
get stuck, when I get stuck, I will try to combine the best of both worlds. So let’s begin with classical. And here’s our scenario. In this scenario we have a leader, who’s Vitalik,
Vitalik is the founder of Ethereum, and we have a committee of stakeholders. So in this case the stakeholders are superheros,
okay. So now, this committee are going to be the
voters, right? Some of these voters can be corrupt, like
in this case, Loki’s corrupt. And of course the leader can also be corrupt. Okay, so what I’m going to explain, is a very
simple voting based protocol, and I want us to think together what properties this protocol
can achieve, and what it cannot achieve. Okay? So let me describe the protocol first. Step number one, the leader is going to make
a proposal, the proposal contains the sequence number, and the batch of transactions you
want to confirm. So for simplicity, I’m also going to call
this a single transaction, but in practice, you know, you want to confirm a batch at a
time. Okay. So the sequence number, what is the sequence
number do? Remember we are building a linearly ordered
log, in the end. And the sequence number is going to determine
where in the log this transaction will land. So in our entire protocol we need to run many
little instances of this little voting protocol, but for the time being we are going to focus
on one single instance. Okay, so leader makes proposal, step number
two everyone’s going to vote. Remember in this case, Vitalik proposed a
golden transaction and if you’re honest, you will vote precisely, for the golden transaction. But of course in this case, Loki’s corrupt,
so he didn’t follow the protocol, instead he voted for a blue transaction and a red. Okay, so what do we do now? Now we wait. We wait until we hear enough votes on the
same transaction, and once we hear enough votes on the same transaction, we can confirm
and output the transaction. So very simple, three-step voting-based protocol. And the most important thing to remember about
this protocol the honest nodes are going to vote uniquely. If I’m honest, I’m going to wait until the
leader makes the first proposal, for every sequence number, and I’m going to vote on
that proposal, and only that proposal. Okay, so by the way, a vote is a signature
on the transaction, that’s being proposed by the leader. Okay. So with this in mind, I’m going to give you
a very simple consistency proof. And to understand the consistency proof, let
me describe the parameters first. So we are going to assume that less than one
third of the committee are corrupt. So N is the total number of nodes, and less
than one third of them are corrupt, and in this case, I claimed, we only have to wait
for two thirds of the people to vote. Once I collect two thirds votes, I can confirm. And here’s why this protocol gives you consistency,
imagine in this case, Spider-Man has heard two thirds of the people voting for the red
transaction, and Iron Man has heard two thirds of the people voting for the orange transaction. Now if you do the math, because of the very
simple pigeon hole principle, the intersection of the red set and orange set of people, this
intersection must contain at least one honest node. So it’s very simple pigeon hole principle,
nothing fancy about it. But remember that honest guy is going to vote
uniquely. That honest guy in the intersection, that
is why the red transaction must be equal to the orange. And in this consistency proof, notice that,
the only thing it relies on is the fact that honest nodes vote uniquely, and we never relied
on the fact that the leader is honest. Okay, so keep that in mind. And let’s think about, you know, what this
very simple protocol can give us, but we are going to look at it from two different worlds. We have the world where Vitalik is honest,
and the world when Vitalik is corrupt. When Vitalik is honest, you know, things are
all good, we have consistency as I’ve just proven to you, and we also have liveness,
because when Vitalik is honest, he is going to propose the same transaction to everyone,
and soon enough, you’re going to collect enough votes, right, because two thirds of the people
are honest, and now you’ll make progress. But what happens when Vitalik is corrupt? As I said, we still have consistency in this
case, cuz the consistency proof never relied on the fact that the leader’s honest. But we don’t have liveness, because for instance,
if Vitalik is corrupt he can simply just crash and not make any proposals, and he can also
propose different things to different people and in this case, everyone’s going to end
up voting for a different thing, and you can wait and wait, and there’s never enough votes
for the same transaction so you get stuck from that point on Okay. So, at this moment, it seems like the cracks
of the problem, is that we have to solve the liveness issue, when the leader’s corrupt,
and if we can do that, then everything will be good. Okay. Unfortunately, this problem is not easy. If you look at all these classical protocols,
like PBFT, Paxos, they resort to a very complicated recovery path to solve the liveness issue
when the leader’s corrupt. And I don’t want to have to explain how this
works. So let’s kind of take a step back and look
at how, you know, most of these classical protocols work. They actually share a very similar structure,
there’s a very simple normal path, consisting of one or more rounds of the voting protocol
I mentioned. But when the normal path fails, you fall back
to a very complicated recovery path. If you look at this picture, you’ll want that,
most of the time, the system is operating in the green part. But then again, if you look at the code, most
of the code actually is in this red part, and that’s where are the bugs and complexity
are. So that’s not very desirable from a systems
perspective. What does this complexity mean in practice? Like for instance is a San Francisco
based company they’re rolling blockchain solutions for Visa, and they basically completely gave
up on this, implementing the recovery path. What this means, is that when this normal
path fails, they have to fix the problem manually, which means they have to do all this red stuff
manually, and imagine if this protocol is deployed among 100 banks, and you have to
basically go to all of these banks, sync their logs, and re-bootstrap the protocol, and that’s
going to be really painful in practice. Okay. So at this moment, we are kind of stuck, but
as I promised when we got stuck, we are going to try and combine the best of both worlds. So our idea is very simple. This is classical, we ditch all this complicated
red stuff, we replace it with a very simple blockchain, and that’s the high level idea
behind Thunderella. I will explain to you how to instantiate this
idea in a provable, correct manner. But before that I want to say, you know, I’m
also going to call these, the fast path and the slowchain, right? The slowchain can be Ethereum, could be Bitcoin,
but it could also be a proof-of-stake blockchain. Okay. And here are the goals that we want to achieve. We want that, almost all the time, the systems
is actually operating in the fast path, and in the fast path, we can confirm transactions
in two to three actual network round trips, you don’t have to wait for even a single block
interval. And in the fast path, we also rarely have
to use the slow chain at all, like, that’s why we get very high throughput. But when you’re under attack and when things
are not so good that’s not the end of the world, because, you know, in the worst case,
we fall back to the security and performance of today’s blockchain. Okay, so that’s what we can achieve. Before I describe the protocol lets recap
the scenario. We have an underlying blockchain, for now
you can think of it as Ethereum, but it can also be proof-of-stake blockchain, we have
a leader and we have a committee. I’m going to ask you to take a leap of faith,
and imagine that there’s a mechanism to elect the leader and the committee. And both the leader and the committee can
evolve over time. So I won’t describe in detail how these are
elected, for instance they can be elected through stake, right? And to get our worst-case security guarantees,
we have to assume that the blockchain is secure, and if this proof-of-work blockchain, we have
to assume majority of the mining power is honest. If it’s proof-of-stake, we have to assume
majority of the stake is honest. We are also going to assume that the majority
of the committee are honest as well. We don’t have to assume anything about the
leader, even if the leader’s corrupt, it doesn’t break the security of the scheme. Okay. Now, I will first describe the fast path. But I kind of already described it before,
at this moment, I’m going to focus on what happens when there are many of these little
instances for different sequence numbers, right? So remember the protocol is that the leader
makes a proposal, everyone votes. So now there are many of these sequence numbers. And I’m going to introduce a notion called
notarized, if any transaction has collected three quarters votes, I’m going to call that
notarized. So the threshold has changed here a little
bit, because we are assuming majority honest now. Earlier we are assuming two thirds honest. So for majority honest, to get the consistency
argument to work, you have to have three quarters votes. Okay, so now, in this picture, one, two, and
three have been notarized, four is missing, and five and six are notarized too. In a cryptocurrency system, it is very important
that you process the transactions in a linear order, because for instance, four and five
can be conflicting transactions that try to spend the same coin, and you, we wanna prevent
double spending here. In this case, the only transactions we can
process, is one, two, three. And for the lack of a better term I’m going
to call this, the maximal lucky sequence. You can confirm and output the maximal lucky
sequence. Alright? So this is basically the fast path algorithm,
and remember the problem we need to address, is when the leader’s corrupt, we need to solve
the liveness issue. So how do we do that? Well, so far, we haven’t used the slowchain
at all, right? We have to use the slowchain somehow, and
that’s how we are going to address the liveness issue. How do we use this slowchain? We use it for two different purposes, first
we are going to use the slowchain to detect when the fast path has failed. And once the fast path has failed, and we
detect that, we want to initiate a fallback procedure, and fallback to the slowchain. So in this slow mode, people can still post
transactions to the slowchain, and have them confirmed on the slowchain. Things are a little slower, but the system
doesn’t stop. And when you are in the slow mode you can
always re-bootstrap the fast path. Okay, so that’s what I’m going to explain,
I will explain number one, how to do this detection, and number two, how to do the fallback. So first how to do the detection. The challenge here is that we need a robust
mechanism, a mechanism in which faulty nodes cannot implicate an honest leader. Because we want to prevent denial of service. Okay, so here’s a very simple idea. Imagine when the protocol is operating normally,
the committee posts notarized heartbeats to the slowchain, so a heartbeat is, contains
a hash of the fast path logs so far, and then all the committee members are going to sign
this hash, and then when this hash becomes notarized, you post it to the slowchain. So every now and then, you know, you keep
posting these heartbeats. And these heartbeats also serve as a checkpoint
of the current fast path. Okay, so normally, as you can see, in normal
mode, we barely have to use the slowchain at all. And the only way we are using the slowchain
is the post these periodic heartbeats. Okay, so imagine at some point, I observe
that K blocks have gone by without a heartbeat, so this looks suspicious, right? So K, in this case, is a security parameter,
you can imagine K to be six for Bitcoin, and K is let’s say 300 for Ethereum. So the heartbeats have been missing, what
could have caused this, right? So for instance, if the leader has crashed,
and is not making proposals this can happen, if the leaders censoring transactions, and
the committee want to vote the leader out, they can also refuse to sign. In all of these cases the heartbeat is missing. And when this happens we want to fall back
to the slowchain. So I have to tell you how to fallback, in
a provably secure manner. And to understand how to do the fallback,
let me try to explain the problem here, right? The problem is, we can all decide that we
want to fall back, but at this moment, our fast path log can be different. Like for instance, in this case, Spider-Man’s
network is a little slower, so he believes the fast path log ends at three, where as
Iron Man believes it ends at six. And at this moment they all want to fallback,
it is important that they reach agreement on what the fast path log is, before falling
back to the slowchain, right? Because once you are in the slowchain, you
post transactions to the slowchain to confirm, and that’s appended to the fast path log. How do we reach this agreement? If you think about it for five minutes, you’ll
realize this is an agreement problem in itself. Like earlier, we pointed on the liveness issue,
but here we can no longer point on liveness, we have to have an agreement mechanism that’s
both consistent, and live. And how do we do this? Again, we use the underlying slowchain for
help, right? The slowchain is kind of like, a very robust
mechanism, consensus mechanism, that’s always there. You know, it’s short and steady, but it’s
just, you know, it’s a little slow. Okay. So now, the fast path has failed, and the
nodes are going to use the slowchain to communicate, to reach consensus on where the fast path
log ends. And the way you do it is the following: first,
when we want to fallback, we stop participating in the fast path, we stop signing. Number two, we are going to tell each other
the notarized transactions we have seen. And number three, we are going to post all
the notarized, but unchecked transactions to the slowchain. So, if the transaction has been confirmed
by some heartbeat on the slowchain, then it’s checkpointed. But we are going to post, basically, all the
notarized, but uncheckpointed transactions to the slowchain. You are going to keep doing this, for K number
of blocks, where K is again, a security parameter, like 300 for Ethereum. And by the end of the grace period, we basically
reach agreement on where the fast path log ends and one subtlety here, is that, if I
have confirmed some transaction earlier, I don’t want it to rollback by the end of the
grace period. But we can guarantee this because, the maximum
lucky sequence contained in the slowchain, by the end of the grace period, is at least
as long as everyone’s fast path log when we started the grace period. So this is something we can prove. Alright. So that’s basically the protocol. To summarize, the way we address the liveness
issue, is to use the slowchain to detect fast path failure, and when we have detected failure
we fallback to the slowchain, and, you know, in the slow mode, you can always try to re-bootstrap
the fast path. Okay, and to quickly recap the properties
we can achieve, right? To get worst-case security guarantees, we
only need to assume majority honest. And I want to stress, that we don’t have to
assume anything about the leader, for worst-case security. Because even if the leader is arbitrarily
malicious, the only thing it can do is to slow down the protocol. It cannot break consistency, it cannot cause
the protocol to stop either. So that’s actually, you know, even though
it may seem like we have a central point, in the protocol, the protocol is actually,
fundamentally decentralized, in terms of security. Okay. So now, when things are good, we can be fast. And what do I mean when I say things are good? If, lets say, three quarters of the committee
are online and honest, and if the leader’s also online and honest, then things are all
good, and we can confirm transactions very fast. Everything’s on the fast path. Okay, to recap the protocol, here are the
two sentences you have to remember, to reconstruct the protocol yourself, you know, after this
conference. When things are good, we perform a single
round of voting in the fast path. And when things are bad, we use the slowchain
for fallback. And when you are in the slowmode, you can
always re-bootstrap the fast path. So that’s Thunderella! I’ve skipped a lot of these details, like
how to elect committee, how to elect the leader, and there are a bunch of practical optimizations
that we do, which I kind of skipped for simplicity, in our paper we also have formal proofs of
security. So this is not proof-of-work, this is like
security-proofs. I’ve been asked what is proof-of-security,
alright. Okay to, before I end, I wanted to describe
some of these insights we have gained, right? So, in particular, why, after 30 years of
work, we can claim to be a lot simpler, and faster than everyone else, there has to be
some insights here. Okay. Now, if you look at all these classical protocols,
they work in what’s called an asynchronous model. So what is an asynchronous protocol? An asynchronous protocol doesn’t know how
long it takes to deliver messages. In other words, the same protocol has to work,
no matter how long it takes to deliver messages. As it turns out, if you have worked in distributive
systems at all, asynchrony makes life a lot harder, so that’s why these classical protocols
are very complex. In comparison we kind of baited and switched. We switched the underlying protocol to a synchronous
one. So what I’m saying here is blockchains are
synchronous protocols. And that’s why, so in comparison with asynchronous,
a synchronous protocol is allowed to know the maximum network delay. And because of this extra assumption, these
protocols can be simpler than asynchronous protocols. So blockchains are synchronous because we
have to parameterize the block interval, and there are mathematical proofs that show, you
know, if you want to resist 49% attack the block interval has to be like 60 times the
maximum network delay. And that’s why the protocol needs to know
the maximum network delay. Alright. So you may ask, you know, why earlier people
didn’t use synchronous protocols. Like why? Synchronous protocols are only used today
for large-scale. There is actually a reason for this, because,
in the classical Google/Facebook scenario, they want to achieve, let’s say, microsecond
to millisecond latency, and then people tend to believe that these synchronous protocols
are slow, because, like in these cases, you don’t want to even wait for a single block
interval, or a single synchronization round. But what we show, is, this classical wisdom
is kind of, not correct. Like we show that you can be a synchronous
protocol, and you can still be fast. So if you look at Thunderella, most of the
time, it’s operating in the asynchronous land, and that’s why we are fast. But fundamentally, we are a synchronous protocol,
and that’s why we are simple. So this is basically a new paradigm for designing
consensus, and this new paradigm is actually very powerful, even from a theoretical perspective. So now putting my theory hat on, right, so
earlier we know that if you want to work with asynchrony directly, there are a bunch of
these law bounds, for instance one of these law bounds says that an asynchronous protocol
cannot tolerate more than one third corruption. So in some sense, this is saying that, if
you want to be fast, you cannot tolerate more than one third corruption. But what we are showing, is that, maybe this
isn’t the case, like, Thunderella can tolerate minority corruption, and yet we are still
very fast. Okay. So this is a very simple little protocol,
but it’s very powerful. Okay, to, before I conclude conclude, I want
to mention that, you know, simplicity is your best friend, in these large-scale distributive
systems. Alright, so our company is hiring, if you
are interested, you should contact us, if you are, you know, a dApp developer wanting
to use our platform, you know, to develop the next generation, all the more sexy ThunderKitties,
you should also contact us! Thank you very much!

Leave a Reply

Your email address will not be published. Required fields are marked *