Bringing Retail Trading to 34 million Cryptocurrency Wallets (Cloud Next ’19)


[MUSIC PLAYING] LEWIS TUFF: So I’m Lewis Tuff,
head of platform at Blockchain. So today I’m going to
be talking about how we brought retail trading to 34
million cryptocurrency wallets globally. So first off, who is Blockchain? Well, you may have heard of
the term bandied around. So we are Blockchain.com,
a technology company, and actually the world’s
leading digital asset platform. So we started our life
as a Blockchain explorer, specifically for Bitcoin. And then we’ve evolved
and grown many products over the last five years. One of our most popular is
our cryptocurrency wallet. We’ve also launched a number
of our consumer products, including the ability
to trade cryptocurrency. And we have data
products: market research, APIs for live-streaming data, and blockchain analytics. And we’ve also launched a number
of institutional products. I’ll go into more
details on some of these later on in the presentation. So how do we compare
to the market? And who is our main competition? So you may have heard of
a company called Coinbase. So the difference
between us and them is that they’ve focused more on
the custodial offering, which means they are taking custody of
your digital assets and funds. And in return, you can use
their platform and services. For us, we’ve taken a
non-custodial first approach, which means we’re
empowering users to take full control
of their private keys and digital assets. At no point do we take
ownership of those and compromise your
security or privacy. So that means you are no longer
reliant on a third party or a centralized entity, or at risk of any one person losing or compromising your digital assets and funds. So our key aim really is to
empower users and educate them to allow them
to take control in a secure and simple manner. And our tagline here, you
can see, is be your own bank. We want every user to consider
becoming their own bank. And that is surprisingly
easier said than done. So we have three main clients. We have an iOS, an Android,
and a web platform. Here are some screenshots
of our iOS application. And I’ve highlighted some of the
key features and functionality that our mobile
applications offer. As I mentioned, these
are all non-custodial. So they’re fat
clients, or wallets. So a lot of the heavy lifting
is actually done browser-side, or on the client itself. Our backend is purely
to store some metadata and to interact with the
underlying blockchain network. So here you can see
you have an overview of your cryptocurrency assets– your current balances,
the live market price. You have the ability to buy and
sell cryptocurrencies with fiat on and off ramps. You can exchange
your cryptocurrencies for any other asset you
may be interested in and get live pricing. And I’ll go into a bit more
detail on the architecture behind that. And then finally,
a common use case, you can send and receive
cryptocurrency funds very easily and very
securely from right within our mobile apps or
from within the wallet. And as I say, you never have to
give up custody of your assets to do this. So where are we today? So right now, we have 34 million
wallets globally and growing. We’re doing over 10 million
new wallets a year, currently. That equates to around
100,000 transactions per day that are being processed
through our software platform. In total to date, $200 billion worth of transacted volume has flowed through our software application and platform. And I think, for
me, most excitingly, that actually accounts for
25% of the global network traffic on the Bitcoin network. And then finally,
we currently have a user base that spans over 140
different countries worldwide. So first up, our Explorer– so here on the
slide, you can see we have this common interface
into many different underlying blockchains, notably Bitcoin,
Ethereum, and Bitcoin Cash right now. There are three main sections on the Explorer. The first is the
input to look up your address, your
transaction hash, or a block. The second middle section is
key stats and blockchain metrics such as the current difficulty
or the mempool size. And then finally,
we have a real time view of incoming
transactions and blocks that are being mined. So I want to delve
now a bit deeper into what does it take to
run a web platform like that? And how do we achieve, in
real time, the stream of data from these underlying chains? And more importantly, how
do we do that at scale? We’re getting around 2,500 requests per minute consistently, and that peaks even higher with
significant price movements. So how does it work? So this is obviously
a simplified overview. But as I mentioned, we
have three main consumers of our platform. So that is our mobile wallets
on iOS, Android, and the web. We have our Explorer,
which I just showed you on the previous slide. And we also have a
number of public APIs where any individual
or company can build on top of our platform. And they can actually use it for
creating transactions, sending, receiving, and also just
consuming and adjusting the blockchain data
directly without having to do the kind of hard work of
integrating with the underlying chains. And in the middle, we have
this data ingestion service platform. So internally, we’ve built out a number of services that are responsible for interacting with the underlying chains, such as Bitcoin, Ethereum, Bitcoin Cash, Stellar, et cetera. So we create a
wrapper around these that interacts with the low
level APIs that they provide. We actually spin up and run
all of our own nodes in GCP. And then we have a number
of our servicing components that then ingest all of the real
time blocks and transactions, extract the key
[INAUDIBLE] and metrics, do any kind of transformations we need, and persist that within a relational database.
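As a rough illustration of that ingestion step, here is a minimal sketch of tailing a self-hosted Bitcoin node in real time. It assumes a bitcoind instance started with its ZeroMQ publisher enabled (-zmqpubrawtx, -zmqpubrawblock); the hostname, port, and handle() helper are illustrative, and the parse-and-persist logic is elided.

```python
# Minimal sketch: subscribe to a self-hosted bitcoind's ZeroMQ feed.
import zmq  # pip install pyzmq

def handle(topic: str, payload: bytes) -> None:
    # Placeholder: decode the raw block/tx, extract the attributes and
    # metrics we care about, and persist them to the relational store.
    print(topic, len(payload))

ctx = zmq.Context()
sock = ctx.socket(zmq.SUB)
sock.connect("tcp://bitcoind.internal:28332")       # node inside the VPC
sock.setsockopt_string(zmq.SUBSCRIBE, "rawtx")      # every new transaction
sock.setsockopt_string(zmq.SUBSCRIBE, "rawblock")   # every new block

while True:
    # bitcoind publishes three frames: topic, payload, sequence number.
    topic, payload, _seq = sock.recv_multipart()
    handle(topic.decode(), payload)
```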
We then serve that back up to our clients on demand at scale. Between that, we do
have caching layers and a number of
other technologies to mitigate issues at scale,
which I will go into a bit further. The key thing here is that a
lot of the complexity of pulling out a very simple
attribute like your balance is abstracted away from the
user, both in terms of the UI and UX experience you get
from within our clients, but also for our unified API. So you no longer have
to spin up your node, become a part of
the public network, understand their APIs, which
are often pretty low level, interpret the data, which
again is usually at a lower level of abstraction. Now you just have one API to interact with to seamlessly send, receive, and process all the blockchain data.
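To make that concrete, here is a minimal sketch of what the unified API buys you: a balance lookup becomes a single HTTP call instead of running and indexing a full node. It uses Blockchain.com’s public data API as I understand it; check the current docs for the exact endpoint and response shape.

```python
# Minimal sketch: look up an address balance through the public API.
import requests

def get_balance(address: str) -> int:
    resp = requests.get(
        "https://blockchain.info/balance",
        params={"active": address},
        timeout=10,
    )
    resp.raise_for_status()
    # The response maps each address to its stats; balances are
    # denominated in satoshis.
    return resp.json()[address]["final_balance"]

print(get_balance("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))  # genesis address
```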
So how did we evolve? So I wanted to just set the scene on the scale of
our infrastructure to support our platform,
both the Explorer, cryptocurrency wallet, and
a number of other products. So you can see here we’re currently running over 2,500 cores in GCP and around nine terabytes of virtual memory. You can see the graph. The wallet accounts
that have been created over the last five, six years
have grown tremendously, especially since 2017, when the news hit the mainstream and cryptocurrencies were suddenly being discussed
by every single person on the street. I think, astonishingly,
and something that we realized
wasn’t going to scale was that a third of
our infrastructure was dedicated to our
in-memory NDB cluster. And this was running our Explorer and our cryptocurrency wallets, sort of the backbone for all of those. And it was all managed in-house by our engineering team. So we were using, most
predominantly, Compute Engine and running all this
stuff ourselves at scale. As many of you will know, doing that beyond a point becomes extremely costly, not only from a monetary perspective, but also from the operational
overhead of maintaining it, upgrading it, and dealing
with the fallout of failures. So we ran over 24 nodes to
bake in our own redundancy and make sure we
were fault tolerant. And to do any kind
of scaling, we could not scale
linearly or on demand. We had to actually spin
up a whole new cluster alongside our
production cluster, restore each of the
underlying chains– for example, the
Bitcoin blockchain– from the beginning. And that would take around
one week, all being well. And then once it had caught up,
we could then flip the switch and move our applications over
to the new production cluster. Because of the pain involved
and the time it took to do this, last year we just doubled our infrastructure overnight to support our exponential growth. Now we quickly realized that this was not sustainable. So at the beginning
of last year, we started to consider what are
the other options we have here? And how can we reduce
the operational overhead of running this at
scale and ensuring that our infrastructure
was going to support us on our mission to
grow and redefine the financial
infrastructure of the world? So in comes Cloud Spanner. So we looked at a number
of different products. And we decided to try a prototype and POC on Cloud Spanner. Now the impressive thing here
is that within a few minutes we were able to spin up
the production cluster and have strong consistency,
globally distributed, at scale, and all of that for a marginal increase in cost compared to running just Compute Engine and managing this
infrastructure ourselves. So the early signs
were promising. We then switched and
re-architected our application, started to reinvent
our schema, and then started to actually go all in
on Cloud Spanner, initially for our Bitcoin blockchain
ingestion process. One of the key things
we did early on, which really benefited
the engineering team, was that we connected with the
Cloud Spanner team directly. So we had a few conversations
with technical [INAUDIBLE] and a number of
engineers in her team. And they’ve really helped us
through the initial hurdles of getting set up on
a brand new technology and really understanding
the nuances of where it was going to work and some
of the constraints of running at scale. We had to make some
design decisions early on that without
that help would’ve been costly to change later. I think one of the key features
that really benefited us during this iterative
development process was the fact that we
can now do a full import and export in record time. So we could back up our whole
database in just 30 minutes. As I mentioned before, it
took one week to restore that. We can now do it in a batch job overnight, in just 8 and 1/2 hours. So what this meant is
that we could rapidly develop our application and
services, do a lot of testing and playing around
with different schemas and architectures, and
also, once we were happy, we could then promote
that through each of the environments–
dev, staging, pre-prod, and then finally, to production. And that was simple,
quick, and efficient. You’ll notice here, as
well, that we have– this is our
production comparison. So we have under half
of the original nodes that we needed before,
now just 10 nodes. And the reason for
that is because you’re getting redundancy,
high availability, strong consistency,
and global replication as part of the package. And then finally, we
could scale on demand. So at the touch of
a button, we could double our infrastructure. We could also set
predefined thresholds and have that elastic on demand.
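As a rough sketch of what that elasticity looks like with the Python client (the instance id is hypothetical, and in practice you would trigger this from monitoring thresholds rather than by hand):

```python
# Minimal sketch: double a Cloud Spanner instance's capacity on demand.
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("blockchain-prod")   # hypothetical instance id
instance.reload()                               # fetch the current node count
instance.node_count *= 2                        # double our capacity
instance.update().result()                      # blocks until the resize completes
```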
The final thing is we actually leveraged table interleaving, which meant that we could take
some of the complex queries we were running against our
NDB cluster previously and reduce the time it took to
run those from 30 seconds down to around 15 milliseconds.
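Here is a minimal sketch of the interleaving idea: transaction rows are stored physically within their parent block row, so block-to-transaction queries become local reads rather than distributed joins. The schema and ids are illustrative, not our actual production schema.

```python
# Minimal sketch: an interleaved parent/child schema in Cloud Spanner.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("blockchain-prod").database("chain")
database.update_ddl([
    """CREATE TABLE Blocks (
           BlockHash STRING(64) NOT NULL,
           Height    INT64 NOT NULL,
       ) PRIMARY KEY (BlockHash)""",
    """CREATE TABLE Txs (
           BlockHash STRING(64) NOT NULL,
           TxHash    STRING(64) NOT NULL,
           Fee       INT64,
       ) PRIMARY KEY (BlockHash, TxHash),
       INTERLEAVE IN PARENT Blocks ON DELETE CASCADE""",
]).result()
```

Because Txs shares the Blocks primary-key prefix, a query for all transactions in a block never has to leave the parent row’s locality.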
So the whole migration was hugely impactful for us and our products. And now we’ve been running
this for around six months in production. Overall, on average– I just
did a comparison for our team– and we are saving
around 30% month on month in terms of
the dollar amount, which is pretty impressive when
usually some of the hesitation about moving to software as a
service and some of these cloud services is that the expense is going
to just creep up as you grow. But actually, we’ve
seen the opposite. We’ve not only reduced
the operational overhead of managing this at scale. We’ve also made
our CFO very happy. So how else is
this dataset used? And what other
products do we offer? So as I mentioned
earlier, we built out a crypto to crypto
exchange product, again, leveraging a
number of cloud services and operating at a
significant scale. So this allows you to do cross-chain exchange transactions and get live
exchange-like pricing right from within your
non-custodial wallet. So again, you never have to send
your funds anywhere and give up the custody of your assets to
any one individual, company, or entity. Your private keys remain
with you on your device or in your browser. Here’s a quick demo
of the product. So this is on one of
our mobile applications. You’d simply type in the amount you wish to sell or buy. You’d get live pricing right from within the app in real time. And you can exchange between
multiple pairs that we offer. You finally have a summary
screen, Confirm, and that’s it. So what’s really going on? And what are some of the complexities with building out this– hopefully you’d agree–
smooth user experience? So the key challenges– security– so for us, Blockchain
as a company, security is our number one priority
in everything we do. You may have heard of
some of the horror stories from cryptocurrency
companies in the space that have been hacked, compromised, or lost their users’ funds. We don’t want to be the next headline. So everything we do goes
through a very rigorous process. Volatility– in the
marketplace, if any of you follow the
cryptocurrency pricing, you’ll see that it’s extremely
volatile, maybe swinging by 20% to 50% in a day. So how do you create a
smooth user experience when the price is jumping
around all over the place? The third challenge–
public network– I think this is
an important one. Now we’re relying
on infrastructure that is outside of our control. We’re no longer just relying on
our private network within GCP and the services
that we’ve built. We now have to
broadcast and send transactions over a network
and many nodes globally distributed. So what are some of the
constraints when you do that? And finally, high availability– I think it’s kind of a given
for any technology company now. But it’s worth noting, how
do you achieve that at scale? And again, what are some
of the design decisions and architectural
decision you have to make early on to
ensure you achieve that? So the interesting
part– how does it work? So I wanted to give you a
little bit of information on what’s under the hood,
and how does the architecture actually look? This is a slight
simplification, but it gives you a very good idea about
some of the core components and interactions between those. We do leverage a number of
cloud services to achieve this. And I’ll mention where
they fit into this stack. But as you saw in the
previous slide, first of all, we start with a client,
so Android, iOS, or web. You then have a
secure connection to our backend via our client gateways. And the user will connect to the swap product, initiate a secure WebSocket connection, and this will start streaming live prices to their device.
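Here is a minimal sketch of a client on the receiving end of that stream; the endpoint URL and message shape are hypothetical, not the actual gateway protocol.

```python
# Minimal sketch: consume a live-price stream over a secure WebSocket.
import asyncio
import json
import websockets  # pip install websockets

async def stream_quotes(pair: str = "BTC-ETH") -> None:
    # Hypothetical endpoint; the real gateway URL is not given in this talk.
    async with websockets.connect("wss://example.com/swap/quotes") as ws:
        await ws.send(json.dumps({"action": "subscribe", "pair": pair}))
        async for raw in ws:
            quote = json.loads(raw)
            # Each tick re-prices the UI so the user always sees a live rate.
            print(f"{pair}: {quote['price']}")

asyncio.run(stream_quotes())
```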
The user types in the volume they wish to sell or buy. And we then adjust our
pricing based upon that input. We have a liquidity
and execution engine, which is the driving force of
all of our trading activity. And that has connections to
many different exchanges, market makers, and brokers, and
also our internal inventory. So that’s managing risk,
portfolio management, and pricing. Internally, as well,
we’ve also built our own custodial solution. This is a multi-tiered
custody solution on which the hot wallet is
at the top of the stack. So this is the only
part of it, or layer, that is programmatically
accessible and secured within our internal network. All other layers are air-gapped and have a number of other security policies in place. So once the user’s happy with
the price, they hit exchange. What happens next? So our client gateways will
go off to our liquidity engine and say, hey, I want
to execute the trade to sell half a bitcoin. The liquidity engine will then lock in the price and send back how much
of the counter currency they wish to receive. So let’s, for
example, say I want to sell Bitcoin for Ethereum. In parallel, we
initiate requests to our hot wallet, which is the
gateway to all of our nodes. So these are the blockchain
nodes: Stellar (XLM), Bitcoin, Bitcoin Cash,
Ethereum, et cetera. We’ll then generate a deposit
address which the hot wallet service will persist
and start listening on for any incoming deposits. We then combine these transactions, send them back to the
client, and the client then returns to the user
the final amount they need to send us to
receive their Ethereum. And on the client’s
side, the browser and/or the mobile client will
construct that transaction and sign it using
their private keys and then broadcast that out
to the underlying respective network. None of that
touches our backend. That’s all done client side,
as I mentioned earlier. Now it will be
sent and broadcast across the public network. And we wait patiently for that
to hit a network broadcast event and a confirmation event. And the hot wallet is
responsible for that. Once those have been received
off the public network, we then use Kafka as
our main message bus. So we have an event
driven architecture. So this sends an
event over Kafka. One of the consumers
is our client gateways. That picks it up and
then processes that. We then compare that against
our state machine, transition through the lifecycle,
say, hey, deposit received. We now need to
pend a withdrawal. We update the user and say, hey,
we’ve received your deposit. We’re now going to initiate
the withdrawal request for you. We then make a call
back to the hot wallet, and send the amount
we wish to withdraw and the deposit address that was
sent from the client originally. And again, that’s then broadcast out on the public network.
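Here is a minimal sketch of one event-driven hop in that lifecycle: a client gateway consuming deposit events off Kafka and advancing the trade’s state machine. The topic, group id, event schema, and initiate_withdrawal() helper are illustrative assumptions, not our actual internals.

```python
# Minimal sketch: advance the trade state machine from Kafka events.
import json
from confluent_kafka import Consumer  # pip install confluent-kafka

def initiate_withdrawal(order_id: str) -> None:
    # Placeholder for the call back to the hot wallet service.
    print("initiating withdrawal for", order_id)

consumer = Consumer({
    "bootstrap.servers": "kafka.internal:9092",
    "group.id": "client-gateway",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["wallet-events"])  # hypothetical topic

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    if event["type"] == "DEPOSIT_CONFIRMED":
        # Transition: deposit received -> withdrawal pending, then ask
        # the hot wallet to pay out the counter currency.
        initiate_withdrawal(event["order_id"])
```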
The client will then receive the funds in due course, usually within half an hour. And we notify them via push, via email, via SMS. I think one of the interesting
cases that happened recently is that these public blockchains
can have many different use cases apart from just
transferring a store of value. And one of them is actually
being leveraged for marketing. So what that means is that every time anyone sends a transaction on the Bitcoin Cash network, someone may be listening, because you have this public ledger. They’re watching for addresses, and any new address that appears, they then send you a little
marketing message. And what does that
mean materially? It means that we, as a company, on our [INAUDIBLE] address, actually receive more funds than the user sent. And also, the user receives
slightly more funds than we send them. They also receive two
transactions instead of one. So this was kind of a
very interesting use case where we had to be
flexible and defensively program for this in the future.
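A minimal sketch of that defensive handling: when watching a deposit address, tolerate extra unsolicited transactions and surplus value rather than assuming exactly one transaction of exactly the quoted amount. The names and satoshi-denominated amounts are illustrative.

```python
# Minimal sketch: settle a deposit without choking on surplus dust.
def deposit_settled(expected_amount: int, incoming_txs: list[dict]) -> bool:
    """incoming_txs: every transaction seen paying the deposit address."""
    received = sum(tx["amount"] for tx in incoming_txs)
    # Settle once at least the quoted amount has arrived; record, but
    # do not fail on, any surplus from unknown senders.
    return received >= expected_amount

# e.g. the user's real deposit plus an unsolicited marketing dust tx:
assert deposit_settled(50_000_000, [{"amount": 50_000_000}, {"amount": 546}])
```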
And finally, the node gateways at the bottom are really the data
ingestion service platform that I mentioned previously,
and are our way of wrapping each of the underlying
blockchain nodes. So one of the other interesting
things that we’ve managed to do is build out a
hardware device which is a secure way of storing your
cryptocurrency assets offline. This has full
interoperability with all of our platform and software. So you can use our swap product,
as I have shown you before. And your keys never have to
leave the physical device. So this is hugely
beneficial to a lot of users who are
security conscious and maybe don’t trust the
terminal or workstation they are using. And we’ve managed to achieve
this leveraging our existing interface and software. Some of the key benefits
of using Google– so I mentioned a few
of these already. But I wanted to kind of
summarize and highlight these before finishing up. So one of the things that GCP
has really allowed us to do is scale up our
infrastructure on demand. When you’re dealing with tens
of millions of wallet accounts globally, it’s important
that you’re not constrained by
the infrastructure that you are using. We’ve had many cases when the
price has moved significantly. And suddenly the number of users
that we’re onboarding every day has gone up five or 10 times. And we’ve had to actually
scale up our infrastructure significantly during
those periods. I mean, in the peaks
of the last few months, we’ve seen over 100,000
new wallets being created every single day, and up
to one million in a given week. So there are significant
strains on our infrastructure that we have to account for. And one of the things we do
do, as I mentioned previously, is adding a number
of caching layers, take a lot of advantage of CDNs,
and do a lot of load balancing across many instances. For the most part, all of our
infrastructure is stateless. So that means we
can horizontally scale it on demand. Also, as in the previous architecture diagram, we’re leveraging cloud
services where possible. It means that our team
can focus and hone in on some of our
very specific domain complexities and
challenges and not worry about running our
infrastructure at scale and solving the
problems that Google have been doing for many years. So they spend less
time, energy, and effort on managing our
infrastructure and more time focusing on building
what we believe to be the future of finance. As I mentioned, we
have high availability. So we’re leveraging Cloud SQL. We’re leveraging Spanner. We use a number
of other services quite broadly across
our architecture. As you can imagine, we
have a lot of microservices within our architecture. So being able to
just quickly spin up a highly available, redundant,
and scalable database at a click of a button
is very powerful. And we’re using both Postgres
and MySQL engines for that. And the third point is that
it’s engineering friendly. I think all of our engineers,
whatever their skill set or focus, enjoy
the GCP experience, whether that’s because
of using the CLI and programmatically writing scripts around it, or using the interface itself to dive into problems and debug issues. And finally, the fourth one,
which I mentioned earlier with our Spanner story,
is the technical support. So I think a lot of people don’t
realize the amount of support that Google offer
you as a company. We can tap into a wealth of
knowledge and domain expertise through our account managers
and customer engineers. And we do that pretty regularly. We’ve had many conversations
comparing Google products to other open source
technologies and the pros and cons of using them. So another story that
happened last year, we were just trying to decide
on the correct message bus architecture for this trading
product and for many others. One of our key requirements
was sequencing. So we then spoke with
the technical PMs from the Pub/Sub team and had
a really candid conversation about the pros and cons of using Pub/Sub versus Kafka.
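For context, the sequencing requirement looks something like this in Kafka: messages that share a key are routed to one partition and are therefore consumed in order. The broker and topic names below are illustrative; at the time of this talk, Pub/Sub did not yet offer ordering keys, which was a factor here.

```python
# Minimal sketch: per-order event ordering via Kafka partition keys.
from confluent_kafka import Producer  # pip install confluent-kafka

producer = Producer({"bootstrap.servers": "kafka.internal:9092"})

def publish(order_id: str, event: bytes) -> None:
    # Keying by order id pins all of an order's events to one partition,
    # guaranteeing ordered consumption downstream.
    producer.produce("trade-events", key=order_id, value=event)

publish("order-123", b'{"type": "DEPOSIT_CONFIRMED"}')
producer.flush()
```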
And the net result of that conversation was that we used Kafka. And that was their
recommendation. I think that’s quite
refreshing when you can actually connect
with the engineers at a cloud company. And they recommend a
product which they do not support or sell themselves. [MUSIC PLAYING]
