Archain Interview: Building an Uncensorable Archive of Human History

This article ties in to a podcast we recorded with the Archain team, available here.

Unreliable permanent storage is a problem that most people don’t know exists, but have likely experienced firsthand.

Under my desk, I store my fine collection of bricked laptops with the hope that one day I might salvage them for parts — on the hard drives of those long-dead machines, there are hundreds of gigabytes of lost data. Old music project files, documents, and other things I should’ve been smart enough to back up online.

However, storing files in the cloud isn’t totally safe. In 2012, Megaupload, one of the biggest cloud storage sites at the time, shut down. A 2013 shutdown of San Diego-based cloud storage provider Nirvanix left thousands of customers scrambling to retrieve data as the company closed its doors. Even giants like Dropbox aren’t immune to the problems inherent in centralized networks; in late 2014, an issue with Dropbox Selective Sync caused permanent data loss for a number of users.

The solution? Archain — a decentralized permanent storage system that makes it impossible for data to be removed or amended after it’s uploaded. It works with similar technology to Bitcoin.

Bitcoin is a permanent ledger of financial transactions that eliminates the need for a central bank or authority by sharing and verifying the details of each transaction with everyone in the network. The ledger part of Bitcoin is its blockchain. Archain eliminates the need for a centralized cloud storage provider. It does this by storing files on the systems of network participants and incentivizing miners to mine rare data first.

Archain CEO and co-founder Sam Williams devised the idea of storage over the blockchain but ran into inherent problems with the way the blockchain is designed. In Bitcoin’s case, each new full participant must download and verify the complete blockchain, which is a record of every Bitcoin transaction ever made. Archain uses a variant of the blockchain that the team calls the blockweave, which divides data storage responsibilities across the whole network. This reduces the storage and computing power any single user needs to contribute, and the team says it makes Archain much more scalable and efficient than other cryptocurrencies. With this blockweave technology, Archain is set up to create a permanent replica of the internet.

In fact, the initial scope of the Archain project was focused purely on internet archiving. Using the web extension (or by building applications that crawl a portion of the web), users can create permanent, uncensorable copies of web pages that get stored on machines all over the world. Unlike the centralized technology behind other archives like Archive.org, Archain distributes the data widely enough for it to require the combined processing power of multiple world governments to alter — a highly unlikely situation.

This podcast episode was recorded on November 2nd. Since then, the team has announced the rules of their Archain platform app development competition, and opened their test network up to early backers.

We also have confirmation of the GENESIS event, an internet freedom art exhibition that will be hosted by Archain on the 15th and 16th of December in St Ethelburga’s on 78 Bishopsgate in London. Find out more about the event here.

Listen to the interview above, or keep reading for an abridged transcript.


On the architecture of Archain

It’s worth noting that the main part of Archain is an entirely unique concept. With it, the team can offer the world’s first scalable cryptocurrency that solves many of the energy efficiency issues that come with a traditional blockchain.

Archain whitepaper

Read the whitepaper for a technical explanation of Archain

I asked the team to explain the platform in their own words.

Archain is a decentralized permanent storage system. On top of that, we’re building an internet archive that will be open and available to people at all times.

Our core mission is to put human history on this blockchain-like data structure, and allow people to access it for decades or perhaps centuries to come.

Unlike Bitcoin where you need to mine the whole thing, [with Archain] you can store as much or as little of the chain as you want. The advantage of storing more is that you get more of the mining rewards.

You get all of the benefits of blockchain-based consensus, but you can actually store large files and documents on it, which is the key innovation. On a blockchain, everyone has to replicate all of the data in order to take part, but here you can shard it out.

With a blockweave system, blocks (chunks of archived information, grouped together) are dynamically dependent on another random block in the system. Dependency is assigned based on the rarity of the data, so the network achieves two things in one go: definite storage of rare data, and energy efficiency.

A blockchain just verifies the next block in the system each time when it’s created, and then new members reverify the entire chain. With a blockweave, we reverify old parts of the chain as a core mechanic of the network.
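The dependency on an older block can be illustrated with a small sketch. It assumes, purely for illustration, that the older "recall" block is chosen by reducing the previous block's hash modulo the current chain height, and that a miner can only compete if it holds that block. The names block_hash, recall_index, and can_mine are hypothetical, and the selection here is uniform rather than weighted by rarity; the whitepaper describes the actual rule.

```python
import hashlib


def block_hash(prev_hash: bytes, payload: bytes) -> bytes:
    """Hash a block from its predecessor's hash and its own payload."""
    return hashlib.sha256(prev_hash + payload).digest()


def recall_index(prev_hash: bytes, height: int) -> int:
    """Pick an earlier block pseudo-randomly from the previous block's hash.

    Assumed rule: read the hash as an integer and reduce it modulo the
    current height, so every node derives the same index independently.
    """
    if height <= 0:
        raise ValueError("height must be positive")
    return int.from_bytes(prev_hash, "big") % height


def can_mine(stored_indices: set, prev_hash: bytes, height: int) -> bool:
    """In this sketch, a miner can compete only if it holds the recall block."""
    return recall_index(prev_hash, height) in stored_indices


# Build a tiny four-block weave from an all-zero starting hash.
hashes = []
prev = b"\x00" * 32
for payload in [b"genesis", b"page-a", b"page-b", b"page-c"]:
    prev = block_hash(prev, payload)
    hashes.append(prev)

height = len(hashes)
full_miner = set(range(height))  # stores every block
empty_miner = set()              # stores nothing

print("recall block:", recall_index(hashes[-1], height))
print("full miner can mine:", can_mine(full_miner, hashes[-1], height))
print("empty miner can mine:", can_mine(empty_miner, hashes[-1], height))
```

Under these assumptions, a miner holding more of the weave is eligible to mine more often, which is one way to read the link between storage and mining rewards.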

This mechanic is especially suited to internet archiving and storage. The traditional blockchain method is extremely bulky, but a blockweave can scale to proper internet sizes. The team created it out of necessity after realizing that blockchain is fundamentally flawed.

We wanted to build a decentralized internet archive, and we wanted it to scale. We were pushed down this route. We were so worried at the start about how we’re going to get this to scale to proper internet sizes. You can’t do that with a blockchain, but you can with a blockweave.

On the origins of Archain’s concept

Although the team has since realized that Archain has viable corporate use cases, it was first conceived out of anxiety about how fragile and censorable our current internet really is. CEO Sam Williams explained the origin of the concept:

I was on a mountain. I was thinking about propaganda, and the way that information on the internet is absorbed when it’s not necessarily in its final state.

So, for example, if you go to the New York Times website and read a few articles, there’s a reasonable likelihood that those articles will change at any time. I was a bit concerned that there’s no permanent logging of that. You can’t go back and see what it said originally because people don’t make their understanding of the world from the final version [of information], they read the live reported version. And if that can get deleted, you end up with this sort of unverifiable chain of information, which we thought was not so great.

And also, think about the way Russian propaganda campaigns are going on in Ukraine, for example. They’ll spread a lot of information that’s both pro and against an idea and then delete that information. So, after the fact it becomes impossible to delineate who thought what when.

Throughout our interview, Sam makes multiple references to the memory hole, a concept from George Orwell’s 1984.

It’s about eliminating the memory hole. We’re trying to make it so that history essentially can’t be altered after the fact.

History can’t be altered if it’s stored on the Archain because that information is stored on many different computers simultaneously, and those computers cross-reference their versions of the information. If a single byte is out of place on any system, it is disregarded by the network. Sam tells me that it’s theoretically possible to alter information on the Archain, but a single entity would need to control 51% of the network, which would take more money and computing power than any country’s government possesses.

I couldn’t just snatch 51% of the network temporarily and run off with all the money. The cost of doing so would be more than a nation state could afford. There’s no single entity with enough money and power to pull it off.

It’s split across all the continents of the globe, so even if a government or a group of governments attempted to censor the Archain and change its contents, they just wouldn’t be able to get the cooperation required to do that.
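The cross-referencing idea described above can be sketched as a comparison of content hashes. This is a toy illustration, not Archain's consensus mechanism: it assumes each node simply reports its copy, and that the version held by a strict majority wins. The function cross_reference and the node names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def digest(data: bytes) -> str:
    """Fingerprint a copy; changing a single byte changes the digest."""
    return hashlib.sha256(data).hexdigest()


def cross_reference(copies: Dict[str, bytes]) -> Tuple[bytes, List[str]]:
    """Return the majority version and the nodes whose copies are disregarded."""
    if not copies:
        raise ValueError("no copies to compare")
    counts = Counter(digest(data) for data in copies.values())
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no version is held by a majority of nodes")
    accepted = next(d for d in copies.values() if digest(d) == winner)
    rejected = sorted(n for n, d in copies.items() if digest(d) != winner)
    return accepted, rejected


original = b"It was a bright cold day in April."
copies = {
    "node-a": original,
    "node-b": original,
    "node-c": original[:-1] + b"!",  # one byte out of place
    "node-d": original,
}
accepted, rejected = cross_reference(copies)
print("disregarded:", rejected)
```

An attacker in this model has to change the majority of copies at once, which mirrors the 51% threshold Sam describes.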

Very few systems we use day-to-day are decentralized. We rely on big server stacks in limited physical locations to serve our data, which puts us in a similar situation to the scholars who suffered the loss of at least 40,000 unique historical texts when the Library of Alexandria caught fire in 48 BC. Jon Sherry, a developer on the Archain project, tells me that many modern-day parallels of the fire exist, including a fire at Archive.org which destroyed unarchived data, and the Usenet archive which lost over 10 years of records.

We really think that archives should be decentralized. It’s the right way to approach the problem. For trustworthiness, as well as data reliability and no single point of failure. If there’s a fire, it wouldn’t matter. Enough of the people have the data, and there’s enough redundancy.

On Archain’s political motivations, and the GENESIS art exhibition

The invitation to GENESIS — Archain’s internet freedom art exhibition — felt laden with political and philosophical ideology.

We wouldn’t have thought 10 years ago that internet freedom would be a particularly politicized issue. But now, it’s increasing.

The way I see it, archiving and transparency is often the concern of politically-motivated organizations.

Fundamentally, the Archain is only as politicized as the users. There’s nothing inherently politicized about Archain. It’s simply permanent storage. We came up with the decentralized internet archive first, and then we built the algorithm that would allow us to do it. And only then did we realize we’ve essentially invented data permanence.

Why limit it to internet archives? There are so many more possibilities. But, it certainly did start from a political concern.

This concern was one of the reasons I became interested in Archain, and the art and poetry planned for GENESIS places heavy emphasis on the dangers of a centralized, censorable web.


Archain has posted select artworks from the exhibition on their Twitter feed already. Since the interview took place, the venue has been confirmed as St. Ethelburga’s, a small Central London church with a surprisingly modern gallery space.

The key thing we’re trying to do with the exhibition is to draw attention to the internet freedom problems the Archain helps solve. It’s making the Orwellian memory hole impossible, so it comes from a philosophical rather than technological standpoint.

I’m very close friends with a poet who’s embedded in the South London art scene. I know these people very well, and the kinds of art they produce. Especially the illustrator — Jon Speed — has worked for some very big names and is quite established. They’ve really understood where we’re trying to go with it.

On the Archain app ecosystem

The Archain team is building an open platform, so the exact applications are up to the users. To start creating an ecosystem of useful tools on the Archain platform, the team have kicked off a development competition.

We’re running the Archain app competition. We have an investment pool of $10,000 as well as 250,000 AR. We’re really trying to get people involved with the app development side of things, and build an ecosystem of things running on the network. The idea there is that we’ll help them get up to speed with development. They can start on it now: all of our code is on GitHub, and it’s open source. The winning group will offer 6% of their company for $10,000, and get a direct line to us to get help building a real prototype that’s running on the network. One of the things we want to push, aside from internet archiving and personal backups, is an ecosystem of programs running on top of it. As far as we see it, it’s a permanent internet, a permanet.

To me, the most valuable application to be built on top of a permanent archive is a web spider. Spiders are programs that constantly search for web pages and upload them to a server. Google’s spider searches for new pages and adds them to its central index; Archive.org’s takes snapshots of websites over time, so users can see how they’ve changed day by day. I asked the team what they think of running spiders on the Archain.

At the moment, people will archive of their own accord. They’ll archive an interesting web page that they think might have value for the future. But, we are also looking at these things we call archive groups. A bunch of people group together because they’re really interested in a special interest topic, and they can run a little system that goes round archiving pages in that category once in every period. A good thing about the way that it’s built is that if you don’t like the way it works, you can go and build something that periodically archives the entire BBC website, for example. It doesn’t have to be manual.
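The "once in every period" behavior of an archive group could be sketched as a simple scheduling check. This is an assumption-laden illustration: TrackedPage, due_for_archiving, and the example URLs are hypothetical, and the actual fetching and uploading of pages is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackedPage:
    url: str
    last_archived: float  # seconds since the epoch; 0.0 means never archived


def due_for_archiving(pages: List[TrackedPage], now: float,
                      period_seconds: float) -> List[str]:
    """Return URLs whose last snapshot is at least one period old."""
    return [p.url for p in pages if now - p.last_archived >= period_seconds]


DAY = 24 * 60 * 60
now = 10 * DAY
pages = [
    TrackedPage("https://example.org/news", last_archived=9.5 * DAY),
    TrackedPage("https://example.org/politics", last_archived=8 * DAY),
    TrackedPage("https://example.org/new-page", last_archived=0.0),
]
print(due_for_archiving(pages, now, period_seconds=DAY))
```

A group member could run a check like this on a timer and archive only the pages it returns, rather than archiving everything by hand.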

Archain apps, Sam explains, are nodes that sit in the network. They can either listen, and slowly be fed all the information going through the network, or they can use it for data storage. One application that really grabbed my attention was the idea of a decentralized Twitter. I knew that Mastodon was trying some kind of blockchain implementation, but that project didn’t seem to get off the ground.

We looked at building a decentralized Twitter. Essentially, you just write the application in the usual way, but you use the Archain for data storage. You can start up multiple nodes of this web application, hit them from any end points you like, and they’re all using the same permanent storage in the middle. We have done it already, basically. Sam implemented it one evening.
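The pattern described here, several application nodes sharing one permanent store, can be sketched with an in-memory stand-in. PermanentStore and MicroblogNode are hypothetical names, and the store is an ordinary Python list rather than a real network client, so this shows only the shape of the design.

```python
from typing import Dict, List, Tuple


class PermanentStore:
    """In-memory stand-in for append-only network storage (hypothetical)."""

    def __init__(self) -> None:
        self._records: List[Dict[str, str]] = []

    def append(self, record: Dict[str, str]) -> int:
        """Store a copy of the record and return its position."""
        self._records.append(dict(record))
        return len(self._records) - 1

    def read_all(self) -> Tuple[Dict[str, str], ...]:
        """Return copies, so callers cannot edit stored history."""
        return tuple(dict(r) for r in self._records)


class MicroblogNode:
    """A stateless web-app node; all state lives in the shared store."""

    def __init__(self, store: PermanentStore) -> None:
        self.store = store

    def post(self, author: str, text: str) -> int:
        return self.store.append({"author": author, "text": text})

    def timeline(self, author: str) -> List[str]:
        return [r["text"] for r in self.store.read_all()
                if r["author"] == author]


store = PermanentStore()
london = MicroblogNode(store)
tokyo = MicroblogNode(store)

london.post("sam", "Hello from the permanet")
tokyo.post("sam", "Second post, different node")
print(tokyo.timeline("sam"))
```

Because the nodes hold no state of their own, any number of them can be started or stopped without losing posts, which is the property the team describes.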

The team also have a few ideas about the kinds of material they plan on keeping uncensorable and permanent.

We’ve also discussed uploading things like the Hansard, the human genome, or a text copy of Wikipedia. We put a copy of 1984 in the genesis block.

On building Archain with Erlang

Archain is coded with Erlang, a particularly rare programming language. Being rare, it’s a definite roadblock for developers looking to expand the project, but the team plans on building libraries that support some of the most widely used languages in the world, like Python and Java. Sam, who Archain’s CTO Will told me is the best Erlang programmer of the bunch, chose Erlang because it is extremely well-suited to decentralized data storage.

It’s the inbuilt support for concurrency. Massive concurrency. We can run 10,000 or 20,000 nodes inside a network using Erlang, all on one machine. That allows us to surface bugs that are found in one in 10-20 million iterations. We’ve seen other cryptocurrencies using a system where you just start 3 nodes on a machine and just hope you come across the bugs.

If you’re interested in how Archain’s architecture secures it against vulnerabilities that other cryptocurrencies wouldn’t be able to detect, the team have put together a blog post here to explain.

On whether Archain can be abused

No matter what the application, it’s my experience that most tools suffer abuse when they reach a certain scale. As I looked at in my article on trolls, motivated groups of 4channers have filled YouTube with porn and manipulated Google search results. This caused YouTube to hire teams of people tasked with manually filtering videos, and Google to tighten up its algorithm. Archain, however, is practically immune to abuse.

In our system, the miners are the gatekeepers. They decide what gets onto the system, and what’s served from the system — really, how all of it works. We have a system whereby miners can optionally use blacklists for content because you can’t force a miner to go download a whole website just to verify it. It just wouldn’t be fair. We’re not going to provide blacklists, but we are going to support them. The net effect of this is that there’s a sort of decentralized filter — a consensus about what is acceptable on the network.
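The optional blacklist mechanism might look something like the following sketch. It assumes that blacklists are sets of content hashes and that each miner checks incoming data against its own list; Miner, content_id, and accepting_miners are hypothetical names, and the real format of Archain blacklists is not specified in the interview.

```python
import hashlib
from typing import FrozenSet, Iterable, List


def content_id(data: bytes) -> str:
    """Identify content by its SHA-256 digest (an assumed convention)."""
    return hashlib.sha256(data).hexdigest()


class Miner:
    def __init__(self, name: str, blacklist: Iterable[str] = ()) -> None:
        self.name = name
        # Each miner chooses its own list; an empty list accepts everything.
        self.blacklist: FrozenSet[str] = frozenset(blacklist)

    def accepts(self, data: bytes) -> bool:
        return content_id(data) not in self.blacklist


def accepting_miners(miners: List[Miner], data: bytes) -> List[str]:
    """Names of the miners willing to store or serve this content."""
    return [m.name for m in miners if m.accepts(data)]


spam = b"unwanted content"
article = b"an archived news article"
shared_list = [content_id(spam)]

miners = [
    Miner("alice", blacklist=shared_list),
    Miner("bob", blacklist=shared_list),
    Miner("carol"),  # uses no blacklist
]
print(accepting_miners(miners, spam))
print(accepting_miners(miners, article))
```

The more miners adopt a given list, the fewer places the listed content can be stored or served from, which is the decentralized filter effect described in the quote.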

The question, I learned, is less about whether or not Archain can be abused, and more about how the network is designed to self-regulate and naturally de-incentivize users from abusive behavior.

When people want information removed from the internet, they go and they ask Google or Facebook. Usually what happens is that Google or Facebook just remove the links to it, not the data. In our system, there has to be search nodes that can find the data for you in the network. So, a way to tackle this is to approach the search nodes and ask them not to serve the information to anyone.

On the energy efficiency of cryptocurrencies

Sharing vast amounts of information between systems worldwide uses a lot of electricity. So much so that some cryptocurrency startups are looking into renewable energy sources like wind farms. Archain’s lightweight architecture, however, is designed to avoid much of that cost.

As the chain gets really large over many many blocks, people are going to start storing comparatively small amounts of it. The net effect of that is that the difficulty of the hashing competition lowers, which means the effect on the environment lowers, too.

Energy efficiency is a real problem with Bitcoin, which is using a small country’s worth of power. It was Texas-sized a couple of years ago, I wouldn’t even want to think what it is today. These proof of work algorithms are very expensive on electricity, and we’re happy we have a way of getting around it as the network scales.

You offset some of the difficult work from hashing by using storage, which is very low energy. You could even do it on magnetic tape, if you really wanted to.

Hypothetically (in a distant dreamland, really), any decentralized network can become centralized if one entity controls a disproportionate amount of it. As mentioned before, it would take several countries combined to do this. Even so, the network has built-in protection against becoming centralized because the financial incentive for mining data that is already well-represented is low.

Our incentivized system means that if storage ever starts becoming centralized, your incentive as an individual is to break off on your own and mine rare data. In cases like that, you’d be competing in an extremely small pool of miners, and have a very high likelihood of getting the reward. There’s a self-organizing layer to the network where everyone is incentivized to store blocks that few people have.

The incentives are organized so that centralization is unprofitable. If things become centralized, then the incentive for people to decentralize it again becomes greater. It’s a pure financial incentive. We call it incentive engineering. It’s the best case for everybody — all network participants behave as selfishly as they can, and it still has the best result for the network over time.
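The arithmetic behind this incentive can be sketched under two simplifying assumptions that are not taken from the interview: each old block is equally likely to be the one required for the next round, and every miner holding that block has an equal chance of winning. The names expected_share and rarest_block are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def expected_share(holders: int, chain_length: int) -> Fraction:
    """Expected fraction of one block reward per round for a newcomer who
    starts storing a block that `holders` other miners already keep."""
    if holders < 0 or chain_length <= 0:
        raise ValueError("holders must be >= 0 and chain_length > 0")
    # Chance the block is required (1 / chain_length) times the chance of
    # winning against the existing holders (1 / (holders + 1)).
    return Fraction(1, chain_length * (holders + 1))


def rarest_block(replication: Dict[int, int]) -> int:
    """Block index with the fewest holders; ties go to the lowest index."""
    return min(replication, key=lambda index: (replication[index], index))


# Number of miners currently storing each block in a three-block chain.
replication = {0: 50, 1: 3, 2: 50}
target = rarest_block(replication)
print("store block", target)
print("share if rare:", expected_share(replication[target], len(replication)))
print("share if common:", expected_share(replication[0], len(replication)))
```

In this toy model, a selfish miner always does best by picking the least replicated block, which pushes storage back toward an even spread.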

Space landscape-obsessed dreck penman. Appears on TechCrunch, The Next Web, and on Secret Cave in a far less restrained capacity.