So you want to start coding blockchain? … IPFS & OrbitDB [4.2]

This is the second part of a “Hello World” tutorial on IPFS (InterPlanetary File System) and OrbitDB, a peer-to-peer decentralised database. The first part is the actual JavaScript code. This part covers how the approach differs from the legacy database paradigm and how to design a data system using IPFS and OrbitDB. It is of interest to managers and system architects.

The big picture

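  1. 1974: Ethernet and the IP protocol introduced peer-to-peer telecommunication. No central server arbitrates the traffic (contrast this with token-passing schemes such as “Token Ring”, where a circulating token decides who may transmit). All hosts on a network are peers and can transmit equally.
  2. 2013: The Ethereum white paper introduced smart contracts for P2P logic processing (the network itself went live in 2015). All peers reach consensus on the same process applied to the same data, the “hard way”, to prevent double spending.
  3. 2014: IPFS popularized P2P storage. IPFS does not validate the stored data; it relies on cryptographic hashes to address content and verify its integrity, while layers built on top of it, such as OrbitDB, use cryptographic signatures for access control and data ownership.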

Together, these three technologies make truly decentralized organizations possible. They bring the promise of resilience, low operating costs and increased traceability. Potentially, we can do without huge centralised platforms that show little social fairness.

In this tutorial we focus on IPFS and on how it makes decentralized storage possible in a way that is consistent with decentralized telecommunications and decentralized logic processing.

How does IPFS work?

Whenever a file, a folder, a message or a stream of data is submitted to IPFS, IPFS splits it into binary blobs (by default, of up to 256 KB each) that can be replicated independently among peers. It is agnostic of the total size and nature of the data, be it an integer, a floating-point number, a string, an array, a list, etc. It uses IPLD (InterPlanetary Linked Data) to rebuild the original file, folder or stream structure from the blobs. A data structure is fully identified by its IPFS “address” (a “CID”, or content identifier), which does not specify a location on the network: it is derived from the content itself (its cryptographic hash) and also records how that content is encoded.
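The splitting and addressing described above can be sketched in a few lines of Node.js. This is a simplified illustration, not the actual IPFS implementation: a hex SHA-256 digest stands in for a real CID, a small JSON object stands in for an IPLD node, and the names `chunkAndAddress` and `reassemble` are made up for this example.

```javascript
const { createHash } = require('node:crypto');

const CHUNK_SIZE = 256 * 1024; // 256 KB, the blob size mentioned above

// Hex SHA-256 digest, standing in for a real CID.
function digest(bytes) {
  return createHash('sha256').update(bytes).digest('hex');
}

// Split a buffer into fixed-size chunks and address each chunk by its hash.
function chunkAndAddress(data, chunkSize = CHUNK_SIZE) {
  const blocks = new Map(); // address -> bytes
  const links = [];         // ordered list of chunk addresses
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    const chunk = data.subarray(offset, offset + chunkSize);
    const address = digest(chunk);
    blocks.set(address, chunk);
    links.push(address);
  }
  // The root node only lists chunk addresses; its own hash addresses the whole file.
  const rootNode = Buffer.from(JSON.stringify({ links, size: data.length }));
  const root = digest(rootNode);
  blocks.set(root, rootNode);
  return { root, links, blocks };
}

// Rebuild the original bytes by following the links from the root.
function reassemble(root, blocks) {
  const { links } = JSON.parse(blocks.get(root).toString());
  return Buffer.concat(links.map((address) => blocks.get(address)));
}

const file = Buffer.alloc(600 * 1024, 'a'); // 600 KB of sample data
const { root, links, blocks } = chunkAndAddress(file);
console.log(links.length);                          // 3 chunks: 256 KB + 256 KB + 88 KB
console.log(reassemble(root, blocks).equals(file)); // true
```

Note that the first two chunks are identical, so they share one address and are stored only once: deduplication comes for free with content addressing.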

Compare this with IP packets, whose headers carry the information needed to reassemble the original message.

When the data of a blob changes, a new blob is created, and a new IPLD structure is created to describe the new version of the structure (for example, a file). The old blob remains, because another IPLD description may still use it for another structure (a linked list, for example).
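A minimal sketch of this behaviour, under the same simplifying assumptions as before (SHA-256 hex digests instead of CIDs, JSON instead of IPLD, hypothetical helper names): changing one blob produces a new root, while the untouched blobs are shared between the old and the new version.

```javascript
const { createHash } = require('node:crypto');

const store = new Map(); // content address -> bytes; existing content is never modified

function put(bytes) {
  const address = createHash('sha256').update(bytes).digest('hex');
  store.set(address, bytes);
  return address;
}

// A "file" is a root node listing the addresses of its blobs.
function putFile(parts) {
  const links = parts.map((part) => put(Buffer.from(part)));
  return put(Buffer.from(JSON.stringify({ links })));
}

function linksOf(root) {
  return JSON.parse(store.get(root).toString()).links;
}

const v1 = putFile(['header', 'body', 'footer']);
const v2 = putFile(['header', 'BODY CHANGED', 'footer']); // only one blob differs

const shared = linksOf(v1).filter((address) => linksOf(v2).includes(address));
console.log(v1 !== v2);     // true: the change produced a new root
console.log(shared.length); // 2: the header and footer blobs are reused
console.log(store.size);    // 6: 4 distinct blobs + 2 roots
```

Both roots stay valid: anyone holding the old address still gets the old version.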

The binary blobs are stored as files in a special folder on the local storage. Whenever the computer hosting the IPFS daemon is connected and becomes a node, IPFS announces itself and listens to peer announcements. Peers exchange their blobs and IPLD data with other peers. A peer serves the requested blobs that it holds and passes on to other peers the requests that it cannot satisfy.

How does OrbitDB work?

OrbitDB is software that maintains a special kind of data structure, called a “database”, on IPFS. A database has descriptors with which OrbitDB makes sure that several nodes can simultaneously change their own copy of the same database while the copies remain consistent. To achieve this, the database uses a structure called a CRDT (Conflict-free Replicated Data Type). In the version of OrbitDB of July 2021, there are 5 types of “databases”:

  • log: an immutable (append-only) log with traversable history. Useful for “latest N” use cases or as a message queue.
  • feed: a mutable log with traversable history. Entries can be added and removed. Useful for “shopping cart” type of use cases, or for example as a feed of blog posts or “tweets”.
  • keyvalue: a key-value database just like your favourite key-value database.
  • docs: a document database in which JSON documents can be stored and indexed by a specified key. Useful for building search indices or version controlling documents and data.
  • counter: useful for counting events separately from log/feed data.
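To give an idea of what “conflict-free” means, here is a sketch of a grow-only counter (a “G-Counter”), one of the simplest textbook CRDTs. It is an illustration of the concept, not OrbitDB's own code, and the class name `GCounter` is chosen for this example: each node increments only its own slot, and merging takes the maximum of each slot, so replicas converge whatever the order of synchronisation.

```javascript
// Each replica keeps one slot per node id and only ever increments its own slot.
class GCounter {
  constructor(nodeId, counts = {}) {
    this.nodeId = nodeId;
    this.counts = { ...counts };
  }

  increment(amount = 1) {
    this.counts[this.nodeId] = (this.counts[this.nodeId] || 0) + amount;
  }

  get value() {
    return Object.values(this.counts).reduce((sum, n) => sum + n, 0);
  }

  // Merge takes the per-node maximum: commutative, associative and idempotent.
  merge(other) {
    for (const [id, n] of Object.entries(other.counts)) {
      this.counts[id] = Math.max(this.counts[id] || 0, n);
    }
  }
}

const alice = new GCounter('alice');
const bob = new GCounter('bob');
alice.increment(2); // alice counts 2 events while offline
bob.increment(3);   // bob counts 3 events at the same time

alice.merge(bob);
bob.merge(alice);
bob.merge(alice);   // merging twice changes nothing

console.log(alice.value, bob.value); // 5 5
```

No node had to ask permission or lock anything: both replicas end up with the same total.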

How can I design for IPFS & OrbitDB?

On legacy systems, a database can be a huge collection of structured data, organized for easy traversal and retrieval, but not meant to be replicated and updated independently. If a replication takes place, the whole legacy database, or a significant portion of it, is exchanged over the network.

If we compare an OrbitDB database of type keyvalue with a table of a legacy relational database, each OrbitDB database corresponds to a legacy “row” and each OrbitDB keyvalue entry corresponds to a cell of a column. Each “row” can be replicated separately, which consumes less bandwidth. A “cell” (keyvalue entry) can be modified, and more keyvalue entries can be added. The database, in the OrbitDB sense, is kept consistent across the peers.
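The mapping can be sketched with plain JavaScript `Map` objects standing in for OrbitDB keyvalue databases. The table, the naming scheme `customers/<id>` and the function `tableToStores` are assumptions made for this illustration; no OrbitDB API is used here.

```javascript
// Hypothetical legacy table: one object per row.
const customers = [
  { id: 'c1', name: 'Ada', city: 'London' },
  { id: 'c2', name: 'Lin', city: 'Hanoi' },
];

// One key-value store per row; each column of the row becomes a key-value entry.
function tableToStores(tableName, rows, primaryKey) {
  const stores = new Map(); // store name -> Map(column -> value)
  for (const row of rows) {
    const entries = new Map(
      Object.entries(row).filter(([column]) => column !== primaryKey)
    );
    stores.set(`${tableName}/${row[primaryKey]}`, entries);
  }
  return stores;
}

const stores = tableToStores('customers', customers, 'id');

// Updating a "cell" touches a single small store ...
stores.get('customers/c2').set('city', 'Hue');

// ... so only that store needs to travel to the peers interested in it.
const payload = JSON.stringify([...stores.get('customers/c2')]);
console.log(stores.size); // 2: one store per legacy row
console.log(payload);     // [["name","Lin"],["city","Hue"]]
```

The design choice is granularity: the smaller the unit of replication, the less data crosses the network when one value changes.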

Caveat: Terms overloading

As we have just seen, the most difficult thing for a newcomer to IPFS and OrbitDB is getting used to overloaded terms: a node may have several meanings depending on the context; the same concept of node may also be called a client; a hash may indeed be a computed value, or it may be a JSON object, a multihash or a CID; a CID may be a multihash encoded in base58 (CID version 0) or, in CID version 1, a version and codec prefix followed by the multihash, usually encoded in base32; the address built from it is also called a CID; and more…

Don’t despair: take notes as you learn and you’ll get used to it. It’s all obviously clear 😂, if you know the context.
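To make the CID part of this caveat concrete, here is a sketch of the two encodings, usually called CID v0 (base58) and CID v1 (base32), built with Node.js built-ins only. The helper names are chosen for this example. It hashes arbitrary bytes directly, whereas IPFS hashes its own encoded blocks, so the results are not expected to match what IPFS itself returns for the same text.

```javascript
const { createHash } = require('node:crypto');

const BASE58 = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
const BASE32 = 'abcdefghijklmnopqrstuvwxyz234567';

function base58btc(bytes) {
  let n = BigInt('0x' + Buffer.from(bytes).toString('hex'));
  let out = '';
  while (n > 0n) {
    out = BASE58[Number(n % 58n)] + out;
    n /= 58n;
  }
  for (const b of bytes) { // each leading zero byte is written as '1'
    if (b !== 0) break;
    out = '1' + out;
  }
  return out;
}

function base32lower(bytes) {
  let bits = 0;
  let value = 0;
  let out = '';
  for (const b of bytes) {
    value = (value << 8) | b;
    bits += 8;
    while (bits >= 5) {
      out += BASE32[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
    value &= (1 << bits) - 1; // keep only the bits not yet emitted
  }
  if (bits > 0) out += BASE32[(value << (5 - bits)) & 31];
  return out;
}

// Multihash = <hash function code><digest length><digest>.
// 0x12 means sha2-256 and 0x20 means a 32-byte digest.
function multihash(content) {
  const digest = createHash('sha256').update(content).digest();
  return Buffer.concat([Buffer.from([0x12, 0x20]), digest]);
}

// CID v0: the bare multihash in base58btc.
function cidV0(content) {
  return base58btc(multihash(content));
}

// CID v1: <version 0x01><codec, here 0x70 = dag-pb><multihash>,
// written in base32 with the multibase prefix 'b'.
function cidV1(content) {
  const bytes = Buffer.concat([Buffer.from([0x01, 0x70]), multihash(content)]);
  return 'b' + base32lower(bytes);
}

const content = Buffer.from('hello world');
console.log(cidV0(content)); // 46 characters, starts with "Qm"
console.log(cidV1(content)); // 59 characters, starts with "bafybei"
```

Both strings wrap the same SHA-256 digest: the “hash”, the “multihash” and the two “CIDs” are four views of one value.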

One valuable learning resource is the ProtoSchool web site and its tutorials.

Machu Picchu, the Internet, the blockchain and OrbitDB

All organizations that support persons-in-need worldwide complain that their data concerning these persons are isolated, inconsistent, out of date and expensive to maintain. Ideally, these data should be accurate, shared among all helper organizations, and GDPR-compliant. See the white paper by Mercy Corps, the Danish Red Cross and Hive Online.

  • Machu Picchu allows each person-in-need to own and maintain their personal profile data.
  • The profile data are available to all organizations, humanitarian or commercial, that use them to optimize their assistance programmes. They will pay a micro-fee (a few tokens) to the owners to read these data.
  • IPFS (InterPlanetary File System) and OrbitDB are decentralised storage solutions that have no data validation constraints. They just store the data inexpensively. They use only cryptography to authenticate the owner (the person-in-need) and the consumer (assistance organizations) of the data. Let’s store the bulk of the data on IPFS and store on the blockchain only the public keys and the addresses of the smart contracts that a person-in-need may invoke. This will make the use of blockchain affordable.
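The split between bulky off-chain data and a small on-chain record can be sketched as follows. This is an illustration under stated assumptions, not the Machu Picchu implementation: the record fields `ownerPublicKey` and `profileAddress`, the sample profile and the function `checkProfile` are hypothetical, a SHA-256 digest stands in for an IPFS address, and an Ed25519 key pair stands in for the owner's identity.

```javascript
const { createHash, generateKeyPairSync, sign, verify } = require('node:crypto');

// Off-chain: the bulky profile, addressed by its hash (stand-in for an IPFS address).
const profile = Buffer.from(JSON.stringify({ name: 'Pepito', needs: ['shelter'] }));
const profileAddress = createHash('sha256').update(profile).digest('hex');

// The owner's key pair; only the public key is ever published.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');
const signature = sign(null, profile, privateKey);

// On-chain (hypothetical record): small, and it contains no profile data.
const onChainRecord = {
  ownerPublicKey: publicKey.export({ type: 'spki', format: 'pem' }),
  profileAddress,
};

// A consumer fetches the profile off-chain, then checks integrity and authorship.
function checkProfile(record, fetchedProfile, fetchedSignature) {
  const sameContent =
    createHash('sha256').update(fetchedProfile).digest('hex') === record.profileAddress;
  const signedByOwner = verify(null, fetchedProfile, record.ownerPublicKey, fetchedSignature);
  return sameContent && signedByOwner;
}

console.log(checkProfile(onChainRecord, profile, signature));                // true
console.log(checkProfile(onChainRecord, Buffer.from('tampered'), signature)); // false
```

The on-chain record stays the same size however large the profile grows, which is where the cost saving comes from.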

Today Machu Picchu exists as a game of Pepito disguises. Pepito is a Caribbean corsair, famous for his disguises.

It is now important to match the development with the reality in the field. Machu Picchu is looking for a humanitarian organization with which to build an MVP and deploy it in the field.

Machu Picchu — Data as a Public Service