Everlink Architecture Document
Draft - v0.1 - last revised on 21 June 2004
This is a draft; many things are incomplete and/or missing. It will be improved over time through the questions you may have, so don't hesitate to contact me. See the contact section for the email address.
Everlink Core
Everlink (EVL) is the more or less temporary name of the protocol behind Nodezilla. As you read further, you will understand that EVL is NOT a file sharing protocol.

The goal of EVL is to build an overlay network (the Grid) on top of nodes running on untrusted machines. This grid does nothing alone; instead, applications can use it through the EVL API to build features on top of it. File sharing is one of these applications; searching through the grid's objects is another. The grid supports and facilitates deterministic object location, caching, routing, fault tolerance, and strong cryptographic identification.

The following document assumes you are familiar with the following concepts:
- Asymmetric-key cryptography, in particular RSA
- Digital signatures
- SSL
Let's begin with some vocabulary.
Request Routing
EVL tries to route a request to the node whose NID is closest to the request's TargetID. This is done using the routing table: at each hop, a node either routes the request to another node with a closer ID, or determines that no other node has a closer ID, in which case it is the root for this request. Here is a simple walkthrough of how it works (a short code sketch follows the list). Node 1234 wants to perform a request with target 4221:
- Compute the proximity of 1234 with 4221 => 0
- Look up, in level 0 of the routing table, the known nodes whose NID begins with 4 (or with the closest first digit)
- N1234 selects N4121 and forwards the request to it
- N4121 performs the same steps:
- Compute the proximity of 4121 with 4221 => 1
- Look up, in level 1 of the routing table, the nodes having a 2 as their second digit (or the closest second digit)
- N4121 selects N4211 and forwards the request to it
- N4211 performs the same steps:
- Compute the proximity of 4211 with 4221 => 2
- Look up, in level 2 of the routing table, the nodes having a 2 as their third digit (or the closest third digit)
- There is none. N4211 knows that if its routing table is empty for these criteria, there is no closer running node; it assumes it is the root for this request and performs it.
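To make the walkthrough concrete, here is a minimal sketch of this prefix-based lookup, assuming 4-digit base-10 NIDs and a routing table organized by levels (length of the shared prefix). The names proximity and next_hop are illustrative, not the real EVL API.

    def proximity(a, b):
        """Number of leading digits two NIDs have in common."""
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def next_hop(local_nid, target, routing_table):
        """Return the known node to forward to, or None when the local
        node is the root for this target (empty table for the criteria)."""
        level = proximity(local_nid, target)
        candidates = routing_table.get(level, [])
        best = None
        for nid in candidates:
            # prefer a node whose digit at `level` matches the target's,
            # otherwise the closest digit among the known nodes
            if best is None or (abs(int(nid[level]) - int(target[level]))
                                < abs(int(best[level]) - int(target[level]))):
                best = nid
        return best

    # Node 1234 routing toward target 4221, with a toy routing table:
    print(next_hop("1234", "4221", {0: ["4121", "7890"]}))  # -> 4121
    print(next_hop("4211", "4221", {}))                     # -> None: root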
Object publication & retrieval
- Soft republish: as objects expire after a certain time, the application owning them needs to republish them on a regular basis to keep them available. This also effectively places a copy of the object on the new root, as the publish request will be routed to it.
- Hard republish: when a node sees a new node arrive and determines (from its local routing table) that the new node is the root for some objects, those objects are copied to the new node.
It is also possible for an application to specify that its objects may be automatically republished by nodes that are caching them (this is an unauthoritative publish), which keeps objects available even if the original publisher disappears.
This is not a way to make them persist forever, but it can come quite close if the grid is big.
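As a hedged sketch of soft republish, the owning application can simply re-issue a publish for each of its objects well before the expiration delay; the grid.put call and the one-hour lifetime below are illustrative assumptions, not the real EVL API.

    import time

    OBJECT_LIFETIME = 3600  # seconds; illustrative value, not from the spec

    class Grid:
        def put(self, object_id, obj):
            # stand-in for an EVL publish request, routed to the current
            # root for object_id (a new root thus receives a fresh copy)
            print("publishing", object_id)

    def republish_loop(grid, owned_objects):
        # republish at half the lifetime so objects never expire
        while True:
            for object_id, obj in owned_objects.items():
                grid.put(object_id, obj)
            time.sleep(OBJECT_LIFETIME / 2)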
Object versioning
In the case of signed objects, an update is only possible if the new version is signed by the same node that signed the old object, enforcing the 'owner' property of a signed object.
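A minimal sketch of that rule, assuming each stored object carries the signer's public key and a signature over its payload; verify_sig stands in for the real RSA signature verification EVL would perform.

    from dataclasses import dataclass

    @dataclass
    class SignedObject:
        object_id: str
        payload: bytes
        signer_pubkey: bytes  # encoded public key of the owner
        signature: bytes

    def accept_update(stored, incoming, verify_sig):
        """Accept the new version only if it is signed by the same key."""
        if incoming.object_id != stored.object_id:
            return False
        if incoming.signer_pubkey != stored.signer_pubkey:
            return False  # 'owner' property: another node may not update
        return verify_sig(incoming.signer_pubkey,
                          incoming.payload,
                          incoming.signature)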
Object lifetime and persistence
Additionally, objects can survive a node restart by being saved to disk when the node stops and reloaded when it starts; this is called persistence. They can also be refreshed by other nodes that cache them (unauthoritative publish).
These parameters are application-dependent.
Load Balancing
Since nodes cache the objects and answers they relay, it is very likely that "popular" objects (i.e. objects requested by many nodes) get cached and are then answered by nodes that are not the root for the request, effectively reducing the load on the root. The more an object is requested, the more it is cached, and the more the 'circle' of caching nodes around that object's root expands.
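A sketch of this cache-on-path behaviour, with a plain dictionary as the local cache and forward standing in for routing toward the root (both names are illustrative):

    cache = {}

    def handle_get(target_id, forward):
        if target_id in cache:
            return cache[target_id]   # answered without reaching the root
        answer = forward(target_id)   # next hop toward the root
        cache[target_id] = answer     # popular objects spread outward
        return answer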
Security considerations
- A NID is not falsifiable: it is not possible for Charlie to have his node use the same ID as Alice's node. This is because Charlie cannot have the private key of Alice's node, and therefore cannot use the public key that in turn gives Alice's ID.
- A NID is checkable for authenticity: it is very easy for Bob's node to check that Alice's node ID has been generated through the right algorithm. During the SSL handshake between two nodes, Alice's node sends her public key in a certificate; Bob then computes Alice's NodeID from this public key and checks that Alice is not trying to forge a NID. Alice can do the same to check that she is in fact really talking to Bob. (A code sketch at the end of this section makes this check concrete.)
This algorithm effectively prevents a 'black hole' attack, where Charlie wants to prevent anyone from retrieving some objects published on the Grid. He could run a node with an ID very close to the ID of the object he wants to block. If Charlie succeeded in doing that, all traffic targeted at the object would end up on Charlie's node, making the real object unavailable to others.
This attack is not possible, as Charlie cannot choose his NodeID: that would mean Charlie is able to reverse the hash function applied to the public key and then choose a secret key with the corresponding public key. This is assumed to be computationally infeasible for some years to come.
- Objects that need special care can be signed by a node, and will then only be modifiable by it. Moreover, for very important objects, EVL can use its internal PKI to check that a given signature was issued by a trusted node.
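Here is the sketch referred to above: after the SSL handshake, each side recomputes the peer's NID from the public key found in the peer's certificate. SHA-1 over the encoded public key is assumed here as the derivation; the document does not name the exact hash function.

    import hashlib

    def nid_from_pubkey(pubkey_der):
        # NID = hash of the node's public key (SHA-1 assumed)
        return hashlib.sha1(pubkey_der).hexdigest()  # 40 hex digits

    def check_peer(claimed_nid, pubkey_der):
        # Charlie cannot present Alice's key without her private key,
        # and cannot pick his own NID without inverting the hash.
        return nid_from_pubkey(pubkey_der) == claimed_nid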
A sample application using EVL - FileShare1.0
As stated before, the EVL protocol on its own does nothing; it just provides primitives that take advantage of the EVL grid in order to build applications on top of it. One of the test services built on top of EVL is file sharing, through the EVLFileShare1.0 service (FSS). Other currently implemented services are:
- Metadata search service: extracts objects that match some search criteria, provided a metadata extractor exists for them
- Shared document update service: enables nodes to update the same object in a semi-secure way; can be seen as a shared notepad
The file summary object
The IDs of these objects are set up by the FSS to be equal to the SHA-1 hash of the content of the file (i.e. two identical files with different names will have the same ID).
These objects are the only ones handled by the FSS EVL handler; they are all digitally signed.
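A sketch of that ID computation, hashing the file content in chunks (the function name is illustrative):

    import hashlib

    def file_id(path):
        """SHA-1 of the file content: identical files share one ID."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()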
Forward Error Correction
The FSS uses this to tolerate the disappearance of nodes that originally shared a file, allowing the file to remain retrievable from the grid. FSS considers files as a set of equally sized blocks (e.g. a 1 MB file is a set of 1000 1 KB blocks). When a file is requested for the first time on the grid, requests finally arrive (see below) at the sharer. It answers block requests, and also generates additional redundancy blocks:
For instance, a 1 MB file will be sliced into 100 10 KB blocks, and the node will compute an additional 50 10 KB blocks using the FEC algorithm. This leads to a total of 150 10 KB blocks that will be scattered throughout the nodes (see below). Of these 150 blocks, only 100 (whichever they are) are needed to reconstruct the original data set, so the downloader doesn't have to download more data than the size of the original file, even if more data is available.
This directly means that up to 33% of the final blocks (50 among 150) may be unavailable (i.e. their nodes offline) with the file still recoverable.
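A small helper reproducing the arithmetic of this example; the 10 KB block size and the 50% redundancy ratio are taken from the text above, and the real FEC parameters may differ.

    def fec_layout(file_size, block_size=10 * 1024, redundancy=0.5):
        k = -(-file_size // block_size)   # data blocks (ceiling division)
        m = k + int(k * redundancy)       # total blocks after FEC encoding
        tolerance = (m - k) / m           # fraction of blocks that may be lost
        return k, m, tolerance

    # a 1 MB file, taken here as 100 blocks of 10 KB:
    print(fec_layout(100 * 10 * 1024))    # -> (100, 150, 0.333...)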
How files are downloaded
The first step is to retrieve the corresponding file summary object; this is done through the EVL API by a GET request for the EVLFileSummaryDirectoryObject class object. Once this object is retrieved, the FSS knows how many blocks compose the data set (both normal blocks and redundancy blocks), and can start making GET_DATA requests on the grid. These requests are application-handled by the FSS EVL handler.
Each GET_DATA is targeted at an ID derived from the FileID and the block number of the current data block, leading to a unique 40-digit identifier. Each block thus has a different BlockID, unrelated to the others.
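The document does not specify F(); a plausible sketch is SHA-1 over the FileID concatenated with the block number, which yields the 40-digit identifier.

    import hashlib

    def block_id(file_id, block_number):
        # F(FileID, BlockNumber): concatenation + SHA-1 is an assumption
        return hashlib.sha1(f"{file_id}:{block_number}".encode()).hexdigest()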
When a node finds it is the root for such an ID, it lets the FSS EVL handler process the request.
This handler will in turn forward the data block request to the owner node. This is a multi-tier approach in which the sharer and the downloader do not know of each other (they may or may not have TCP links between them, but there will always be dozens of other connections streaming the same file from dozens of different nodes). Once enough blocks (as determined by the FEC algorithm) are successfully retrieved, the file is reconstructed.
Throughout this process, all nodes on the path from the downloader to the sharer may or may not cache the data blocks they've seen; the locality property of the routing algorithm will probably lead two requests for the same data block to follow a common path. So nodes that don't know about the whole file may still serve data blocks for it.
How anonymity is achieved
A------->T1------->T2------->R------->W1------->W2------->B

This sketch deliberately involves a lot of nodes. "A" requests a data block; its request is targeted at BlockID = F(FileID, BlockNumber). The request travels through T1 and T2 as explained in the routing section (because A doesn't know R or T2, but knows T1), to finally arrive at the root for BlockID, R.
R then forwards the request to the owner (recorded in the EVLFileSummaryDirectoryObject object), B, through W1 and W2.
B answers with the data block, which follows the same path back to A. On the way, the answer is cached by all the nodes it traverses, leading to this scenario when another node requests the same block:
C------->Z1------->Z2------->R

Here C doesn't know T1 and T2, but Z1 and Z2, which in turn lead to the same R, which already has the requested data in cache and answers directly. This process is iterated for all blocks of a file; for a 100-block file, there are 100 potentially different roots R. This leads to anonymity, because the nodes that answer data requests don't have the file on their drives, just parts of it. Moreover, thanks to FEC (remember, a 100-block file is turned into 150 blocks) and the scattering of blocks throughout the grid via the computation of BlockID through F(), it is still possible to download a file after B is gone, as long as enough blocks remain cached on the grid.