Everlink Architecture Document
Draft - v0.1 - last revised on 21 June 2004
This is a draft; many things are incomplete and/or missing. It will be improved over time through the questions you may have, so don't hesitate to contact me. See the contact section for the email address.
Everlink Core
Everlink (EVL) is the more or less temporary name of the protocol behind Nodezilla. As you read further, you will understand that EVL is NOT a file sharing protocol.

The goal of EVL is to build an overlay network (the Grid) on top of nodes running on untrusted machines. This grid does nothing alone; instead, applications can use it through the EVL API to build features on top of it. File sharing is one of these applications; searching through the grid's objects is another. The grid supports and facilitates deterministic object location, caching, routing, fault tolerance, and strong cryptographic identification.

The following document assumes you are familiar with the following concepts:
- Asymmetric-key cryptography, in particular RSA
- Digital signatures
- SSL
Let's begin with some vocabulary.
Request Routing
EVL tries to route a request to the node whose NID is closest to the request's TargetID. This is done using the routing table: at each hop, a node either routes the request to another node with a closer ID, or determines that no other node has a closer ID, in which case it is the root for this request. Here is a simple walkthrough of how it works (a short code sketch follows the list). Node 1234 wants to perform a request with target 4221:
- Compute the proximity of 1234 with 4221 => 0
- Look up, in level 0 of the routing table, the known nodes whose NID begins with 4 (or with the closest first digit)
- N1234 selects N4121 and forwards the request to it
- N4121 performs the same steps:
- Compute the proximity of 4121 with 4221 => 1
- Look up, in level 1 of the routing table, the nodes having a 2 as their second digit (or the closest second digit)
- N4121 selects N4211 and forwards the request to it
- N4211 performs the same steps:
- Compute the proximity of 4211 with 4221 => 2
- Look up, in level 2 of the routing table, the nodes having a 2 as their third digit (or the closest third digit)
- There is none. N4211 knows that if its routing table is empty for these criteria, there is no closer running node; it assumes it is the root for this request and performs it.
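To make the walkthrough concrete, here is a minimal sketch of this prefix-based lookup, assuming 4-digit base-10 NIDs and a routing table organized by levels (length of the shared prefix). The names proximity and next_hop are illustrative, not the real EVL API.

    def proximity(a, b):
        """Number of leading digits two NIDs have in common."""
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def next_hop(local_nid, target, routing_table):
        """Return the known node to forward to, or None when the local
        node is the root for this target (empty table for the criteria)."""
        level = proximity(local_nid, target)
        candidates = routing_table.get(level, [])
        best = None
        for nid in candidates:
            # prefer a node whose digit at `level` matches the target's,
            # otherwise the closest digit among the known nodes
            if best is None or (abs(int(nid[level]) - int(target[level]))
                                < abs(int(best[level]) - int(target[level]))):
                best = nid
        return best

    # Node 1234 routing toward target 4221, with a toy routing table:
    print(next_hop("1234", "4221", {0: ["4121", "7890"]}))  # -> 4121
    print(next_hop("4211", "4221", {}))                     # -> None: root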
Object publication & retrieval
- Soft republish: as objects expire after a certain time, the application owning them needs to republish them on a regular basis to keep them available. This also effectively places a copy of the object on the new root, as the publish request will be routed to it.
- Hard republish: when a node sees a new node arrive and determines (from its local routing table) that the new node is the root for some objects, those objects are copied to the new node.
It is also possible for an application to specify that its objects may be automatically republished by nodes that are caching them (this is an unauthoritative publish), which keeps objects available even if the original publisher disappears.
This is not a way to make them persist forever, but it can come quite close if the grid is big.
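As a hedged sketch of soft republish, the owning application can simply re-issue a publish for each of its objects well before the expiration delay; the grid.put call and the one-hour lifetime below are illustrative assumptions, not the real EVL API.

    import time

    OBJECT_LIFETIME = 3600  # seconds; illustrative value, not from the spec

    class Grid:
        def put(self, object_id, obj):
            # stand-in for an EVL publish request, routed to the current
            # root for object_id (a new root thus receives a fresh copy)
            print("publishing", object_id)

    def republish_loop(grid, owned_objects):
        # republish at half the lifetime so objects never expire
        while True:
            for object_id, obj in owned_objects.items():
                grid.put(object_id, obj)
            time.sleep(OBJECT_LIFETIME / 2)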
Object versioning
In the case of signed objects, an update is only possible if the new version is signed by the same node that signed the old object, enforcing the 'owner' property of a signed object.
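A minimal sketch of that rule, assuming each stored object carries the signer's public key and a signature over its payload; verify_sig stands in for the real RSA signature verification EVL would perform.

    from dataclasses import dataclass

    @dataclass
    class SignedObject:
        object_id: str
        payload: bytes
        signer_pubkey: bytes  # encoded public key of the owner
        signature: bytes

    def accept_update(stored, incoming, verify_sig):
        """Accept the new version only if it is signed by the same key."""
        if incoming.object_id != stored.object_id:
            return False
        if incoming.signer_pubkey != stored.signer_pubkey:
            return False  # 'owner' property: another node may not update
        return verify_sig(incoming.signer_pubkey,
                          incoming.payload,
                          incoming.signature)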
Object lifetime and persistence
Additionally, objects can survive a node restart by being saved to disk when the node stops and reloaded when it starts; this is called persistence. They can also be refreshed by other nodes that cache them (unauthoritative publish).
These parameters are application-dependent.
Load Balancing
Since nodes cache the objects and answers they relay, it is very likely that "popular" objects (i.e. objects requested by many nodes) get cached and are then answered by nodes that are not the root for the request, effectively reducing the load on the root. The more an object is requested, the more it is cached, and the more the 'circle' of caching nodes around that object's root expands.
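A sketch of this cache-on-path behaviour, with a plain dictionary as the local cache and forward standing in for routing toward the root (both names are illustrative):

    cache = {}

    def handle_get(target_id, forward):
        if target_id in cache:
            return cache[target_id]   # answered without reaching the root
        answer = forward(target_id)   # next hop toward the root
        cache[target_id] = answer     # popular objects spread outward
        return answer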
Security considerations
- A NID is not falsifiable: it is not possible for Charlie to have his node use the same ID as Alice's node. This is because Charlie cannot have the private key of Alice's node, and therefore cannot use the public key that in turn gives Alice's ID.
- A NID is checkable for authenticity: it is very easy for Bob's node to check that Alice's node ID has been generated through the right algorithm. During the SSL handshake between two nodes, Alice's node sends her public key in a certificate; Bob then computes Alice's NodeID from this public key and checks that Alice is not trying to forge a NID. Alice can do the same to check that she is in fact really talking to Bob. (A code sketch at the end of this section makes this check concrete.)
This algorithm effectively prevents a 'black hole' attack, where Charlie wants to prevent anyone from retrieving some objects published on the Grid. He could run a node with an ID very close to the ID of the object he wants to block. If Charlie succeeded in doing that, all traffic targeted at the object would end up on Charlie's node, making the real object unavailable to others.
This attack is not possible, as Charlie cannot choose his NodeID: that would mean Charlie is able to reverse the hash function applied to the public key and then choose a secret key with the corresponding public key. This is assumed to be computationally infeasible for some years to come.
- Objects that need special care can be signed by a node, and will then only be modifiable by it. Moreover, for very important objects, EVL can use its internal PKI to check that a given signature was issued by a trusted node.
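Here is the sketch referred to above: after the SSL handshake, each side recomputes the peer's NID from the public key found in the peer's certificate. SHA-1 over the encoded public key is assumed here as the derivation; the document does not name the exact hash function.

    import hashlib

    def nid_from_pubkey(pubkey_der):
        # NID = hash of the node's public key (SHA-1 assumed)
        return hashlib.sha1(pubkey_der).hexdigest()  # 40 hex digits

    def check_peer(claimed_nid, pubkey_der):
        # Charlie cannot present Alice's key without her private key,
        # and cannot pick his own NID without inverting the hash.
        return nid_from_pubkey(pubkey_der) == claimed_nid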
A sample application using EVL - FileShare1.0
As stated before, the EVL protocol on its own does nothing; it just provides primitives that take advantage of the EVL grid in order to build applications on top of it. One of the test services built on top of EVL is file sharing, through the EVLFileShare1.0 service (FSS). Other currently implemented services are:
- Metadata search service: extracts objects that match some search criteria, provided a metadata extractor exists for them
- Shared document update service: enables nodes to update the same object in a semi-secure way; can be seen as a shared notepad
The file summary object
The IDs of these objects are set up by the FSS to be equal to the SHA-1 hash of the content of the file (i.e. two identical files with different names will have the same ID).
These objects are the only ones handled by the FSS EVL handler; they are all digitally signed.
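A sketch of that ID computation, hashing the file content in chunks (the function name is illustrative):

    import hashlib

    def file_id(path):
        """SHA-1 of the file content: identical files share one ID."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()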
Forward Error Correction
The FSS uses this to tolerate the disappearance of nodes that originally shared a file, allowing the file to remain retrievable from the grid. FSS considers files as a set of equally sized blocks (e.g. a 1 MB file is a set of 1000 1 KB blocks). When a file is requested for the first time on the grid, requests finally arrive (see below) at the sharer. It answers block requests, and also generates additional redundancy blocks:
For instance, a 1 MB file will be sliced into 100 10 KB blocks, and the node will compute an additional 50 10 KB blocks using the FEC algorithm. This leads to a total of 150 10 KB blocks that will be scattered throughout the nodes (see below). Of these 150 blocks, only 100 (whichever they are) are needed to reconstruct the original data set, so the downloader doesn't have to download more data than the size of the original file, even if more data is available.
This directly means that up to 33% of the final blocks (50 among 150) may be unavailable (i.e. their nodes offline) with the file still recoverable.
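A small helper reproducing the arithmetic of this example; the 10 KB block size and the 50% redundancy ratio are taken from the text above, and the real FEC parameters may differ.

    def fec_layout(file_size, block_size=10 * 1024, redundancy=0.5):
        k = -(-file_size // block_size)   # data blocks (ceiling division)
        m = k + int(k * redundancy)       # total blocks after FEC encoding
        tolerance = (m - k) / m           # fraction of blocks that may be lost
        return k, m, tolerance

    # a 1 MB file, taken here as 100 blocks of 10 KB:
    print(fec_layout(100 * 10 * 1024))    # -> (100, 150, 0.333...)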
How files are downloaded
The first step is to retrieve the corresponding file summary object; this is done through the EVL API by a GET request for the EVLFileSummaryDirectoryObject class object. Once this object is retrieved, the FSS knows how many blocks compose the data set (both normal blocks and redundancy blocks), and can start making GET_DATA requests on the grid. These requests are application-handled by the FSS EVL handler.
Each GET_DATA is targeted at an ID derived from the FileID and the block number of the current data block, leading to a unique 40-digit identifier. Each block thus has a different BlockID, unrelated to the others.
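The document does not specify F(); a plausible sketch is SHA-1 over the FileID concatenated with the block number, which yields the 40-digit identifier.

    import hashlib

    def block_id(file_id, block_number):
        # F(FileID, BlockNumber): concatenation + SHA-1 is an assumption
        return hashlib.sha1(f"{file_id}:{block_number}".encode()).hexdigest()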
When a node finds it is the root for such an ID, it lets the FSS EVL handler process the request.
This handler will in turn forward the data block request to the owner node. This is a multi-tier approach in which the sharer and the downloader do not know of each other (they may or may not have TCP links between them, but there will always be dozens of other connections streaming the same file from dozens of different nodes). Once enough blocks (as determined by the FEC algorithm) are successfully retrieved, the file is reconstructed.
Throughout this process, all nodes on the path from the downloader to the sharer may or may not cache the data blocks they've seen; the locality property of the routing algorithm will probably lead two requests for the same data block to follow a common path. So nodes that don't know about the whole file may still serve data blocks for it.
How anonymity is achieved
A------->T1------->T2------->R------->W1------->W2------->B

This sketch deliberately involves a lot of nodes. "A" requests a data block; its request is targeted at BlockID = F(FileID, BlockNumber). The request travels through T1 and T2 as explained in the routing section (because A doesn't know R or T2, but knows T1), to finally arrive at the root for BlockID, R.
R then forwards the request to the owner (recorded in the EVLFileSummaryDirectoryObject object), B, through W1 and W2.
B answers with the data block, which follows the same path back to A. On the way, the answer is cached by all the nodes it traverses, leading to this scenario when another node requests the same block:
C------->Z1------->Z2------->R

Here C doesn't know T1 and T2, but Z1 and Z2, which in turn lead to the same R, which already has the requested data in cache and answers directly. This process is iterated for all blocks of a file; for a 100-block file, there are 100 potentially different roots R. This leads to anonymity, because the nodes that answer data requests don't have the file on their drives, just parts of it. Moreover, thanks to FEC (remember, a 100-block file is turned into 150 blocks) and the scattering of blocks throughout the grid via the computation of BlockID through F(), it is still possible to download a file after B is gone, as long as enough blocks remain cached on the grid.