
Cloud Node Project

For this final project, you will implement one node in a storage cloud.  Your node will interoperate with your classmates' nodes to create a scalable storage cloud.

For everyone's nodes to work together, each node must follow a strict protocol, which is detailed here.  If you implement something incorrectly, you will confuse other nodes in the cloud!

Cloud Components

Each student will write three separate programs to run on their VMs:
  1. a key-object storage service, running on port 8006
  2. a request multiplexer, running on port 8007
  3. a search service, running on port 8008

The Key-Object Store

Each storage node should implement a simple REST API to store and retrieve JSON objects.

GET /

If a client sends a GET request to your node but doesn't specify a bucket, your node should respond with a JSON array of all bucket IDs stored on your node, for example:

[ "mybucket", "yourbucket", "otherbucket" ]

GET /{bucket}/

Return a JSON array of all object IDs in the bucket.  If there is no such bucket, return an empty array.

An example request:

GET http://virtual40.cs.missouri.edu:8006/RyBucket/ HTTP/1.1

and response:

[ "Ryanne", "Sarah", "Geddy", "Xanadu" ]

GET /{bucket}/{object-id}

Object IDs can be any URL-encoded string, including a long "path" like folder/subfolder/file.  The node should respond with the requested JSON object, if it exists.

For example:

{ "Name" : "Ryanne", "Age" : 27 }

Status codes:
  • 200 OK
  • 404 Not Found
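Because an object ID may itself contain slashes, one way to route a request is to treat the first path segment as the bucket and everything after it as the object ID. The sketch below assumes that interpretation; `splitPath` is a hypothetical helper, not part of the protocol. In Go's net/http the already-decoded path is available as `r.URL.Path`.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPath breaks a request path such as "/bucket/folder/file" into the
// bucket ID and the object ID. Everything after the first path segment is
// kept as the object ID, so IDs may contain slashes. An empty bucket means
// "list buckets"; an empty object ID means "list the bucket".
func splitPath(path string) (bucket, objectID string) {
	trimmed := strings.TrimPrefix(path, "/")
	bucket, objectID, _ = strings.Cut(trimmed, "/")
	return bucket, objectID
}

func main() {
	for _, p := range []string{"/", "/RyBucket/", "/RyBucket/Ryanne", "/RyBucket/folder/subfolder/file"} {
		bucket, id := splitPath(p)
		fmt.Printf("%-32s bucket=%q object=%q\n", p, bucket, id)
	}
}
```

A GET handler can then look up the pair in its store and answer 200 with the object, or 404 if either the bucket or the object is missing.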

PUT /{bucket}/{object-id}

Accept a JSON object and store it in the bucket.

Status codes:
  • 200 OK: return this if the resource already existed and is just being updated
  • 201 Created: return this if the resource did not exist before this request
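The choice between 200 and 201 comes down to checking whether the key exists before overwriting it. A minimal sketch follows, with a hypothetical `store` type and `put` helper. Rejecting a body that is not a JSON object with 400 Bad Request is an assumption of this sketch; the protocol above does not define that case.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// store maps bucket ID -> object ID -> raw JSON object (hypothetical layout).
type store map[string]map[string]json.RawMessage

// put stores body under bucket/id and returns the status code to send.
func put(s store, bucket, id string, body []byte) int {
	// Decoding into a map accepts only JSON objects. Arrays, numbers and
	// strings fail to decode, and a literal null leaves the map nil.
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(body, &obj); err != nil || obj == nil {
		return http.StatusBadRequest // ASSUMPTION: not specified by the protocol
	}
	if s[bucket] == nil {
		s[bucket] = make(map[string]json.RawMessage)
	}
	_, existed := s[bucket][id]
	// Copy the body so the stored object does not alias the caller's buffer.
	s[bucket][id] = append(json.RawMessage(nil), body...)
	if existed {
		return http.StatusOK
	}
	return http.StatusCreated
}

func main() {
	s := store{}
	body := []byte(`{"Name":"Ryanne","Age":27}`)
	fmt.Println(put(s, "RyBucket", "Ryanne", body))          // 201: new resource
	fmt.Println(put(s, "RyBucket", "Ryanne", body))          // 200: update
	fmt.Println(put(s, "RyBucket", "Oops", []byte(`[1,2]`))) // 400: not an object
}
```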

The Request MUX

Your request mux (or request router) will reverse-proxy incoming requests to the corresponding storage node (which might not be the one you wrote!).  To determine which node to connect to, the mux takes the entire request path, including the leading slash, and computes the Adler-32 checksum of that string.  The result is a 32-bit unsigned integer; take it modulo the number of VMs (let's say 40 for now).  Since our VMs are numbered starting at one, add 1 to the result:

n := adler32.Checksum([]byte(req.URL.Path))%40 + 1

This is the number of the VM that should store the object at that request path.
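Putting the pieces together, a mux could look roughly like the sketch below. The hostname pattern in `storageURL` is an assumption extrapolated from the single example URL earlier on this page (virtual40.cs.missouri.edu), and `nodeFor`, `storageURL`, and `proxy` are hypothetical names. The forwarding itself uses the standard library's `httputil.NewSingleHostReverseProxy`.

```go
package main

import (
	"fmt"
	"hash/adler32"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

const numVMs = 40 // "let's say 40 for now"

// nodeFor maps a request path (leading slash included) to a VM number
// in the range 1..numVMs.
func nodeFor(path string) uint32 {
	return adler32.Checksum([]byte(path))%numVMs + 1
}

// storageURL builds the base URL of the key-object store on VM n.
// ASSUMPTION: every VM is reachable as virtual<n>.cs.missouri.edu.
func storageURL(n uint32) *url.URL {
	return &url.URL{
		Scheme: "http",
		Host:   fmt.Sprintf("virtual%d.cs.missouri.edu:8006", n),
	}
}

// proxy forwards the request unchanged to the storage node chosen by its path.
func proxy(w http.ResponseWriter, r *http.Request) {
	target := storageURL(nodeFor(r.URL.Path))
	httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
}

func main() {
	log.Fatal(http.ListenAndServe(":8007", http.HandlerFunc(proxy)))
}
```

As a worked example, the Adler-32 checksum of the one-byte path "/" is 3145776, and 3145776 % 40 + 1 = 17, so a request for "/" would go to VM 17. Building a new proxy per request keeps the sketch short; a longer-lived mux might cache one proxy per node.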

Note: when testing your own node and mux, you should temporarily proxy all requests to your own storage node, regardless of the request path:

// n := adler32.Checksum([]byte(req.URL.Path))%40 + 1
// use my vm for now:
n := 37

The Search Service

Your third program will enable the entire cloud to be searched for objects in a given bucket.  You just need one request handler:

GET /{bucket}/

Return a JSON array of object IDs corresponding to all objects in the given bucket on ALL NODES.  You should be able to figure out how to do this!

Other Requirements

Your node doesn't necessarily need to persist resources to a database.  It is up to you how you handle persistence.  You may choose to use a database or just a hash table in memory, for example.  If your node does not sync to a database, then it is possible it will lose all its buckets when it is restarted.  This is okay.  The protocol should handle this loss of information gracefully.

When your node is complete, keep it running on your virtual machine.  It will become part of the cloud along with other completed nodes.
