How to run Gaia with local filesystem storage

#1

I’ve been asked this a few times: how do you run a Gaia hub that stores data on the local filesystem instead of in a cloud? To do so, you’ll need to do the following.

WARNINGS

This should be taken as a draft of documentation. Please report bugs in this forum thread.

Step 1: Install Gaia and Gaia Reader

You can install the Gaia hub from source using the documentation in its README.

You will want to additionally install the “Gaia reader.” This is a program that is specifically designed to allow you to read data stored to the local filesystem while preserving the MIME type. You can get it up and running by following the instructions on its README.

Step 2: Configure Gaia

You will want to use the disk driver. Here is a sample config file for doing so. This config will store the data under /tmp/gaia-disk, will accept writes on port 4000, and will advertise its read endpoint (readURL) as https://my-gaia-hub.com/. Keep this value in mind, since it will influence how we will configure the Gaia reader.

{
  "servername": "localhost",
  "port": 4000,
  "driver": "disk",
  "readURL": "https://my-gaia-hub.com/",
  "proofsConfig": {
    "proofsRequired" : 0
  },
  "diskSettings": {
     "storageRootDirectory": "/tmp/gaia-disk"
  },
  "pageSize": 20,
  "argsTransport": {
    "level": "debug",
    "handleExceptions": true,
    "stringify": true,
    "timestamp": true,
    "colorize": false,
    "json": true
  }
}

The Gaia reader’s config will look like this:

{
   "port": 5000,
   "diskSettings": {
      "storageRootDirectory": "/tmp/gaia-disk"
   }
}

Note that the port is 5000 here. Also notice that diskSettings.storageRootDirectory has the same value as in the hub’s configuration file (/tmp/gaia-disk). The two must always match, so if you change where the data is stored, change it in both files.
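Since a mismatch between the two files is easy to miss, here is a minimal sketch of a check you could run before starting either process. It is not part of Gaia; the function name storage_roots_match is made up for illustration, and it only assumes the diskSettings.storageRootDirectory key shown in the two configs above.

```python
import json
import posixpath


def storage_roots_match(hub_config_text: str, reader_config_text: str) -> bool:
    """Return True when both configs name the same storage root directory."""
    hub = json.loads(hub_config_text)
    reader = json.loads(reader_config_text)
    hub_root = hub.get("diskSettings", {}).get("storageRootDirectory")
    reader_root = reader.get("diskSettings", {}).get("storageRootDirectory")
    if hub_root is None or reader_root is None:
        return False
    # Normalize so that "/tmp/gaia-disk/" and "/tmp/gaia-disk" compare equal.
    return posixpath.normpath(hub_root) == posixpath.normpath(reader_root)


# Trimmed-down versions of the two config files from this post.
hub_text = '{"driver": "disk", "diskSettings": {"storageRootDirectory": "/tmp/gaia-disk"}}'
reader_text = '{"port": 5000, "diskSettings": {"storageRootDirectory": "/tmp/gaia-disk/"}}'

print(storage_roots_match(hub_text, reader_text))
```

In practice you would read the two texts from wherever you keep the hub and reader config files.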

Step 3: Configure an Nginx proxy

The Gaia reader will handle HTTP GET requests, and return file data stored to /tmp/gaia-disk with the appropriate MIME type. However, the Gaia reader is just a Node.js server, and it doesn’t speak HTTPS. We’ll need to set up an Nginx proxy to handle that.

To do so, you’ll want to install Nginx, and make Nginx handle reads through the Gaia reader. Nginx can’t just serve data out of /tmp/gaia-disk on its own, since it doesn’t know the MIME type to serve. Instead, we’ll configure Nginx to use the Gaia reader as a back-end as follows:

server {
   # NOTE: SSL configuration details are omitted
   server_name gaia_proxy;
   listen 443;
   listen [::]:443;
   root /var/www/html;
   index index.html;
   
   # forward requests for data to the reader
   location ~ ^/hub/.+ {
      proxy_pass http://localhost:5000;
   }

   # forward requests for / and /hub_info to the Gaia hub
   location /hub_info {
      proxy_pass http://localhost:4000;
   }
   location / {
      proxy_pass http://localhost:4000;
   }
}

The proxy_pass directives forward each matching request to the Gaia reader (port 5000) or the Gaia hub (port 4000), and relay the response back to the client.
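To make the routing concrete, here is a rough Python sketch of which back-end each request path ends up at under the three location blocks above. It is a simplified model of Nginx's location matching (a matching regex location wins, otherwise the longest matching prefix is used), not Nginx itself, and pick_backend is a made-up name.

```python
import re

READER = "http://localhost:5000"
HUB = "http://localhost:4000"

# Mirrors the location blocks in the Nginx config above.
REGEX_LOCATIONS = [(re.compile(r"^/hub/.+"), READER)]
PREFIX_LOCATIONS = [("/hub_info", HUB), ("/", HUB)]


def pick_backend(path: str) -> str:
    """Return the back-end URL a request path would be proxied to."""
    # A matching regex location takes precedence over plain prefix locations.
    for pattern, backend in REGEX_LOCATIONS:
        if pattern.search(path):
            return backend
    # Otherwise the longest matching prefix location is used.
    matches = [(prefix, backend) for prefix, backend in PREFIX_LOCATIONS
               if path.startswith(prefix)]
    return max(matches, key=lambda match: len(match[0]))[1]


for path in ("/hub/some-address/profile.json", "/hub_info", "/"):
    print(path, "->", pick_backend(path))
```

Note that /hub_info does not match the ^/hub/.+ pattern, because the pattern requires a slash right after "hub", so it falls through to the hub.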

Putting it all Together

The Nginx server will serve GET requests for https://my-gaia-hub.com/hub/$FILEPATH, which it will pass to the Gaia reader on http://localhost:5000. The Gaia reader will load data from /tmp/gaia-disk/$FILEPATH and reply to the GET request with the appropriate MIME type. Remote clients will write to the Gaia hub by POST-ing writes to /, which will be sent to the Gaia hub. Remote clients will connect to the Gaia hub by GET-ing /hub_info, which will be forwarded to the Gaia hub as well.
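The read path described above boils down to a mapping from a read URL to a file under the storage root. Here is a small sketch of that mapping, assuming the /hub/ prefix and the /tmp/gaia-disk root used in this post. The function disk_path_for and its path checks are illustrative only, and say nothing about how the Gaia reader itself validates paths.

```python
import posixpath
from urllib.parse import unquote, urlsplit

STORAGE_ROOT = "/tmp/gaia-disk"  # storageRootDirectory from the configs above
READ_PREFIX = "/hub/"            # URL prefix that Nginx forwards to the reader


def disk_path_for(read_url: str) -> str:
    """Map a read URL to the file path it would be served from."""
    path = unquote(urlsplit(read_url).path)
    if not path.startswith(READ_PREFIX):
        raise ValueError("not a read URL: " + read_url)
    relative = path[len(READ_PREFIX):]
    # Reject empty and ".." segments so the result stays under STORAGE_ROOT.
    if any(segment in ("", "..") for segment in relative.split("/")):
        raise ValueError("unsafe file path: " + relative)
    return posixpath.join(STORAGE_ROOT, relative)


print(disk_path_for("https://my-gaia-hub.com/hub/some-address/profile.json"))
```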

#2

Hi @Jude, many thanks for this piece.

#3

@markmhendrickson I think we made a conscious decision not to include this in the external docs. Any problems with me exposing it outside the repo as it is public on the forum?

#4

I don’t see any problem with exposing this publicly. In fact, I think this would be great to document fully and publicly, since I’m keen on seeing how people use and extend local filesystem storage for Gaia.

Separately, I hadn’t heard of our Gaia reader before and I’m glad to see something like it exists to facilitate local storage. But I’m also a bit puzzled why three servers in total are needed to pull this off (Nginx + Gaia reader + Gaia hub).

Given that Nginx routes to both of the others, would it not simplify things to somehow route requests to Gaia reader through Gaia hub instead? My naive first impression is that it feels as though the latter is doing some of the work of the former and this could all be simplified somehow to reduce the need to configure and maintain three servers.

#5

Gaia needs to preserve the MIME type of a file, in addition to the file data itself. If I upload something that is application/json, then it must be application/json on read.

A filesystem does not preserve this information by itself, and nginx by itself has no way of determining the client-given MIME type of a file stored to disk. Therefore, some translation mechanism is needed on the read path to ensure that the client-given MIME type is returned along with the file data. This necessitates a separate Gaia reader “side-car” process.
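To illustrate the idea, here is a toy sketch of one way a writer and a reader could preserve a client-given MIME type on a plain filesystem: the writer records it in a small metadata file next to the data, and the reader looks it up on read. The .metadata directory name and the write_file / read_file helpers are hypothetical and are not meant to describe the actual on-disk layout used by the Gaia disk driver or the Gaia reader.

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple

METADATA_DIR = ".metadata"  # hypothetical side directory for MIME types


def write_file(root: Path, name: str, data: bytes, content_type: str) -> None:
    """Store the data, and record its MIME type in a separate metadata file."""
    target = root / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    meta = root / METADATA_DIR / name
    meta.parent.mkdir(parents=True, exist_ok=True)
    meta.write_text(json.dumps({"content-type": content_type}))


def read_file(root: Path, name: str) -> Tuple[bytes, str]:
    """Return the data together with the MIME type recorded at write time."""
    data = (root / name).read_bytes()
    meta = root / METADATA_DIR / name
    if meta.exists():
        content_type = json.loads(meta.read_text())["content-type"]
    else:
        # Nothing was recorded, so fall back to a generic type.
        content_type = "application/octet-stream"
    return data, content_type


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_file(root, "some-address/profile.json", b'{"name": "test"}', "application/json")
    body, mime = read_file(root, "some-address/profile.json")
    print(mime, len(body))
```

A plain file server pointed at the same directory would only see the bytes in profile.json, which is why something on the read path has to supply the recorded type.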

While the Gaia reader can serve as a stand-alone HTTP server, you’d typically want to pair it with a “real” HTTP server so you can support SSL, rate-limiting, etc.

As for routing reads through the Gaia hub: no. The Gaia hub is strictly a write-only process in order to keep the complexity (and attack surface) down. This is fine in most cases, because most storage backends let the client set a file’s MIME type and will automatically return it on read. The reader side-car is only necessary for supporting storage back-ends that can’t do this on their own (i.e. the disk driver), and is therefore implemented as an add-on.

#6

I see, I think I overlooked the fact that the Gaia hub is strictly write-only (and that storage providers are generally left to support read requests themselves).

So with the current PBC hub, read requests that appear to be routed through the hub (i.e. starting with https://gaia.blockstack.org/hub/) are actually being routed by Nginx (or a similar proxy server) directly to the backing storage provider for fulfillment?

#7

They’re being routed to Cloudflare, which in turn routes them to PBC’s Azure bucket. The Gaia hub is not on the read path at all.

#8

Thanks for the step-by-step instructions.

#9

Just to note here: the steps that @jude outlined are very similar to how we do this in the AWS image; we simply use Docker to run the code.
https://github.com/blockstack/gaia/tree/master/docker
