Fast access to user profiles, possible?


#1

Hey,

I’m building a sort of user profile index, so I’m downloading all the profiles from gaia.blockstack.org/hub, since the majority of the profile.json files are stored there.

This is pretty slow: every read has a ~0.5 second wait (TTFB). I suspect this is because Blockstack’s node has throttling enabled.

Is there any way around it?

I thought of deploying my own Gaia hub, but I don’t think that will help, since the user data on S3 is protected by AWS credentials that belong to Blockstack. Is this correct?


#2

Sounds about right, as far as I can tell. It’s not the core node you’re having an issue with (its data is public), but rather the data stored in Blockstack (PBC)'s AWS storage, which you should only be able to get through their Gaia Hub…

Side question - why do you need to download all of them? Why not just record that “this address exists and has a profile” and fetch the profile when you need it? I would think that’s a better idea for a simple index, but on the other hand if you want to do “big data” tests and whatnot then I could understand wanting it all in one place.
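
To make the "record the address, fetch later" idea concrete, here is a minimal sketch. The class name, the injectable fetch function, and the `<base>/<address>/profile.json` URL layout are all assumptions for illustration, not something the hub documents in this thread:

```python
import json
import urllib.request
from typing import Callable, Dict, Optional, Set


def http_fetch(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


class LazyProfileIndex:
    """Remembers which addresses have a profile; downloads one only on demand."""

    def __init__(self, base_url: str, fetch: Callable[[str], bytes] = http_fetch):
        self.base_url = base_url.rstrip("/")
        self.fetch = fetch
        self.known: Set[str] = set()
        self.cache: Dict[str, object] = {}

    def record(self, address: str) -> None:
        """Note that this address exists and has a profile (no download yet)."""
        self.known.add(address)

    def profile_url(self, address: str) -> str:
        # Assumed layout: <base>/<address>/profile.json
        return f"{self.base_url}/{address}/profile.json"

    def get(self, address: str) -> Optional[object]:
        """Return the parsed profile, fetching it at most once per address."""
        if address not in self.known:
            return None
        if address not in self.cache:
            self.cache[address] = json.loads(self.fetch(self.profile_url(address)))
        return self.cache[address]
```

With something like this, the index itself stays tiny and the ~0.5 second cost is only paid for profiles somebody actually looks at. It doesn't help if you really do need every profile every day.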


#3

I want to collect daily statistics of the number of app installations on Blockstack (wanna make it public of course). I see now it’ll be a bit challenging since downloading all user profiles takes longer than a day, around 25-30 hours.
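
For a sense of scale: if every read really costs ~0.5 seconds and they run one after another, 25-30 hours corresponds to roughly 180,000-216,000 reads. When the time is mostly spent waiting on the server, issuing several requests at once is the usual workaround. The sketch below is only that, a sketch: the function names and the worker count are made up for illustration, and if the hub throttles per client then extra concurrency may not help at all (or may get you blocked), so keep the pool small.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Union


def estimated_hours(n_profiles: int, seconds_per_read: float, workers: int = 1) -> float:
    """Rough wall-clock estimate when each read is dominated by waiting."""
    rounds = math.ceil(n_profiles / workers)
    return rounds * seconds_per_read / 3600


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    workers: int = 16,  # arbitrary illustrative value
) -> Dict[str, Union[bytes, Exception]]:
    """Fetch every URL with a bounded thread pool.

    A failed fetch is stored as the exception object instead of aborting the run.
    """
    def safe(url: str):
        try:
            return url, fetch(url)
        except Exception as exc:  # keep going; the caller decides what to retry
            return url, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(safe, urls))
```

Under the same idealized assumptions, `estimated_hours(180_000, 0.5, workers=16)` comes out well under two hours, which would fit a daily statistics run.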

Also, I’m a bit puzzled now about Blockstack having more access to user data than anyone else. Sure, you can take your data and move it wherever you want to. But you can also download all your data from Facebook and upload it to other apps if you want. What’s the difference? Just trying to understand it better and have a discussion.


#4

Neat idea, and I like it - and can definitely see the issue with the rate limiting.

I think the biggest thing is the line between Blockstack the Idea and Blockstack the Company (i.e. Blockstack PBC). The Idea is that you own your data and can take it anywhere you want - it’s also encrypted and your data storage provider (be it AWS or Google Drive) is treated as dumb storage that isn’t allowed to read it due to encryption (unless it sits unencrypted).

With how Blockstack’s “official” (Company) Browser is being developed, and how the “official” (Company) Gaia Hub is run, currently the only thing you’re allowed to use is the “official” (Company) AWS storage unless you set up your own Gaia Hub. It’s basically set up like a prototype or proof of concept, though it can come off as slightly hideous:

“We really love the idea of users owning their own data and being able to control it - but through our official avenues, we will own all your data and you have no control over it!”

We - the users - literally have no control over what’s in their AWS storage in our name. We can’t arbitrarily delete our profile.json files or other app data files… It’s really awful from a privacy standpoint, and they can technically run any big-data tests they want (like the one in your idea) but we can’t because of rate limiting and non-democratization of data-storage with better providers.

Now I really want to set up my own Gaia Hub… haha.


#5

Hi @MichaelFedora,

Totally hear where you’re coming from. I very much want to be able to easily deploy my own Gaia hub too.

To make this a reality, I’m in the process of upgrading the authentication protocol so that the authentication program (e.g. the Browser, your extension, the CLI) passes both the Blockstack Core API endpoint and the Gaia write endpoint to the application via the authResponse JWT. There are two PRs for this:

Once I can create a reliable way to test various versions of blockstack.js against various versions of authResponse tokens (which the comments in that PR describe), we can merge it to master and ship it. That, and we need to merge Gaia’s develop branch to its master branch.
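
To illustrate what "passing the endpoints via the authResponse JWT" means on the application side, here is a rough sketch of reading the claims out of a JWT payload. The claim names `core_api_endpoint` and `gaia_write_endpoint` are hypothetical placeholders, not the real field names from the PRs, and the sketch deliberately does not verify the signature, which a real application must do before trusting anything in the token.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as unpadded base64url, the way JWT segments are written."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    return json.loads(b64url_decode(parts[1]))


# Build a fake, unsigned token purely for demonstration.
# Both claim names below are hypothetical.
claims = {
    "core_api_endpoint": "https://core.example.invalid",
    "gaia_write_endpoint": "https://hub.example.invalid",
}
header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps(claims).encode())
sample_token = f"{header}.{payload}.signature-placeholder"

decoded = decode_jwt_payload(sample_token)
```

The point is just that the application no longer has to hard-code a hub: it reads the write endpoint from whatever token the authenticator hands it.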

In the meantime, you can already run your own Gaia hub if you use the CLI authenticator. If you install the CLI and npm link the feature/authResposne-1.3 feature branch of blockstack.js into your node_modules, you can use the CLI as an authenticator as follows:

$ blockstack-cli authenticator "YOUR-GAIA-HUB-WRITE-URL" "YOUR 12-WORD SEED"

To deploy your own Gaia hub, you’ll want to run the develop branch. You would set it up so it only takes writes from your Blockstack ID(s). For example, here’s a sample Gaia hub config that stores data to /tmp/gaia-hub using the disk driver, and allows writes only from judecnelson.id (note that my ID-address is ID-15gxXgJyT5tM5A4Cbx99nwccynHYsBouzr):

{
  "servername": "localhost",
  "port": 4001,
  "driver": "disk",
  "readURL": "https://my.example.domain/hub/",
  "whitelist": ["15gxXgJyT5tM5A4Cbx99nwccynHYsBouzr"],
  "proofsConfig": {
    "proofsRequired" : 0
  },
  "diskSettings": {
    "storageRootDirectory": "/tmp/gaia-hub"
  },
  "argsTransport": {
    "level": "debug",
    "handleExceptions": true,
    "stringify": true,
    "timestamp": true,
    "colorize": false,
    "json": true
  }
}

Using this config file (with the readURL changed appropriately to your server), you can run the Gaia hub with blockstack-gaia-hub /path/to/your/config.json.
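
Before starting the hub, it can be worth sanity-checking the file, since a strict JSON parser rejects small slips such as a trailing comma. The following is a minimal sketch, not Gaia's own validation: the list of expected keys is simply copied from the sample config above, and the rule that whitelist entries omit the `ID-` prefix is inferred from that sample.

```python
import json
from typing import List

# Keys taken from the sample config in this post; NOT Gaia's actual schema.
EXPECTED_KEYS = ("servername", "port", "driver", "readURL", "whitelist")


def check_hub_config(text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in config]

    if config.get("driver") == "disk":
        disk = config.get("diskSettings")
        root = disk.get("storageRootDirectory") if isinstance(disk, dict) else None
        if not root:
            problems.append("disk driver needs diskSettings.storageRootDirectory")

    whitelist = config.get("whitelist")
    if isinstance(whitelist, list):
        # Inferred from the sample: entries are bare addresses, without "ID-".
        if any(str(entry).startswith("ID-") for entry in whitelist):
            problems.append("whitelist entries should omit the ID- prefix")

    return problems
```

You would call it as `check_hub_config(open("/path/to/your/config.json").read())` and only launch the hub if the returned list is empty.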