Feedback wanted: Collections Design


#1

In the Blockstack dev tools roadmap posted on this forum a few months ago, collections was identified as one of the most important upgrades to the platform. So the team at Blockstack PBC spent the last month working on a design proposal. We want to make sure that the result meets the requirements of end users, developers, and the ecosystem, so we would love to get more input from the community.

Overview

Collections is a way to store common user data in a known location with a known structure. Different apps on Blockstack can then read and write the same collection of data, so users can use the same data in different apps. An example is a single store of photos owned by a user that could be read and shared by many different apps with permission.

Goals & Design Considerations

The goal is to realize true data portability on Blockstack. In the existing implementation, app data is stored in separate app-specific buckets on Gaia and structured differently. It is difficult to take your own data and use it in another app.

For end-users, we want:

  • True data portability without cumbersome UX.
  • Ability to easily manage app permissions and level of access to collection data.
  • Reduced damage that faulty/malicious apps can cause to the user’s data.

For developers:

  • Great developer experience to incentivize usage of collections over proprietary data formats.
  • Make it easy to utilize user data generated from other apps.
  • Have a voice in the governance of collection data schemas.
  • Ability to extend the vanilla collection data schemas without affecting other apps.
    • We don’t want to stifle developer creativity with rigid data schemas.
    • We don’t want developers to fork away from common data formats.

For the ecosystem:

  • Have some form of governance to add and improve collection data types.
  • A way to incentivize and reward developers that use collections.

Summary of Design

Blockstack will build a library that provides defined classes for commonly used data schemas. Developers will work with these classes and objects instead of creating new data schemas. These objects will automatically convert to the defined data schemas when stored to Gaia and vice versa.

This library of Blockstack collection classes will be open source and we will put in place a governance process to allow addition of new classes and modification of existing ones. Any community member can propose upgrades to the library via a process similar to the SIP process for the Stacks blockchain.

In this design we decided not to validate or enforce the schema of the data written to collections. The rationale is that it’s easier to incentivize usage of collections than to enforce it on an open platform. We also provide users with the ability to roll back data in case apps make undesirable changes that break compatibility with collections.

To provide users with rollback capability, we’ve designed the collections data store conceptually as an event log. In version 1.0 of collections, every data write an app makes will be stored as a separate file. This ensures that data is never lost and users can return files to any previous state. Potential storage scalability issues will be addressed via compression and limiting history.

We will provide the users with full control over their collections data through the Blockstack Browser. Apps must request access to specific collections during authentication. Users can manage app permissions for collections and explore their raw collections data through the Browser. A file manager in the Browser is needed so the user can explore files and roll back if necessary.

Specification

Collections API in blockstack.js

The collections API will include a set of additional storage functions made available to developers. Under the hood, these new collection functions will use the existing storage functions in blockstack.js.

Storage

Instead of dealing directly with JSON data like the existing storage functions, collection storage will use data model objects. We will create a set of classes that represent the collection data types we want (e.g. Contacts, Photos, Documents). These classes will internally convert between the objects and JSON for Gaia. Each object will map to an individual file in storage. Developers can extend these classes with additional properties, with the requirement that they namespace their additions. We should also build in a versioning system for the schemas to help with compatibility as the schemas evolve.
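To make the object-to-JSON mapping more concrete, here is a minimal sketch of what such a class could look like. Everything in it is an assumption for illustration: the field names, the `schemaVersion` property, and the `extensions` map keyed by app domain are not part of blockstack.js or of any agreed schema.

```javascript
// Hypothetical sketch: none of these names are part of blockstack.js
const SCHEMA_VERSION = '1.0'

class Contact {
  constructor(name, email, extensions = {}) {
    this.name = name
    this.email = email
    // App-specific additions, keyed by a namespace such as the app domain
    this.extensions = extensions
  }

  // Convert the object to the JSON document that would be written to Gaia
  toJSON() {
    return {
      type: 'Contact',
      schemaVersion: SCHEMA_VERSION,
      name: this.name,
      email: this.email,
      extensions: this.extensions
    }
  }

  // Rebuild an object from stored JSON, keeping any namespaced extensions
  static fromJSON(json) {
    return new Contact(json.name, json.email, json.extensions || {})
  }
}

const contact = new Contact('Blocky Stackerson', 'blocky@blockstack.com', {
  'myapp.com': { nickname: 'Blocky' }
})
const stored = JSON.stringify(contact) // JSON.stringify calls toJSON()
const restored = Contact.fromJSON(JSON.parse(stored))
```

Keeping extensions under a namespace means an app that does not understand them can still round-trip the file without dropping another app's data.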

We need to create a governance process to allow for the community to request changes to schemas and propose new ones. This should be something similar to the SIP process. We will add new schemas to collections if there is enough support for it in the community.

We won’t explicitly validate schemas. Instead, the objects themselves will handle the object-to-JSON schema conversions under the hood, which keeps developers from accidentally breaking the schema. It won’t completely prevent them from changing the data schema, but there isn’t really anything we can do to 100% stop them. We can instead offer a way for users to roll back changes through the Browser.

Usage example for the proposed API:

import { Collections } from 'blockstack'

const { Contact, Document } = Collections

// Saving a collection item
// Contacts example
var contactName = 'Blocky Stackerson'
var contactEmail = 'blocky@blockstack.com'
var contact = new Contact(contactName, contactEmail)

var contactID = putCollectionItem(contact)
// Returns a unique contact ID for use in retrieval later
// contactID = abc12345

// Saving a collection item
// Documents example
var documentName = 'New document'
var documentMarkup = { ... }
// Named doc to avoid shadowing the browser's global document object
var doc = new Document(documentName, documentMarkup)
putCollectionItem(doc)

// Retrieving a single item using the item ID
var itemID = 'xyz12345'
getCollectionItem(itemID, Document.type)

// List items in the Collection, returns a count of items in the collection
listCollection(Contact, callback)

// The callback is called for each file
function callback(contactID) {
  // Fetch the actual contact object
  getCollectionItem(contactID, Contact.type).then((contact) => {
    // Do something with the returned contact object
  })
}

// Delete a collection item
deleteCollectionItem(itemID, Document.type)
// This should just rename the latest file to a historical file and 
// update the index to reflect this

Scope Request

For app developers, collections permissions can be requested via the authentication scope system. The collection schema libraries should provide collection scope identifier constants.

import { Collections } from 'blockstack'
const { Contact, Document } = Collections

const collectionScopes = [
  Contact.scope.read,
  Document.scope.write
]

const appConfig = new AppConfig(
                        [...DEFAULT_SCOPE, ...collectionScopes], // scopes (spread both lists into one flat array)
                        'http://localhost:3000', // appDomain
                        window.location.origin, // redirectPath
                        '/manifest.json', // manifestPath
                        null, // coreNode
                        DEFAULT_BLOCKSTACK_HOST // authenticatorURL
                      )
                     
const userSession = new UserSession(appConfig)

userSession.redirectToSignIn()

Requesting scope after authentication

We should optionally provide a function to request additional scopes after the user is already authenticated. For existing apps that want to add collections, the alternative would be to force the user to re-authenticate.

userSession.requestCollectionScope(Contact.scope.write)

Browser-side changes

Storage Key Generation

Collection data would be stored in separate Gaia buckets not related to any apps.

Currently app data is stored in Gaia buckets:

"http://myapp.com": "https://gaia.blockstack.org/hub/143tnkzivRBSSvmyo1bXghoap2gRVpyvzz/"

Collections data would be stored in similar buckets:

"collections.contacts": "https://gaia.blockstack.org/hub/143tnkzivFVgPqerPKUoKLdyvgyYNPjM9/"

The app data bucket address 143tnkzivRBSSvmyo1bXghoap2gRVpyvzz is generated by deriving from the appsNodeKey in each identity address using a hash of the app domain as the index. We can similarly generate collections data bucket addresses using a collectionsNodeKey and the collection name as the index.

// Key derivation for app buckets
var appDomain = 'https://www.myBlockstackApp.com'
var hashAppIndex = sha256(appDomain + salt)
var appNode = this.hdNode.deriveHardened(hashAppIndex)

// Key derivation for collections bucket
var collectionsPrefix = 'collections'
var collectionName = collectionsPrefix + '.' + 'contacts' // 'collections.contacts'
var hashCollectionIndex = sha256(collectionName + salt)
var collectionNode = this.hdNode.deriveHardened(hashCollectionIndex)

We prefix the collection index with collections to avoid collisions between app and collection indices.
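As a rough illustration of why the prefix matters, the sketch below hashes an unprefixed name and a prefixed collection name with Node's built-in crypto module. The `toHardenedIndex` helper and the salt value are assumptions made for this example; the proposal does not specify how the hash is reduced to a derivation index.

```javascript
const crypto = require('crypto')

// Hash a string to a hex digest
function sha256(input) {
  return crypto.createHash('sha256').update(input).digest('hex')
}

// Hypothetical helper: reduce a hex digest to a 31-bit integer, since
// hardened child indices in BIP32 must fit in 31 bits
function toHardenedIndex(hexDigest) {
  return parseInt(hexDigest.slice(0, 8), 16) & 0x7fffffff
}

const salt = 'example-salt' // placeholder value for illustration only

// An unprefixed name and the prefixed collection name hash differently,
// so an app index input can never equal a collection index input
const plainDigest = sha256('contacts' + salt)
const collectionDigest = sha256('collections.contacts' + salt)

const plainIndex = toHardenedIndex(plainDigest)
const collectionIndex = toHardenedIndex(collectionDigest)
```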

Encryption Key Generation

We can derive encryption keys for collections similar to how we derive the bucket keys. In this case the index we’re using contains a hash of the list of apps authorized to the collection. This way we can revoke encryption keys by removing the app from the authorized list.

// Encryption key derivation for collections bucket
var collectionName = 'collections.contacts'
var authorizedApps = ['https://myApp.com', 'https://otherApp.com']
var authorizedAppsHash = sha256(authorizedApps.toString())
var hashCollectionIndex = sha256(collectionName + authorizedAppsHash + salt)
var appNode = this.hdNode.deriveHardened(hashCollectionIndex)
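A small sketch of the revocation property described above: removing an app from the authorized list changes the derivation input, and therefore the key. The `encryptionIndexHash` helper, the salt, and the choice to sort the list before hashing are assumptions added here so that the same set of apps always yields the same hash regardless of order.

```javascript
const crypto = require('crypto')

function sha256(input) {
  return crypto.createHash('sha256').update(input).digest('hex')
}

// Hypothetical helper: compute the hash used as the derivation index input.
// Sorting a copy of the list makes the result independent of insertion order.
function encryptionIndexHash(collectionName, authorizedApps, salt) {
  const authorizedAppsHash = sha256([...authorizedApps].sort().join(','))
  return sha256(collectionName + authorizedAppsHash + salt)
}

const salt = 'example-salt' // placeholder value for illustration only
const name = 'collections.contacts'

const before = encryptionIndexHash(name, ['https://myApp.com', 'https://otherApp.com'], salt)
const reordered = encryptionIndexHash(name, ['https://otherApp.com', 'https://myApp.com'], salt)
const afterRevoke = encryptionIndexHash(name, ['https://myApp.com'], salt)
```

Here `before` and `reordered` are equal, while `afterRevoke` differs, so revoking an app forces a new key.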

The user’s profile.json should keep track of the list of apps that have been authorized for each collection. This data should be encrypted.

Example profile.json:

...
// User apps
apps: {
  "https://MyApp.com": "https://gaia.blockstack.org/hub/1CDUqlkjQgYNt342kjeD4fd83aiNGQ22a/",
  "https://OtherApp.com": "https://gaia.blockstack.org/hub/1JL1fjQrh238S9aMn3skS3aiNGLN32g23ab/",
},
// User collections
collections: {
  "documents": {
    "location": "https://gaia.blockstack.org/hub/1Lsdf83isMHFsfse223hrbEynNR63vn2A/",
    "authorizedApps": 
    // Encrypted section
    [
      "https://MyApp.com",
      "https://OtherApp.com"
    ]
    // End encrypted section
  }
}
...

Encryption Key Storage

We can store the encryption keys for collections in the app’s own storage bucket and encrypt the key with the app private key. When the app needs to decrypt data from a collection, it should fetch and decrypt the key from its own storage bucket. This way when the encryption keys change, no action is required from the app.

Proposed naming convention for encryption keys on app data buckets:

.<collection-name>.collection.key

Examples:

.photos.collection.key

.documents.collection.key

Note that the filenames should be encrypted.

Encryption Key Revocation

To revoke an app’s ability to encrypt and decrypt a specific collection’s data, we need to change the encryption key and re-encrypt the existing data.

We can change the encryption key by removing the app from the authorized app list for that collection and regenerating the key using the new authorized app list hash as the derivation index.

The user will be given a choice: either decrypt and re-encrypt all files in the collection, including historical files, using the new key, or encrypt only new file writes with the newly generated key, in which case the current and historical files will not be re-encrypted. A necessary next step would be to update the stored encryption key file in each authorized app’s bucket.

This action would need to be performed from the user’s browser/authenticator since it’s the only agent that can write to every app’s storage bucket as well as the collection.

If the apps cache collection encryption keys locally, they need to know when the encryption key changes. Each encrypted collection write operation should send the encryption key ID to the Gaia hub. The hub will check the ID against the stored key file in the bucket and return an error in case of mismatch. The client-side logic would be to automatically fetch the new key, re-encrypt and perform the write again.
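The client-side retry logic could look roughly like the sketch below. The `MockHub` class, the `KEY_MISMATCH` error code, the key ID values, and the placeholder `encrypt` function are all assumptions for illustration; none of them is an existing Gaia or blockstack.js API.

```javascript
// Hypothetical in-memory stand-in for a Gaia hub; not the real hub API
class MockHub {
  constructor(currentKeyId) {
    this.currentKeyId = currentKeyId
    this.files = {}
  }

  // Reject the write if the client encrypted with an outdated key
  write(path, ciphertext, keyId) {
    if (keyId !== this.currentKeyId) {
      return { ok: false, error: 'KEY_MISMATCH' }
    }
    this.files[path] = ciphertext
    return { ok: true }
  }
}

// Placeholder "encryption" that only tags the data with the key ID
function encrypt(plaintext, key) {
  return key.id + ':' + plaintext
}

// Write with the cached key; on mismatch, fetch the new key and retry once
function writeToCollection(hub, cache, fetchKey, path, plaintext) {
  let result = hub.write(path, encrypt(plaintext, cache.key), cache.key.id)
  if (!result.ok && result.error === 'KEY_MISMATCH') {
    cache.key = fetchKey()
    result = hub.write(path, encrypt(plaintext, cache.key), cache.key.id)
  }
  return result
}

const hub = new MockHub('key-2')
const cache = { key: { id: 'key-1' } } // stale key cached by the app
const fetchKey = () => ({ id: 'key-2' }) // stands in for reading the key file from the app bucket
const result = writeToCollection(hub, cache, fetchKey, 'alice.json', '{"name":"Alice"}')
```

Retrying only once keeps an app whose access was revoked from looping forever, since it would no longer be able to fetch a valid key.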

Gaia Hub Changes

The Gaia hub should allow a new type of authentication token that only supports a special write operation that retains change history. This provides the user the ability to roll back files to a previous state. In version 1.0, we will just keep every file that was written to a collections storage bucket. We store the latest version of the file with the canonical name so that file reads don’t need to query an index or log.

Example:

myphoto1.jpg <----- Always the latest version
.history.1003.myphoto1.jpg <----- Previous version
.history.1002.myphoto1.jpg
.history.1001.myphoto1.jpg

Naming scheme for historical files is .history.<number>.<filename>

On file writes, the Gaia hub would simply rename the last version of the file to the historical file naming scheme. The naming scheme includes an incrementing number so we can order the files later. The index file provides the current max number for each file. The Gaia hub will also need to deny writes to files that use the historical file naming scheme so that apps cannot overwrite historical files. When the user wants to roll back a file, we can construct its full history from the historical files and the number in each filename.
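Reconstructing a file's history from the naming scheme can be sketched as follows. The `fileHistory` helper is hypothetical, and it assumes the bucket listing is available as a plain array of file names.

```javascript
// Matches names of the form .history.<number>.<filename>
const HISTORY_PATTERN = /^\.history\.(\d+)\.(.+)$/

// Return the historical versions of one file, newest first
function fileHistory(allNames, filename) {
  return allNames
    .map((name) => HISTORY_PATTERN.exec(name))
    .filter((match) => match !== null && match[2] === filename)
    .map((match) => ({ version: Number(match[1]), name: match[0] }))
    .sort((a, b) => b.version - a.version)
}

const names = [
  'myphoto1.jpg',
  '.history.1001.myphoto1.jpg',
  '.history.1003.myphoto1.jpg',
  '.history.1002.myphoto1.jpg',
  '.history.7.other.jpg'
]
const history = fileHistory(names, 'myphoto1.jpg')
// history lists versions 1003, 1002, 1001 in that order
```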

Index file

The Gaia hub will auto manage the collection index file. The index file contains a list of all files stored in the collection and the current value of the incrementing number for each file. On each file write, the hub should check the index file. It should add a new entry if the file does not exist. If the file already exists, the hub should increment the file version number.
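The write path and index update described above might behave like this in-memory sketch. The `CollectionBucket` class is hypothetical, and two details are assumptions not fixed by the proposal: version numbers start at 1, and the index stores the number of the current version.

```javascript
// Hypothetical in-memory model of a collection bucket and its index
class CollectionBucket {
  constructor() {
    this.files = new Map() // file name -> contents
    this.index = {}        // file name -> current version number
  }

  write(name, contents) {
    // Apps must not be able to overwrite historical files
    if (name.startsWith('.history.')) {
      throw new Error('writes to historical files are not allowed')
    }
    if (this.files.has(name)) {
      // Move the current version aside under the historical naming scheme
      const version = this.index[name]
      this.files.set(`.history.${version}.${name}`, this.files.get(name))
      this.index[name] = version + 1
    } else {
      // First write of this file: add a new index entry
      this.index[name] = 1
    }
    // The canonical name always holds the latest version
    this.files.set(name, contents)
  }
}

const bucket = new CollectionBucket()
bucket.write('myphoto1.jpg', 'v1')
bucket.write('myphoto1.jpg', 'v2')
bucket.write('myphoto1.jpg', 'v3')
```

After these three writes, the canonical file holds `v3`, the two earlier versions are kept as `.history.1.myphoto1.jpg` and `.history.2.myphoto1.jpg`, and the index entry for the file is 3.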

File manager

The collections implementation should include a file manager that allows users to browse their collections data and potentially regular app data. The only place this can be implemented is the Browser since it can generate storage and encryption keys for all collections/app buckets.

App Permission Manager

A simple interface is required to manage app permissions for collections. The user should be able to view the list of apps that have access to each collection type. It should also be possible to revoke an app’s access to collections from here.


#2

I’m glad to see this sort of concept is being worked on! I was musing on the same sort of idea, and surprised to see this is a current topic!

I like the idea overall, but one modification I would propose is to not bake the binding between the collection data type and the name of the Collection into the protocol itself (e.g. as currently proposed, all Contact records go in a collections.contacts bucket). I think it’s a good idea to have a standard set of schemas, but I think it would be better to give the user control over creating the Collection buckets themselves, and choosing what types of things go in them. For example, I may not want thatsketchyapp.xyz to have access to all my Photos, only a few that I’m testing with (since I don’t trust the developers of that app to not maliciously delete my Photos out from under me).

The key concern is that if apps granted access to a bucket can delete files from it, encouraging users to keep all their data “in one basket” (your One Main Contact list) means an accidental or malicious app could delete your one copy of your data. If they only have edit access, a malicious app could rewrite all my Contact records to have the name “John Smith”, but with the History naming scheme presented here, there’d be the possibility of undoing such an action (though it might be very tedious to roll back hundreds of edits by one app). It might be useful to record with each history change which app made it. That would allow automated actions like “reverse all changes made by sketchyapp.xyz in the last 3 hours”.

For Schemas, having a master list of “root” schemas that then developers can extend with custom (“namespaced”) properties sounds a lot like the Resource Description Framework (RDF)/Semantic Web/Linked Data idea. Perhaps since the serialized format for these objects is JSON, the JSON-LD structure can be used, and existing schemas/contexts (like a Person, instead of a Contact) could be used?


#3

The description of the Gaia storage sounds like Collection permissions in 1.0 are always read/write, correct? Yet the scopes in the code snippet are read or write. It would be nice to have read-only permission, maybe in 2.0.

On the browser-side changes, the permission dialog also needs an update to show and explain the requested collection permissions. It is not clear how custom collections could be created. Is it just to request a new scope? Can new collections only exist with a browser and blockstack.js update? That would prevent innovation, in particular if the long-term goal is that all data is stored in collections. There should be a fallback if the collection is not known to the browser.


#4

I like this proposal a lot. I think the sooner this can be rolled out, the better. 3Box started behind Blockstack and is now ahead of Blockstack in terms of using data across multiple apps, so it’s important to push this forward quickly IMO.


#5

The issue I see with letting people create any custom collection they want is that apps won’t know about it. If my documents app doesn’t know I have a custom “super sensitive documents” collection, then they won’t be able to ask for access permissions. However I think it would make sense for people to be able to create multiple versions of a single collection type. So that when apps ask for “documents” collection permissions, the user can choose one of their several documents collections to use in the app.

Yes, I think we want to simplify as much as possible for version 1. This feature sounds like a great upgrade for a later version.

The serialized formats should be JSON and we’re most likely going to use existing standard schemas such as Person.


#6

Read-only permissions are going to be in version 1.

New collection types should be added to the Blockstack collection schema library via pull request. The Browser should be able to gracefully handle unknown collection types. Since all the Browser will do to enable collections is generate the keys, this should be very doable.


#7

I think this could work similarly to how sites like Google Drive handle permission requests. When I am logged into several Google accounts and, on a third-party site, click an “attach document from Google Drive” link, my browser first redirects to Google and I get a Google-created “which account do you want to give Drive access to?” prompt. After that I get a filepicker and rights are granted to the third-party app.

So, with that idea, a Blockstack app doesn’t need to know all of a user’s collections. It would instead show the user an “Import Collection” button (or similar), which when clicked would open a Blockstack browser dialog with all the user’s collections (similar to how the login prompt, if you have multiple IDs registered, asks which ID to pass along to the app), for the user to tick off which one(s) they want to give the app access to.


#8

What would be interesting is if Apps / the User could self-define collections using a “manifest.json” of sorts.

For instance, a collection titled mycollection could be created by myapp1 and could be used as a cross-app bucket for communicating with myapp2, with whatever spec myapp1 defines. Though this would make more sense for a later version of collections (along with the changefeed subscriptions!).

Excited to see where this goes, and I’m glad to see that people can at least make different collections of the same type – it’ll be interesting to see if you can sync files between collections as well (partial/full-sync buckets I suppose).


#9

Forgive me if this is a naive question, but I assume ‘the browser’ won’t be the only potential data manager? Is a ‘manage collections’ permission an option?

Couldn’t we solve this by just making collection types an array?

collections: {
  "documents": [
    {
      "location": "https://gaia.blockstack.org/hub/1Lsdf83isMHFsfse223hrbEynNR63vn2A/",
      "authorizedApps": 
      // Encrypted section
      [
        "https://MyApp.com",
        "https://OtherApp.com"
      ]
      // End encrypted section
    },
    {
      "location": "https://gaia.blockstack.org/hub/completelydifferentgaiaurl/",
      "authorizedApps": 
      // Encrypted section
      [
        "https://MyApp.com",
        "https://OtherApp.com"
      ]
      // End encrypted section
    }
  ]
}

Other than that question, multiple +1s. Looking forward to this.


#10

The Blockstack Browser is the only collections manager for now. The reason is that in order to manage collections you need the master private key to generate and revoke encryption keys for apps. However a future third-party implementation of the Browser/authenticator can also perform this.

We have to also consider that it’s not just knowing about the collections but also the schema of the data.


#11

Sounds solid! Looking forward to playing around with this!