IndieWebCamp is a 2-day creator camp focused on growing the independent web

Taproot

Contents

Taproot

Taproot is Barnaby Walters’ publishing software, named on . It’s written in PHP 5.4 and drives most of waterpigs.co.uk. It is not currently released to the public. However, various parts of it have been extracted into libraries:

  • php-mf2 for parsing microformats2 data
  • taproot/archive your personal opinionated indieweb/microformats-oriented HTML archiver
  • original-post-discovery] for determining the canonical URL from POSSE copies
  • mf-cleaner for handling microformats2 array structures better
  • diaspora-export for exporting public posts from Diaspora
  • php-abc for parsing headers out of ABC notation
  • php-helpers: static helper functions, some utility classes and POSSE stuff including the truncenator. Slowly having the useful parts extracted into smaller packages.
  • librarian for DB indexing of flat files, no longer used/maintained
  • php-activitystreams activitystreams implementation, no longer used/maintained

Most of the Taproot design artifacts are up on github too in taproot/design.

All content-creation UIs in taproot are publicly viewable — take a peek!

Design Principles

In addition to the indiewebcamp principles, these are some principles I have discovered through building taproot.

  • Build software which feels safe, all other concerns (privacy, security, technically exciting) are secondary [1] — this is based on personal experience through selfdogfooding taproot
  • Easy things should be easy, remove friction so hard things are only has hard as they need to be [2]
  • Use event listener patterns to build upon the same data in intelligent ways [3]
  • Build reusable, framework-agnostic components and release them to the public, not monolithic products.

IndieWeb Support

Taproot supports the following:

  • All content is marked up using microformats2 so parse at your leisure.
  • Supports posting of notes, articles and music — other post types are faked with HTML inside notes
  • Sends pingbacks and webmentions to all the links on every post
  • Receives webmentions and pingacks and displays them as comments, likes, reposts or mentions
    • Hooked up to Bridgy to display silo interactions in the same place
  • reply notes parse the URL they’re in-reply-to and display a reply-context if microformats are present
  • Notes have webactions with twitter intent fallback if POSSEd to twitter
  • ATOM fallback is provided for all feeds
  • Notes and articles are POSSEd to twitter automatically and manually/indirectly to facebook
  • Logging in using indieauth
  • New post UI is publicly viewable and can be used to create new posts on other people’s micropub-enabled sites

Roadmap

(Less of a “road” map, more of a kind of vague treasure map with no compass. Like pirates have)

Currently working on:

  • Adjusting Taproot to handle multiple post types better (much like idno in presentation)
  • Fixing Twitter POSSE truncation code

Next up, in approximate order of priority:

  • audio posts, maybe POSSE to soundcloud
  • likes and reposts
  • Use DOM templating for all templates
  • Extract more of the code behind my listeners and services into reusable open-source libraries

Done (latest first):

  • Implemented a basic micropub endpoint, capable of creating notes
  • Made it easier to edit note and article published datetimes and lists of syndication URLs
  • Timezones now included in all datetimes in feeds and posts
  • Showing mentions+mention counts on articles as well as notes, made templates and logic more reusable
  • Composite feed on homepage instead of just notes
  • Implemented distributed Indieauth and micropub, allowing people to use my new-note UI to create posts on their own site
  • Integrated with Bridgy instead of manually shimming backfeed
  • Added file upload control to note uploading UI
  • Improved new note location UI with interactive map preview for manual adjustments
  • Removed dedicated ATOM endpoints, replaced with generic mf2 to ATOM conversion endpoint
  • Basic phone UI on homepage
    • works nicely on desktop, needs improvement and Tropo involvement for good mobile UX
  • Mobile-oriented homepage design as per previous brainstorming
  • Allowed notes to have names, which display as a small title at the top of the note. For shorter named articles which don’t quite fit into /articles, which is styled for long-form “serious” essays
  • Improved check in/location note UI, location now shown on interactive Leaflet map before posting, can be corrected by dragging on map
  • Added last-seen location+map, last 3 articles to homepage
  • Added a homepage feed after feed reader research demonstrated need for URL -> latest posts
  • Deploying new rewritten version
  • Rewriting the whole thing to remove frameworkey nonsense code
    • using silex as app library
    • no DB — faking it with CSV indexes like the idiot I am (it’s actually surprisingly effective)
  • Accepting webmentions as well as pingbacks
  • Federated comments
  • Post-by-email
  • Major migration away from DB storage, toward flat files, with lots of codebase cleanup in the process
  • Import of old posts from Diaspora see them here
  • Frontend js using Backbone and requirejs

Structure

I got sick of all the complicated frameworkey nonsense which was accumulating in taproot. Symptoms:

  • not immediately knowing where to look to change any one particular bit of functionality
  • having to change things in different packages+deploy+update to do simple functionality changes
  • too many layers of abstraction+indirection — specifically, mixing abstraction layers
  • no longer fun to work on, simple additions take too much time

And as such I am rewriting taproot to be smaller and flatter and saner.

Documentation of the aforementioned frameworkey nonsense is retained for posterity at the bottom of the page.

Storage

Mentions

Mentions are stored separately from the content they refer to, for ease of editing and moderation. They’re stored with a flattened h-entry-like representation of them which can be easily used in various contexts.

The HTML and HTTP headers from cachable (i.e. public) URLs which are fetched (e.g. webmentioned or webmentioners) are stored in an archive, which also acts as a cache.

Incoming webmentions are processed by a mentions module, which does the following things:

  • fetches and archives (using taproot/archive) the webmention source URL
  • resolves the target URL and gets it’s path
  • parses microformats on the page, looks for the first h-entry. if found, it’s flattened, normalised and purified to avoid XSS attacks
  • from rel values/u-* values in the h-entry, determine the primary mention relationship type, which is one of reply, rsvp, like, repost or mention — mention is used as a fallback if no more specific one is given
  • gives the mention an ID based on the source URI, target URI and current datetime
  • stores the mention in a file containing the source, target, resolved target path and normalised h-entry
  • indexes that file for querying later

Then, when looking for mentions of a particular URL (e.g. http://waterpigs.co.uk/notes/1000/), the following steps are taken:

  • Query the index for rows where the resolved path matches the path of the URL we’re looking for
  • use the rows given to either determine mention type counts for the URL, or fetch the associated files and parse to get the h-entry data for displaying

Posts

Modules all store their data in flat files (usually YAML) which I index in CSV file indexes, read/written using PHP’s fgetcsv and fputcsv functions, which are surprisingly speedy.

For content which has an intrinsic name (like pieces of music, contacts or blog posts) I tend to use an ASCII-fied slug of the name for the ID, generated using Helpers::toAscii. Otherwise (for notes) I create a <= six-character ID from the newbase60 representation of epoch days + number of seconds into that day.

Schema

The data is all stored in a microformats-2 JSON variant structure which is substantially flatter than mf2 JSON — all properties are properties of the parent object, not a properties key, there is no “children” key and only plural properties map to arrays. I keep most of the semantics the same except on the occasions where I prefer a different term (tag over category) or want to aggregate the properties from what would be multiple different mf properties into one (e.g. combining geo, adr and venue into “location”).

  • Originally I intended on using the activitystreams schema to store and represent all my content, the idea being that it would allow me to make listeners which could work seamlessly on all sorts of content types. In actuality, I have nowhere near as many content types as I anticipated, and want to treat each one very differently. In addition to this, I found some of the semantics within activitystreams irritating (e.g. displayName vs title, both horrible names) and the activity ~= object ambiguity confusing. Coupled with the fact that there is no activitystreams client, I decided to drop activitystreams as the basis for representing my data. --Waterpigs.co.uk 12:33, 14 June 2013 (PDT)

Previous, abandoned structure

Taproot has a highly modular structure. It is made up of an Application class, which handles event dispatching, simple dependency injection, routing and module loading. There are two types of modules — full modules, which expose URLs, and listeners, which listen to events dispatched within taproot.

Currently there are modules for Index, Notes, Articles, Contacts, Tunes, Tags and Mentions.

There are a large number of listeners which handle everything from authentication to POSSE and content transformation. Typically if I want to add a new piece of functionality I’ll do it in a listener, if I want to add a new content type I’ll create a new module.

Notes

(out of date as of 2013-12)

The Notes module is the mainstay of my Taproot usage. It handles the creation and display of short notes — kinda like tweets, but with a potentially much larger scope through the excessive use of tagging.

As in most of Taproot, the notes module handles basic CRUDL of data, and various listeners do everything else, including converting markdown, interpreting hashtags, autolinking @names, syndication of content.

Posting a note looks like this:

caption

A still of the posting UI just after the note has saved:

caption

See Also