IndieWebCamp is a 2-day creator camp focused on growing the independent web

authorship


authorship is a claim about who the author(s) of a post are.

Contents

Determining

How to determine authorship of a post on a page - AKA the Authorship discovery algorithm / processing model for implementations.

  1. parse the h-entry for the post on the page. if no h-entry, then there's no post to find authorship for, abort.
  2. if the h-entry has a p-author, use that and its h-card to determine the author of the post.
  3. otherwise look for a top-level h-feed with author property and use that
  4. otherwise if the page is a post permalink page (AKA the post's page).
    1. if the page has a rel-author link
      1. go get the destination page and parse it for microformats2
      2. if there's an h-card with a u-url == u-uid == that page's URL, use that h-card for the author.
      3. else if there's an h-card with u-url for which there's also a rel-me link on that page (perhaps the same hyperlink element as the u-url, though not required to be), use that h-card for the author.
      4. otherwise if there's an h-card on the post's page with a u-url == rel-author link's href, use that h-card for the author.
  5. otherwise no deterministic author can be found. Implementations are encouraged to document additional heuristics below for consideration for incorporation in to the authorship algorithm.

Note: the steps of checking for "url == uid == page's URL" and "url that's also a rel-me" were incorporated inline from the steps for parsing a representative h-card. Some improvements have been made here due to feedback from implementations in practice, and those improvements should be incorporated into an iteration of representative h-card.


Test cases

Issues

Spoofing

Authorship can potentially be spoofed, as the current algorithm may only look at the markup within an arbitrary page to determine the author.

For example, any page with markup like http://paste.debian.net/plainh/587c8bb3 would be parsed as being written by Barnaby Walters. Examples of such spoofing are present at:

was not actually written by aaronpk, but posted on a debian pastebin site and linked via a webmention.

is similarly spoofed.

http://checkmention.appspot.com allows you to test receiving a spoofed webmention from Jonathan Ive.

Other theoretical

Theoretical issues are grouped here for capturing purposes. If you find a real world example of one of these, feel free to promote it to an actual issue with its own === subhead.

  • string-only author: What to do if page h-entry has an author property which is not an h-card, but a string? Treat as empty and continue to fallback methods?
    • This probably isn’t worth discussing much as I can’t find a single example of this in the wild, but as it’s supported in the h-entry spec (“optionally embedded h-card(s)) it might be worth considering/specifying for when it does happen --Barnaby Walters 15:26, 17 January 2014 (PST)

Use Cases

Name avatar display in comments

In comments-presentation, it describes how a site that accepts indieweb reply posts via webmention can retrieve those replies and display them as full-fledged comments on a post, including name and icon/avatar of commenter.

hCard is the most common way that name, URL, photo information is published about a person on the web. Thus parsing for an author's h-card to retrieve their name and avatar makes the most sense.

Algorithm Design Notes

Why do we parse for the authorship details in the order that we do?

First, we prefer the p-author of the h-entry first because that is the most direct way of specifying the information, visibly, on the page. There's also established practice among indieweb sites of publishing a mini h-card with photo, name (sometimes as the alt text of the photo img), and URL to the person's indieweb site root / home page. Also, it may be possible that the post is a guest post, in which case we really want the post-specific authorship information rather than anything general to the site.

Only if the post itself lacks direct authorship information do we fall back to checking for a rel-author link, which is a fairly well established practice for linking from posts to pages representing authors.

On such sites that use rel-author, they almost always point to a page that has a much richer hCard about the author than the post page itself, including a much higher likelihood of having a good photo / avatar image as part of that hCard. Thus we next prefer to go retrieve that rel-author destination, and look for a representative hCard there (per the "url == uid == page's URL" and "url that's also a rel-me" steps noted above).

Only if the rel-author page lacks an hCard do we then fallback to looking for a likely smaller (if present) hCard on the post page itself that has a u-url of the same value as the destination of the rel-author, thus indicating that it is an hCard for the author.

Implementation heuristics

php mf2 getAuthor

barnabywalters/mf-cleaner getAuthor() implements several extra steps whilst missing out the steps above which require fetching another URL — at the moment getAuthor completely lacks side effects:

  • If the given h-entry has an author or reviewer (for compat. with h-review if it doesn’t become more consistent with h-entry) property:
    • if the found value is a string, search all the h-cards on the page for one with a name property equal to the found value, and return it
  • If the found value is a microformat, return it
  • Look for page-scoped rel-author, search all the h-cards on the page for one with a url property equal to the first rel-author value — if found, return it
  • If a page URL is given, or the h-entry has a url property, search all h-cards on the page for one where the domain of their url property is the same as the domain of the found url — if found return that
  • Otherwise return the first found h-card on the page, or null

Converspace

Silo Implementations

Google Search

Google's search spider supports only part of the authorship algorithm, rel=author, in an oddly silo-specific way:

  • if there's rel=author link on a page
    • if it links to a Google+ profile, use that profile for authorship information
    • if it links to a home page
      • if the home page has a rel=me link to a Google+ profile
      • and the Google+ profile has profile link back to the home page
      • then use that profile for authorship information

See Also