How to determine authorship of a post on a page - AKA the Authorship discovery algorithm / processing model for implementations.
Note: the steps of checking for "url == uid == page's URL" and "url that's also a rel-me" were incorporated inline from the steps for parsing a representative h-card. Some improvements have been made here due to feedback from implementations in practice, and those improvements should be incorporated into an iteration of representative h-card.
Authorship can be spoofed, because the current algorithm may look only at the markup within an arbitrary page to determine the author.
was not actually written by aaronpk, but posted on a debian pastebin site and linked via a webmention.
is similarly spoofed.
http://checkmention.appspot.com allows you to test receiving a spoofed webmention from Jonathan Ive.
Theoretical issues are grouped here for capturing purposes. If you find a real world example of one of these, feel free to promote it to an actual issue with its own === subhead.
Name avatar display in comments
comments-presentation describes how a site that accepts indieweb reply posts via webmention can retrieve those replies and display them as full-fledged comments on a post, including the name and icon/avatar of the commenter.
Name avatar display in a reader
In a reader (feature of an IndieWeb site), it's nice to show the name and icon/avatar of the person whose posts you're reading from their indieweb home page h-feed.
Typically this name/icon information is found via the authorship algorithm.
In some (many?) cases, an indieweb h-feed of h-entry elements does not have explicit author information for a couple of reasons:
Fallback to page representative h-card
Proposal: add one more fallback for the case where there is neither an author h-card nor a rel-author link: use the page being processed as the author-page if no other author page has yet been found.
I.e. change "7. if there is an author-page URL" to "7. if there is no author-page URL, use the page itself as the author-page URL" and then continue processing the rest of the algorithm accordingly.
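As a minimal sketch of the proposed change to step 7, assuming the earlier steps have produced either an author-page URL or nothing (the function name `resolve_author_page` and its parameters are hypothetical and appear in no implementation mentioned here):

```python
from typing import Optional


def resolve_author_page(author_page_url: Optional[str], page_url: str) -> str:
    """Proposed step 7 (hypothetical helper).

    If the earlier steps found no author-page URL, use the page being
    processed as the author-page; otherwise keep the URL already found.
    """
    return author_page_url if author_page_url else page_url
```

The rest of the algorithm would then run against the returned URL exactly as it does for an author page found via rel-author.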
This would handle the examples from above:
Algorithm Design Notes
Why do we parse for the authorship details in the order that we do?
First, we prefer the p-author of the h-entry because that is the most direct way of specifying the information, visibly, on the page. There's also established practice among indieweb sites of publishing a mini h-card with photo, name (sometimes as the alt text of the photo img), and URL to the person's indieweb site root / home page. The post may also be a guest post, in which case we really want the post-specific authorship information rather than anything general to the site.
Only if the post itself lacks direct authorship information do we fall back to checking for a rel-author link, which is a fairly well established practice for linking from posts to pages representing authors.
On sites that use rel-author, the link almost always points to a page that has a much richer h-card about the author than the post page itself, including a much higher likelihood of having a good photo / avatar image as part of that h-card. Thus we next prefer to retrieve that rel-author destination and look for a representative h-card there (per the "url == uid == page's URL" and "url that's also a rel-me" steps noted above).
Only if the rel-author page lacks an h-card do we then fall back to looking for a likely smaller (if present) h-card on the post page itself that has a u-url of the same value as the destination of the rel-author, thus indicating that it is an h-card for the author.
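The ordering described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the normative algorithm: it assumes input shaped like microformats2 parsed JSON (`items`, `type`, `properties`, `rels`), it searches only top-level h-cards, and it considers only the first rel-author link. All function names, and the `fetch_parsed` callable that retrieves and parses another page, are hypothetical.

```python
from typing import Callable, Optional


def find_hcards(parsed: dict) -> list:
    """Top-level h-card items only (simplification: no nested search)."""
    return [i for i in parsed.get("items", []) if "h-card" in i.get("type", [])]


def representative_hcard(parsed: dict, page_url: str) -> Optional[dict]:
    """Apply the two checks named above to a parsed author page."""
    cards = find_hcards(parsed)
    rel_me = parsed.get("rels", {}).get("me", [])
    # Check 1: url == uid == page's URL
    for card in cards:
        props = card.get("properties", {})
        if page_url in props.get("url", []) and page_url in props.get("uid", []):
            return card
    # Check 2: url that's also a rel-me
    for card in cards:
        if any(u in rel_me for u in card.get("properties", {}).get("url", [])):
            return card
    return None


def discover_author(
    entry: dict,
    post_page: dict,
    fetch_parsed: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    # 1. Prefer a p-author h-card embedded in the h-entry itself.
    for author in entry.get("properties", {}).get("author", []):
        if isinstance(author, dict) and "h-card" in author.get("type", []):
            return author
    # 2. Otherwise follow rel-author and look for a representative h-card there.
    rel_author = post_page.get("rels", {}).get("author", [])
    if not rel_author:
        return None
    author_url = rel_author[0]  # assumption: only the first link is considered
    author_page = fetch_parsed(author_url)
    if author_page is not None:
        card = representative_hcard(author_page, author_url)
        if card is not None:
            return card
    # 3. Last, an h-card on the post page whose url matches the rel-author target.
    for card in find_hcards(post_page):
        if author_url in card.get("properties", {}).get("url", []):
            return card
    return None
```

Passing the fetcher in as a callable keeps the network access in one place, so the ordering logic can be exercised with canned parsed pages.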
php mf2 getAuthor
barnabywalters/mf-cleaner getAuthor() implements several extra steps but omits the steps above that require fetching another URL; at the moment getAuthor has no side effects at all:
Support dropped 2014-08-28