Mon 18 Aug 2014 3:12PM

Tag aggregation

Brad Koehn


We create a Tag Aggregator (“taggregator”) service which multiplexes public posts into Atom feeds, one per tag found in those posts. Pods publish public posts to the taggregator and subscribe to the feed for each tag the pod’s users follow. Thus every pod can receive tagged posts from every other pod, eliminating the current problem of small and/or new pods not getting tagged posts from pods they haven't yet discovered. Other potential advantages, like a searchable index of all posts, are discussed as well.

• Each pod creates an Atom feed for all public posts. It registers this feed with the feed taggregator service via a simple REST API. Atom feeds use a hub similar to the PubSubHubbub used by the current public feed aggregation. [Option 2: pods register each user’s existing public Atom feed with the taggregator.]
• For each tag a user on a pod wishes to subscribe to, the pod registers with the taggregator, which creates a public feed for that tag (unless a feed already exists for it) and returns a link to it. These feeds also use PubSubHubbub for efficient distribution. Each tag results in one and only one feed, only when a user subscribes to it (so #singleusetagssomebodymadeup won't get a feed unless someone subscribes to it).
• When the taggregator service receives a notification of a new public post, it parses the tags from the post and adds the post to the appropriate tag feed(s). Pods listening to those feeds get notified via PubSubHubbub and fetch the post, routing it to all users following the tag.
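
A minimal sketch of the second and third steps above, in Python. Everything named here is an assumption made for illustration: the `Taggregator` class, the tag regex, and the `/feeds/<tag>.atom` URL shape are not part of any existing pod or hub code, and the PubSubHubbub notification itself is left out.

```python
import re

# Assumed tag syntax: "#" followed by word characters, not glued to a preceding word.
TAG_PATTERN = re.compile(r"(?<!\w)#(\w+)")


def extract_tags(text):
    """Return the lowercased tags in a post, in first-seen order, without duplicates."""
    tags = []
    for match in TAG_PATTERN.finditer(text):
        tag = match.group(1).lower()
        if tag not in tags:
            tags.append(tag)
    return tags


class Taggregator:
    """Hypothetical in-memory model: one feed per tag, created on first subscription."""

    def __init__(self):
        self.feeds = {}  # tag -> list of post ids

    def subscribe(self, tag):
        """Create the feed for a tag if needed and return its (made-up) URL."""
        tag = tag.lower()
        self.feeds.setdefault(tag, [])
        return f"/feeds/{tag}.atom"

    def receive_post(self, post_id, text):
        """Add the post to every existing feed it is tagged for; return those tags."""
        updated = []
        for tag in extract_tags(text):
            if tag in self.feeds:  # tags nobody subscribed to get no feed
                self.feeds[tag].append(post_id)
                updated.append(tag)
        return updated
```

In a real service, each tag returned by `receive_post` would be the point at which the hub is pinged so that subscribed pods fetch the updated feed.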

This technique has several advantages:

• It’s simple. The taggregator application is small and requires a relatively small amount of CPU to parse tags out of the posts and put them into feeds. Internally the feeds could easily be represented by database tables that are purged of old posts regularly. The posts could be kept normalized with a simple table mapping each tag found in the post to its feed.
• It leverages existing pod code. Pods already know how to publish to and subscribe to PubSubHubbub Atom feeds. PubSubHubbub moves most of the data most of the time.
• It’s reproducible. Just as pods don’t have to use a particular PubSubHubbub service, they don’t need to use a particular taggregator. Source code for the taggregator should be on GitHub so that anyone can run their own. Pods could publish their public posts to, and subscribe to, more than one taggregator service.
• Pods don't have to participate. In order to avoid dependence upon centralized services, pods can continue to use the existing mechanism for discovering tagged posts.
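
To make the "database tables" idea from the first advantage concrete, here is one possible layout, sketched with Python's built-in `sqlite3`. The table and function names are invented for this example; posts are stored once and a mapping table links each post to the tag feeds it belongs to.

```python
import sqlite3

SCHEMA = """
CREATE TABLE feeds (tag TEXT PRIMARY KEY);
CREATE TABLE posts (
    id TEXT PRIMARY KEY,
    body TEXT NOT NULL,
    created_at INTEGER NOT NULL
);
CREATE TABLE feed_posts (
    tag TEXT NOT NULL REFERENCES feeds(tag),
    post_id TEXT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    PRIMARY KEY (tag, post_id)
);
"""


def open_store():
    """Create an in-memory store; foreign keys must be enabled for the cascade."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    return db


def add_feed(db, tag):
    """Create the feed for a tag if it does not exist yet."""
    with db:
        db.execute("INSERT OR IGNORE INTO feeds (tag) VALUES (?)", (tag,))


def add_post(db, post_id, body, created_at, tags):
    """Store the post once and map it to each of its tags that has a feed."""
    with db:
        db.execute("INSERT INTO posts VALUES (?, ?, ?)", (post_id, body, created_at))
        db.executemany(
            "INSERT INTO feed_posts (tag, post_id) "
            "SELECT tag, ? FROM feeds WHERE tag = ?",
            [(post_id, tag) for tag in tags],
        )


def purge_older_than(db, cutoff):
    """Delete old posts; their feed mappings go with them via ON DELETE CASCADE."""
    with db:
        cursor = db.execute("DELETE FROM posts WHERE created_at < ?", (cutoff,))
    return cursor.rowcount


def feed_entries(db, tag):
    """Return the post ids in a tag feed, newest first."""
    rows = db.execute(
        "SELECT p.id FROM posts p JOIN feed_posts fp ON fp.post_id = p.id "
        "WHERE fp.tag = ? ORDER BY p.created_at DESC, p.id",
        (tag,),
    )
    return [row[0] for row in rows]
```

A periodic job calling `purge_older_than` with a cutoff timestamp would be enough to keep the tables small; the right retention period is an open choice.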

To be determined:

• When are tag feeds culled? How does the taggregator know that no one is listening to a feed?
• The taggregator would have a centralized, organic list of all pods on the network. It could also be used for other things like aggregated user search and even public message search if it indexed the posts with a tool like Apache Solr. That would obviously require a much greater amount of CPU and disk to generate, store, and search the index.
• What about tags in comments?
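
One possible answer to the culling question, offered only as a sketch: PubSubHubbub subscriptions already carry a lease (`hub.lease_seconds`), so tag registrations with the taggregator could expire the same way unless the pod renews them. The `LeaseTable` class below is hypothetical and assumes pods periodically re-register each tag they still follow.

```python
class LeaseTable:
    """Hypothetical record of which pods are still interested in which tag feeds."""

    def __init__(self):
        self._leases = {}  # (tag, pod) -> expiry timestamp in seconds

    def renew(self, tag, pod, now, lease_seconds):
        """Record or extend a pod's interest in a tag."""
        self._leases[(tag, pod)] = now + lease_seconds

    def cullable_tags(self, now):
        """Return the tags whose every lease has expired, sorted by name."""
        all_tags = {tag for tag, _pod in self._leases}
        live_tags = {
            tag for (tag, _pod), expiry in self._leases.items() if expiry > now
        }
        return sorted(all_tags - live_tags)

    def cull(self, now):
        """Forget the expired tags and return them so their feeds can be dropped."""
        dead = self.cullable_tags(now)
        self._leases = {
            key: expiry for key, expiry in self._leases.items() if key[0] not in dead
        }
        return dead
```

A feed would then be culled once no pod has renewed its interest within the lease period, and recreated on the next subscription.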


Jonne Haß Mon 18 Aug 2014 4:10PM

We already have several discussions around tag/public post federation. Did none of them really fit?


Brad Koehn Mon 18 Aug 2014 4:19PM

This post has been on the wiki for a long time; someone suggested it belonged on Loomio instead. I'd be happy to migrate it to a proposal on another discussion, but given the way Loomio handles parallel discussions about the same topic (poorly, IMHO), I thought a new discussion would be simplest.


Jonne Haß Mon 18 Aug 2014 7:17PM

Well, I regarded the Proposals category in the wiki as a place for such writeups that you can quickly link to in one sentence in whatever discussion anywhere.