Much has been made of our recent move to make our RSS client applications free to users. To recap, last month we removed all license fees for our client applications (NetNewsWire, FeedDemon, Go mobile apps, our online reader, and Outlook plugin), and in exchange we eliminated telephone support and enabled a data syncing process between the apps and our online service that goes beyond subscription data to what we refer to as "attention data".
The telephone support bit was a no-brainer: we rarely had anyone call for support, as most of our users go to the online forums for help. So removing telephone support was more symbolic than anything else; the actual impact on resource allocation was minimal.
The attention data topic is considerably more interesting. While most commenters have adopted a wait-and-see approach, some have raised good questions about what we are doing with that data, which in aggregate totals millions of individual line items each day. Our network datacenter now covers 2.1 million feeds, polled at least hourly, collecting well over 7 million new items of content each day.
We archive this content as well, but it's not a complete cache of the blogosphere, as one might easily conclude, because we archive only the feeds our users subscribe to, and in each case we are limited to the amount of content the feed exposes. Some feeds are full text, but far too many are still excerpts. At any rate, it's a lot of content, in both the current 24-hour set and the archive.
First and foremost, attention data is metadata about what happens to content. At one level it's as simple as someone clicking on a headline in a feed to open a post, but also included are the actions that people take on content, such as clipping, tagging, bookmarking, and sharing of individual content items.
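To make that concrete, a single attention-data line item can be modeled as a small record. This is a hypothetical sketch; the field names are illustrative assumptions, not NewsGator's actual schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of one attention-data line item. Field names are
# illustrative assumptions, not NewsGator's actual schema.
@dataclass
class AttentionEvent:
    item_id: str      # the content item acted on (e.g. a post's GUID)
    feed_url: str     # the feed the item came from
    action: str       # "click", "clip", "tag", "bookmark", or "share"
    timestamp: float  # when the action occurred (Unix time)

event = AttentionEvent(
    item_id="post-123",
    feed_url="http://example.com/feed.xml",
    action="share",
    timestamp=1196121600.0,
)
print(event.action)  # → share
```

Millions of records shaped roughly like this, one per user action, are what "attention data" amounts to in aggregate.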
There are two kinds of attention data, or, to put it more accurately, one set that puts the user at the center and another that puts the content item at the center. We're interested in both, but we have different mechanisms for collecting each.
The free release last month focused on attention data about content, which is why we went to some lengths to explain how we were anonymizing it. Quite honestly, it's not interesting to me that Joe Smith clicked on, bookmarked, and then sent to a friend a post on GigaOM. What is interesting is that a post on GigaOM got clicked on, bookmarked, and shared. The fact that Joe Smith did this isn't interesting to me because I don't have any demographic data about Joe Smith, so the commercial value of that information is low. That isn't to suggest the "Joe Smith dataset" isn't interesting to Joe Smith... more on that in a minute.
Why is this attention data useful? Simply put, attention signals content authority and quality; if you share something, I can assume you found it useful, and we can feed that assumption into our attention algorithm. The scoring generated by our attention algorithm can be used to make search more accurate, and it can be packaged as an API that we make available to our partners, enabling their services to better filter and sort content.
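As a sketch of the idea only, a score of this kind might weight each type of action and sum over the actions an item received. The weights and action names below are invented for illustration; they are not NewsGator's actual algorithm:

```python
# Minimal sketch of attention scoring: weight each action type and sum.
# The weights here are invented assumptions, not NewsGator's algorithm.
ACTION_WEIGHTS = {"click": 1.0, "clip": 2.0, "tag": 2.0,
                  "bookmark": 3.0, "share": 5.0}

def attention_score(actions):
    """Score one content item from the list of actions users took on it."""
    return sum(ACTION_WEIGHTS.get(a, 0.0) for a in actions)

print(attention_score(["click", "click", "share"]))  # → 7.0
```

A real algorithm would presumably be far more nuanced (decaying scores over time, normalizing by audience size), but even a simple weighted sum captures the intuition that a share says more than a click.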
We don't sell this data to marketing companies because, with no demographic information attached, it's worthless in that context. Recall that this attention data is focused on content, not users, and its purpose is to improve existing functions and enable new features. For example, one of our media customers is using it to generate a list of stories that received a high degree of attention in the prior 24 hours but that they did not publish through their sites; in other words, we are using attention data to tell them the things they didn't know they didn't know.
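The mechanics of a report like that could look something like the sketch below. The threshold, data shapes, and function name are all assumptions made for illustration:

```python
def surprise_stories(scored_items, own_domains, threshold=10.0):
    """Return (item_id, score) pairs for high-attention items that the
    customer did not publish themselves: the things they didn't know
    they didn't know. `scored_items` is a list of
    (item_id, domain, score) tuples; all values here are hypothetical."""
    hits = [(item, score) for item, domain, score in scored_items
            if score >= threshold and domain not in own_domains]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

recent = [
    ("post-a", "gigaom.com", 42.0),
    ("post-b", "customer.com", 55.0),  # customer's own site: excluded
    ("post-c", "example.org", 8.0),    # below threshold: excluded
]
print(surprise_stories(recent, {"customer.com"}))  # → [('post-a', 42.0)]
```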
Last year we disclosed our involvement in something that speaks to the user side of attention: APML, the Attention Profiling Markup Language. This standard, which builds on the success of OPML, is attractive for some very important reasons. First, APML creates a single database of a user's subscriptions and attention data items, rather than attempting to merge and sync separate databases for each. Second, it's a true industry standard emerging through a process of cooperation rather than imposition. Lastly, it makes attention data portable.
We fundamentally believe that data about your browsing habits is yours and that means you should be able to take it with you wherever you go. APML does this much in the same way that OPML does it for subscription data, and that has been a very successful model.
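APML itself is an XML format, so "taking your data with you" means exporting a file any APML-aware service can read. Here is a sketch of writing a minimal profile; the element names follow the public APML 0.6 draft as I understand it, and the profile name and concept values are invented:

```python
import xml.etree.ElementTree as ET

# Sketch of serializing a minimal APML profile. Element names follow the
# public APML 0.6 draft; the profile name and values are invented.
apml = ET.Element("APML", version="0.6")
body = ET.SubElement(apml, "Body", defaultprofile="reading")
profile = ET.SubElement(body, "Profile", name="reading")
implicit = ET.SubElement(profile, "ImplicitData")
concepts = ET.SubElement(implicit, "Concepts")
# "from" is a Python keyword, so pass the attributes as a dict.
ET.SubElement(concepts, "Concept",
              {"key": "attention", "value": "0.85", "from": "newsgator.com"})

print(ET.tostring(apml, encoding="unicode"))
```

Because the output is plain XML, the profile a user exports from one service can be imported by any other that speaks the standard, which is the portability argument in a nutshell.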
In many ways the ultimate commercial value of attention data is speculative, but we are not flying totally blind here either, as we have concrete examples of how it enhances the value of network functions that matter to our consumer and commercial clients. Speaking as a user, the APML piece is very important to me because I can accumulate this data over time and transfer it from service to service without penalty, and as more services take advantage of APML, I will receive more benefits as a user.
Jeff Nolan is vice president of the software-as-a-service group at NewsGator Technologies. Based in the Bay Area, Jeff also writes frequently on these topics on his personal blog, Venture Chronicles.