February 21, 2008

Activity Scoring in NewsGator Online

The Great from the Many

In January, when we made all of our client readers available for free, we said we were collecting usage data to make the experience better for all users. Today, we released a feature based on that data.

At the top of the NewsGator Online reader, you’ll see a “Sort” option. When you click it, you’ll see the “Sort By Activity” option. If you choose that, you’ll see something like the display below when you click on a feed or folder.


200802210736.jpg

Your unread posts will be sorted based on total user activity in NewsGator’s online reader, FeedDemon, NetNewsWire, Inbox, and Go!. The green bar gives a graphical view of the total activity based on a scale much like the decibel system. In a sense, you can think of this as the “noise level” for the post. Posts that completely fill up the green bar are generating a lot of “noise”.

Behind the scenes, millions of rows of activity data are run through an algorithm to produce this score and scale it to this view. Actions like clipping a post or emailing it to a friend affect the score more than just clicking the title link. We learned a lot about scoring based on our experience with our NewsGator Enterprise Server (NGES) product. (NGES actually goes a step further and calculates a projected relevance score for you based on your personal feed scores and the activity scores of your co-workers.)
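The weighting and decibel-like scaling described above can be sketched roughly as follows. This is an illustration only, not NewsGator's actual algorithm: the action weights, the function names `raw_activity` and `bar_fill`, and the choice of a base-10 logarithm are all assumptions made for the example.

```python
import math

# Hypothetical weights. The post only says that clipping or emailing a post
# counts for more than clicking its title link; these numbers are made up.
ACTION_WEIGHTS = {"click": 1.0, "clip": 5.0, "email": 5.0}


def raw_activity(action_counts):
    """Weighted sum of action counts for one post."""
    return sum(
        ACTION_WEIGHTS.get(action, 0.0) * count
        for action, count in action_counts.items()
    )


def bar_fill(raw, max_raw):
    """Map a raw score to a 0.0-1.0 bar fill on a logarithmic scale.

    Like decibels, each step up the bar represents a multiplicative
    increase in activity rather than an additive one.
    """
    if raw <= 0 or max_raw <= 0:
        return 0.0
    return min(1.0, math.log10(1 + raw) / math.log10(1 + max_raw))


post = {"click": 40, "clip": 3, "email": 1}
fill = bar_fill(raw_activity(post), max_raw=1000.0)
```

On a log scale like this, a post needs roughly ten times the activity to move the bar up by the same amount again, which is why only the noisiest posts fill it completely.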

So if you’re in a hurry and just want to see the stories that are getting the most attention or if you’re just curious about how stories stack up against each other in terms of user engagement, flip on “Sort By Activity” to get the great posts filtered by our many users.


--------------------------------------------------

Brian Kellner is VP Products at NewsGator and has been an advocate for relevancy and discovery services and features leading the next generation of RSS in both consumer and enterprise markets.

February 11, 2008

Attention Data: Content vs. User

Much has been made about our recent move to make our RSS client applications free to users. To recap, last month we removed all license fees for our client applications (NetNewsWire, FeedDemon, Go mobile apps, our online reader, and Outlook plugin), and in exchange we eliminated telephone support and enabled a data syncing process between the apps and our online service that went beyond our subscription data to what we refer to as "attention data".

The telephone support part of this was a no-brainer: we rarely had anyone call for support, and most of our users go to the online forums for help. Removing telephone support was therefore more symbolic than anything else, since the actual impact on resource allocation was minimal.

The attention data topic is considerably more interesting to cover. While most commenters have adopted a wait-and-see approach, some have raised good questions about what we are doing with that data, which in aggregate totals millions of individual line items each day. Our network datacenter now covers 2.1 million feeds that poll at least hourly, collecting well over 7 million new items of content each day.

We archive this content as well, but it's not a complete cache of the blogosphere, as would be easy to conclude: we only archive feeds that our users subscribe to, and in each case we are limited to the amount of content that the feed exposes. Some feeds are full text and far too many are still excerpts, but at any rate it's a lot of content in both the current 24-hour set and the archive.

First and foremost, attention data is metadata about what happens to content. At one level it's as simple as someone clicking on a headline in a feed to open a post, but also included are the actions that people take on content, such as clipping, tagging, bookmarking, and sharing of individual content items.

There are two kinds of attention data or, put more accurately, two views of it: one that puts the user at the center and another that puts the content item at the center. We're interested in both, but have different mechanisms for collecting each.

The free release last month focused on the attention data about content, which is why we went to some lengths to explain how we were anonymizing it. Quite honestly, it's not interesting to me that Joe Smith clicked on, bookmarked, and then sent to a friend a post in GigaOM. What is interesting is that a post in GigaOM got clicked on, bookmarked, and shared. It's not interesting to me that Joe Smith did this because I don't have any demographic data about Joe Smith, therefore the commercial value of that information is low, but this isn't to suggest that the "Joe Smith dataset" isn't interesting to Joe Smith... more on that in a minute.
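To make the content-centered view concrete, here is a minimal sketch of collapsing individual events into per-post counts. It is not a description of how NewsGator anonymizes data: the event tuple layout, the function name `aggregate_by_content`, and the sample identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_by_content(events):
    """Collapse (user_id, post_id, action) events into per-post action counts.

    The user_id is read but never stored, so the result records what
    happened to a post, not who did it.
    """
    totals = defaultdict(Counter)
    for _user_id, post_id, action in events:
        totals[post_id][action] += 1
    return {post_id: dict(counts) for post_id, counts in totals.items()}


# Hypothetical sample events.
events = [
    ("joe", "gigaom/post-1", "click"),
    ("joe", "gigaom/post-1", "bookmark"),
    ("ann", "gigaom/post-1", "click"),
]
summary = aggregate_by_content(events)
```

In this shape, the fact that a GigaOM post was clicked twice and bookmarked once survives, while the identity of whoever did it does not.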

Why is this attention data useful? Simply put, attention implies content authority and quality: if you share something, I can assume that you found it useful, and that assumption feeds our attention algorithm. The scoring generated by our attention algorithm can be used to make search more accurate, and it can be packaged as an API that we make available to our partners to enable their services to better filter and sort content.

We don't sell this data to marketing companies; with no demographic information attached to it, it is worthless in that context. Recall that this attention data is focused on content and not users, and the purpose is to improve existing functions and enable new features. For example, one of our media customers is using this to generate a list of stories that received a high degree of attention in the prior 24 hours and that they did not publish through their sites. In other words, we are using attention data to tell them the things they did not know they didn't know.
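The media-customer example amounts to a filter over scored content. The sketch below assumes attention scores for the prior 24 hours have already been computed and keyed by URL; the function name `missed_stories`, the threshold, and the sample URLs are hypothetical, not part of any NewsGator API.

```python
def missed_stories(scores_24h, published_urls, min_score):
    """Return (url, score) pairs for high-attention stories the customer
    did not publish, highest score first.

    scores_24h: dict mapping story URL -> attention score for the last 24 hours
    published_urls: set of URLs the customer published through its own sites
    """
    missed = [
        (url, score)
        for url, score in scores_24h.items()
        if score >= min_score and url not in published_urls
    ]
    return sorted(missed, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data.
scores = {
    "http://example.com/a": 92.0,
    "http://example.com/b": 15.0,
    "http://example.com/c": 71.0,
    "http://example.com/d": 88.0,
}
published = {"http://example.com/a"}
report = missed_stories(scores, published, min_score=50.0)
```

Here the report contains stories `d` and `c`, in that order: both scored above the threshold and neither appears in the customer's published set.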

Last year we did expose something we were involved with that speaks to the user perspective of attention, APML. This standard, which builds on the success of OPML, is attractive for some very important reasons. First and foremost, APML creates a single database about user subscriptions and attention data items, rather than attempting to merge and sync separate databases around each. Second, it's a true industry standard that is emerging through a process of cooperation rather than imposition, and lastly, it makes attention data portable.

We fundamentally believe that data about your browsing habits is yours and that means you should be able to take it with you wherever you go. APML does this much in the same way that OPML does it for subscription data, and that has been a very successful model.

In many ways the ultimate commercial value of attention data is speculative, but we are not totally flying blind here either as we do have concrete examples about how it is enhancing the value of network functions that are important to our consumer and commercial clients. Speaking as a user, the APML piece is very important to me because I can accumulate this data over time and transfer it from service to service without penalty, and as more services take advantage of APML I will receive benefits as a user.

-------------------------------------------

Jeff Nolan is vice president of the software-as-a-service group at NewsGator Technologies. Based in the Bay Area, Jeff also writes frequently on these topics on his personal blog, Venture Chronicles.

February 04, 2008

comScore widget matrix numbers are inaccurate

TechCrunch recently published a post about “The Widget Kings” which promoted the comScore widget matrix as a symbol of rank among widget manufacturers. We did a little research on the accuracy of these numbers. To make a long story short, we found the numbers entirely inaccurate and incomplete as a ranking of widget vendors.

This is not a new perspective: both GigaOm and Jeff Jarvis posted about this back in June 2007, when the comScore list was first released.

It’s difficult enough to track traffic accurately on the internet, much less widgets, so we weren't surprised to see some inconsistencies; it is to be expected when reports like these are first generated. But when the numbers are deceptive and wrong, the report loses all credibility as an independent ranking of widget vendors.

Let’s compare the list from April with the report just released in November. For our analysis, we looked into the changes in the standings and tried to validate their statistics with Compete and Alexa. While we appreciate that comScore, Compete and Alexa don't all track the same way, we were hoping Compete and Alexa could at least give us a sense of whether traffic was increasing or decreasing over that time period.


April 2007 comScore Widget Matrix
200704_comscore_3

November 2007 comScore Widget Matrix
200711_comscore_2

Here are the things that jump out immediately.


1) Brightcove is off the list. They went from 16.9 million uniques to less than 14.9 million? Let’s try to corroborate that. Here are charts from Compete.com and Alexa.com.
200711_brightcove_alexa_2

200711_brightcove_3


Again, traffic is difficult to measure, but at the very least, both Compete and Alexa point to flat growth, not an 11% loss in audience.


2) Slide.com dropped from 117.1 million uniques to 39 million. Sounds like they are in trouble? Not according to Alexa and Compete.

200711_slide

200711_slide_compete


3) Musicplaylist.us at 15 million uniques in 4/07 and 11/07…

200711_musicplaylist

How does this work? Traffic to musicplaylist looks to be in a freefall.
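For reference, the percentage changes implied by the comScore figures quoted above work out as follows. The helper name `percent_change` is ours; for Brightcove, 14.9 million is treated as an upper bound, since the text only says their November number was below it.

```python
def percent_change(old, new):
    """Percentage change from old to new (negative means a decline)."""
    return (new - old) / old * 100.0


# Unique visitors in millions, April 2007 -> November 2007, as quoted above.
brightcove = percent_change(16.9, 14.9)      # upper bound: actual is below 14.9
slide = percent_change(117.1, 39.0)
musicplaylist = percent_change(15.0, 15.0)

print(f"Brightcove: {brightcove:.1f}%")        # Brightcove: -11.8%
print(f"Slide.com: {slide:.1f}%")              # Slide.com: -66.7%
print(f"Musicplaylist.us: {musicplaylist:.1f}%")  # Musicplaylist.us: 0.0%
```

So comScore implies a decline of at least 11.8% for Brightcove and about two-thirds for Slide.com, with Musicplaylist.us exactly flat. None of the three matches the direction that the Compete and Alexa charts suggest.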

I could go on – none of the numbers seem to make sense. Is comScore playing a shell game for their paying clients? Or is this a true third party representation of widget traffic?

Let us know your thoughts in the comments!