Aug
15
(2006)
Estimating blog feed subscribers in Drupal
Filed under: Uncategorized. Tags: drupal, mysql, rss, statistics.
Guesstimating the size of an RSS feed audience is always a huge shot in the dark, but sometimes I get curious about how many people subscribe to this silly blog. If I was willing to surrender my feeds to Feedburner, I could get some pretty detailed stats. But, I don’t want to hand over that.
So, I thought about digging into the accesslog that’s stored in Drupal’s database. I’ve set my copy to store access logs for the past 2 weeks, and it dutifully records which pages are viewed, as well as the IP address the request came from. It’s just a subset of a typical webserver log, so there isn’t any privacy issue here (if you’re really worried about being tracked online, you’re already using an anonymizing proxy…)
A quick MySQL poke-and-test session, and I came up with a quick query that spits out a WAG about the number of feed subscribers. It’s not accurate, because people might be accessing the feed from multiple locations (recording different IP addresses), and services like Bloglines might be sending many readers in under the cover of a single IP address (at the moment, Bloglines claims 334 folks are subscribing to various feeds published by this blog). Also, it makes no distinction between bots (Google and friends) and actual human-proxying-agents. Whatever.
Here’s my brain-dead-simple query. It just looks for all requests for feed-related paths, and counts up the number of unique IP addresses.
select count(distinct hostname) from accesslog where path like '%/feed%'
YMMV. IANAP. YHBH. Wait. Not the last one.
According to my Drupal log, there have been 955 unique IP addresses requesting the RSS/atom feeds on this blog in the last 2 weeks. That may be higher or lower than the number of actual humans reading the blog. Still, ballpark order-of-magnitude WAG at roughly 1,000. That kind of boggles my mind.
Update: Oops. My rudimentary query forgot subscribers of “rss.xml” – which turned out to be more than the various /feed subscribers! Also, thanks to a tip from Bér Kessels, I added the cool XStatistics module, which takes care of the guesswork. According to it, there are 2062 subscribers to the various feeds on this blog. Wow.


heh. you’re doing better than me. my folks are vaguely aware that I have a website, but I think they might see it maybe once a year, if I send them a link via email…
Thank you for the heads up! I really enjoy following your commentary re: Drupal. I've learned a lot of new tricks reading your posts and just downloaded xStatistics based on the link you provided. If correct, I have 3000% more RSS subscribers than I thought (61 as shown in the summary vs my mom + me)
Sincerly,
Your RSS subscriber #963
Apparently you didn't even consider the feed on /rss.xml , while on my own blog this is the most requested URL.
Is it significant on yours too ?
What Ber said.
 Although my test for a "feed reader" is if they have visited the site multiple times in the period such as more than 5 times over a two week period since it seems safe to say that most subscribers would check at least every other work day.
And of course the user agent could be very helpful in this, for example if you could ignore the bots and search engines.Â
Some other ideas:
 select count(1), hostname from accesslog where path like '%feed%' or path like '%rss%' group by hostname;
Gives me 500 records on one particular site:
select count(1), hostname from accesslog where path like '%feed%' or path like '%rss%' group by hostname having count(1) > 5;
Looking for cases where they have pulled down the page more than 5 times gives me 73 subscribers. Looking for more than 10 brings it down to 51.
Looks good. Though I am not a SQL guru at all, your “having count” looks interesting.
But what does the (1) mean? Do you refer to the first column that way?
If we can add a time-frame to that query it should be much more representative then what we have now. With a timeframe I am thinking about: Host X visited a path more then 3 times in the last three days.
That would rule out people who have a large lag set (i.e. check that feed only once a week) but it would exclude most of the bots.
You can include any value in the select and it will come out for every row in the set, so the (1) is just a very short item to associate with each row. In some sql engines count(1) is (or was) faster than count(*) or count(id) so I got in the habit of doing it.
The "having" operator is very powerful – basically allowing you to do a "where" on a column that is in an aggregate function. Fun!  Â
To get the "more than 35 requests in 3 days" it's just a matter of  select count(1), hostname from accesslog where path like '%feed%' or path like '%rss%' and timestamp >= %d group by hostname having count(1) > 3", (time()-259200).
Hallo Ber,
Kom jou ook overal tegen.
Gaat wel goed geloof ik met de Drupal Site.
Zal eens verder bouwen en kijken wat er uitkomt.
Groet,
Mario
Also many folks will subscribe through an aggrating feed (i.e. Planet Drupal).
How many take that route may be hard to gauge without going into drupal.org logs.
Xstatsistics has theis feature built in. The code needs more smartness though, but it is a few steps ahead of your code above
I’d love to see some improvements to the SQL and/or the code, most notably, we should investigate if certain IP hit the urls */feed/* with patterns. For example every hour, or every 15 minutes, every 56minutes etc.
Doug – the bikeride must be contagious. another TLC staffer started riding yesterday
fgm – oops. forgot about that! There’s another 1213 subscribers. Sneaky subscribers, hiding behind rss.xml…
Bér – xstatistics looks really handy (and lightweight – I like that. not overdone with graphs and the like. nice).
According to XStatistics, I have 2062 feed subscribers. Holy crap.
No boggle about it.
This is a fantastic space; all the more for those even remotely interested in Drupal, and how to make it sing.
Heck, you've even been known to goad the odd character into biking to work on occasion.
The best marketing is pull from the market. Feed is a good tool..