accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Trendulo - A Twitter Analytics Demo on Accumulo
Date Fri, 27 Apr 2012 19:09:10 GMT
On Wed, Apr 25, 2012 at 3:10 PM, Jared winick <jaredwinick@gmail.com> wrote:

> I am not exactly sure how to answer the question about storage size per
> tweet as I am not actually storing the original tweet and if a counter
> already exists for an n-gram/time period, then incrementing that counter
> doesn't increase the storage size. I can follow up with the current storage
> I am using though.
>

I see I can make some estimates based on the information in your talk. The
slides are awesome, btw.

Using the information you provided: Dec 24 - March 12... that's 88 days.
2.6e9 entries, 3 million-ish tweets per day:

2.6e9 / (3e6 * 88)

~10 entries per tweet.

Also, you report disk usage of 72G, which I will interpret as 72 * (1024
** 3) bytes.

So each tweet, on average, occupies 72G / (88 * 3e6), or ~300 bytes.
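
For anyone who wants to rerun the arithmetic with different inputs, here is a minimal Python sketch of the same estimate. The variable names are mine, and the inputs are the rough figures quoted above (88 days, 3 million-ish tweets per day, 2.6e9 entries, 72G read as 72 * 1024**3 bytes), so treat the results as approximate.

```python
# Back-of-the-envelope estimate using the rough figures from this message.
# All names are illustrative; the inputs are approximations, not measurements.
DAYS = 88                      # Dec 24 - March 12, as counted above
TWEETS_PER_DAY = 3e6           # "3 million-ish"
ENTRIES = 2.6e9                # total entries reported
DISK_BYTES = 72 * 1024 ** 3    # 72G interpreted as 72 * (1024 ** 3) bytes

total_tweets = TWEETS_PER_DAY * DAYS
entries_per_tweet = ENTRIES / total_tweets
bytes_per_tweet = DISK_BYTES / total_tweets

print(f"{entries_per_tweet:.1f} entries per tweet")
print(f"{bytes_per_tweet:.0f} bytes per tweet")
```

Changing DAYS or TWEETS_PER_DAY shows how sensitive the per-tweet numbers are to those guesses.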

-Eric
