accumulo-user mailing list archives

From Jason Trost <jason.tr...@gmail.com>
Subject Re: Large-scale web analytics with Accumulo (and Nutch/Gora, Pig, and Storm)
Date Sat, 03 Nov 2012 12:35:19 GMT
The iterator is used at scan time only, so the counts stay accurate.  In a
normal ingest setup (where every record is guaranteed to be ingested
exactly once) we could have left out the event UUID and simply set
value = "1".  But Storm guarantees that each Tuple is processed at least
once, so the same event can be ingested more than once, and for this
application we can't tolerate inaccurate counts.  Keeping the event UUID
in the key means a replayed write just overwrites the same entry instead
of inflating the count; that's why we roll it up at scan time.  Does this
make sense?
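
For concreteness, here is a rough sketch of both paths against the
Accumulo client API (the instance/table names, row format, and UUID are
made-up placeholders, and the rollup is shown client-side here rather
than inside the iterator):

import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class DedupCountSketch {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
        .getConnector("user", "pass".getBytes());

    // Ingest path: row = the "group by fields", column qualifier = event
    // UUID, value = "1".  If Storm replays a tuple, the identical key is
    // written again and simply replaces the earlier version, so the write
    // is idempotent and the count is unaffected.
    BatchWriter bw = conn.createBatchWriter("analytics", 1000000L, 1000L, 2);
    Text row = new Text("2012-11-03|/some/page");      // group-by fields
    Mutation m = new Mutation(row);
    m.put(new Text("event"), new Text("uuid-1234"),    // event UUID
        new Value("1".getBytes()));
    bw.addMutation(m);
    bw.close();

    // Scan-time rollup: each distinct event is exactly one entry under
    // the row, so counting the row's entries yields the deduped count.
    Scanner scanner = conn.createScanner("analytics", new Authorizations());
    scanner.setRange(new Range(row));
    long count = 0;
    for (Entry<Key, Value> e : scanner) {
      count++;
    }
    System.out.println("distinct events: " + count);
  }
}

And since the iterator is configured for scan scope only, nothing gets
aggregated away at compaction time; the stored entries, and the UUIDs
that dedupe them, are left as-is.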

On Fri, Nov 2, 2012 at 11:45 PM, David Medinets <david.medinets@gmail.com> wrote:

> Unfortunately I had to leave the meetup in the middle of John's
> presentation to catch the ferry over to New Jersey. I wish I had been
> able to stay. I am curious about slide 11, which describes ingest and a
> scan-time iterator. What happens during compaction? And why not ingest
> directly into the "value = 1" format? I like the "group by fields" row
> id - the name so neatly encapsulates the concept.
>
> On Fri, Nov 2, 2012 at 9:43 PM, Jason Trost <jason.trost@gmail.com> wrote:
> > Large-scale web analytics with Accumulo (and Nutch/Gora, Pig, and Storm)
> > http://www.slideshare.net/jasontrost/accumulo-at-endgame
>
