accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Fluo - Use cases
Date Tue, 26 Jan 2016 16:34:40 GMT
Tom

A summing combiner allows you to do one large join operation.  Fluo allows
you to incrementally do a series of large join operations where one join
feeds data to other join operations.  This is analogous to a series of map
reduce jobs where one job feeds input to the next.
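
As a rough illustration of that batch analogy (plain Python, not Fluo or
Accumulo code), the sketch below chains two stages so that the output of the
first is the input of the second.  The link pairs and the function names are
made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical input: (source page, target page) link pairs.
links = [
    ("http://A.org", "http://B.org"),
    ("http://C.org", "http://B.org"),
    ("http://A.org", "http://C.org"),
]

def count_inbound(link_pairs):
    """Stage 1: group by target page and sum, much as a summing combiner would."""
    return Counter(target for _source, target in link_pairs)

def pages_by_count(inbound_counts):
    """Stage 2: consume stage 1's output and regroup the pages by their count."""
    grouped = defaultdict(list)
    for page, count in inbound_counts.items():
        grouped[count].append(page)
    return {count: sorted(pages) for count, pages in grouped.items()}

inbound = count_inbound(links)        # first join
by_count = pages_by_count(inbound)    # second join, fed by the first
```

This version recomputes both stages from scratch on every run.  The point of
Fluo is to apply the same kind of chain incrementally as new data arrives.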

Webindex[1] is an example that illustrates this.  It counts incoming links
to a page and then indexes those counts multiple ways.  The overview
section of this blog post[2] provides a basic summary of some of the
functionality of the example.

Looking at the example in that overview, notice how it mentions computing
the incoming link count for http://B.org as 2.  Computing (row=http://B.org
val=2) is something you could do with a SummingCombiner.  However, doing the
next step mentioned in the blog post would be much more difficult.
Whenever the count for a link changes, the example updates it in three
different indexes.  So when (row=http://B.org val=2) changes to
(row=http://B.org val=3), this triggers multiple follow-on operations.  One
of these follow-on operations updates an index that tracks which link has
the most inbound links.  For this index the following changes are made.

 * Delete row=t:(9999999-2):http://B.org
 * Insert row=t:(9999999-3):http://B.org

Using Fluo, the changes to the indexes are made in a fault-tolerant manner.
The new index entry will not be inserted without deleting the previous one.
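
The delete and insert listed above can be sketched in plain Python as
follows.  This is only an in-memory stand-in, not the Webindex or Fluo code:
the set plays the role of the index table, the function names are
hypothetical, and the zero-padding of the row key is an assumption made here
so that the keys sort correctly as strings.

```python
MAX_COUNT = 9999999  # constant taken from the row keys shown above

def top_index_row(url, count):
    # Subtracting from MAX_COUNT makes pages with more links sort first.
    # Zero-padding to a fixed width is an assumption of this sketch.
    return "t:%07d:%s" % (MAX_COUNT - count, url)

def update_top_index(index, url, old_count, new_count):
    """Return a copy of the index with the entry for url replaced."""
    updated = set(index)
    updated.discard(top_index_row(url, old_count))  # delete the old row
    updated.add(top_index_row(url, new_count))      # insert the new row
    return updated

index = {top_index_row("http://B.org", 2), top_index_row("http://C.org", 1)}
index = update_top_index(index, "http://B.org", 2, 3)
ranked = sorted(index)  # most-linked page first
```

Returning a new set so that both changes appear together only imitates the
behavior within a single process.  It provides none of the fault tolerance
that Fluo gives when the delete and insert are made against a real table.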

Keith


[1]: https://github.com/fluo-io/webindex
[2]: http://fluo.io/webindex-long-run/

On Sat, Jan 23, 2016 at 7:38 AM, Tom D <tomdata8@gmail.com> wrote:

> Hi,
>
> Been reading a bit about Fluo.
>
> I'm not 100% sure I grasp the use-cases where you'd want to use Fluo.  For
> example, counting links to a URL:
>
> http://accumulosummit.com/program/talks/using-fluo-to-incrementally-process-data-in-accumulo/
>
> What are the reasons you'd want to use Fluo for this purpose rather than a
> SummingCombiner iterator?
>
> Many thanks.
>
