accumulo-user mailing list archives

From "Seidl, Ed" <sei...@llnl.gov>
Subject Re: mutations in a combiner?
Date Fri, 09 Mar 2012 20:48:41 GMT
Thanks, I was afraid of that.

Sorry to be dense, but when you mention adding the A:needsReprocessing key…is that done
within a combiner, or by a separate task?

Thanks,
Ed

From: Eric Newton <eric.newton@gmail.com>
Reply-To: accumulo-user@incubator.apache.org
Date: Fri, 9 Mar 2012 12:01:46 -0800
To: accumulo-user@incubator.apache.org
Subject: Re: mutations in a combiner?

In a word?  No.

In the future, this sort of thing will be handled by coprocessors.

I've seen people do this by marking the fields, and then using a periodic map/reduce job to
reprocess:

rowA A:needsReprocessing = "CF:CQ:VIS"
rowA CF:CQ:VIS:T3 = "start end"
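
The marking pattern above can be pictured with a toy in-memory model. This is a sketch in plain Python, not Accumulo code: the dict stands in for a table, `write_chunk` and `reprocess_marked` are hypothetical names, and it assumes the marker is written by the ingesting client alongside the data, which this thread does not confirm. String concatenation stands in for whatever the combiner would produce.

```python
# Toy stand-in for a table: {row: {column: value}}. Not the Accumulo API.
MARKER = "A:needsReprocessing"  # marker column name taken from the example above


def write_chunk(table, row, column, chunk):
    """Store a chunk and flag the row (assumed to happen at ingest time)."""
    cells = table.setdefault(row, {})
    # Concatenation here stands in for the combined value a combiner would emit.
    cells[column] = cells.get(column, "") + chunk
    # The marker's value names the column that needs another pass.
    cells[MARKER] = column


def reprocess_marked(table, process):
    """Periodic job: hand each flagged cell to `process`, then clear the marker."""
    done = []
    for row, cells in table.items():
        column = cells.pop(MARKER, None)
        if column is not None:
            process(row, cells[column])
            done.append(row)
    return done
```

In a real deployment the periodic job would be the map/reduce job mentioned above, scanning for the marker column and deleting it once the row has been handled.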


-Eric

On Fri, Mar 9, 2012 at 1:58 PM, Seidl, Ed <seidl2@llnl.gov> wrote:
I have a wacky question…is there any way to add data to a table from within a Combiner running
at compaction time?  Here's what I'm trying to achieve…

Let's say I have a table that stores some type of data that needs to be processed in some
way (binary, xml, it doesn't matter).  I may or may not receive all the data in one shot,
so as I populate the table, I do the processing (at least to the extent possible), and insert
a row with timestamp T1.  Some time later, I get another chunk of data for a given row and
insert it.  So now the row looks like

rowA CF:CQ:VIS:T1 = "start "
rowA CF:CQ:VIS:T2 = "end"

I can set up a combiner that will emit the value "start end", but now I want to re-process
that row.  The easiest way I can think of to do this is to have the combiner create an entry
in a second table with the row id I just merged, then a separate process can consume rows
from the indicator table and do the necessary processing.  Is this at all possible?  Or should
I just move all the combining logic to an external process?
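
For reference, the combining step described here boils down to a pure function over the versions of one cell. The following is a minimal plain-Python sketch, not Accumulo's Combiner API: `reduce_versions` is a hypothetical name, and the oldest-first ordering is an assumption chosen so that the T1 and T2 values above come out as "start end".

```python
def reduce_versions(versions):
    """Combine (timestamp, value) versions of a single cell into one value.

    Sorts oldest first (an assumption of this sketch) and concatenates.
    The function only returns a value; it has no way to write anywhere else.
    """
    ordered = sorted(versions, key=lambda tv: tv[0])
    return "".join(value for _, value in ordered)
```

For example, `reduce_versions([(2, "end"), (1, "start ")])` yields the combined string for rowA. Because the function has no side channel, writing an entry to a second table from inside it is exactly the part that needs an external process.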

Thanks,
Ed Seidl

