accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: mutations in a combiner?
Date Sat, 10 Mar 2012 00:55:10 GMT
I have not memorized the Combiner interface, but you can do it with an
Iterator, which is just a bit more general than a Combiner.

-Eric

On Fri, Mar 9, 2012 at 3:48 PM, Seidl, Ed <seidl2@llnl.gov> wrote:

> Thanks, I was afraid of that.
>
> Sorry to be dense, but when you mention adding the A:needsReprocessing
> key… is that done within a combiner, or by a separate task?
>
> Thanks,
> Ed
>
> From: Eric Newton <eric.newton@gmail.com>
> Reply-To: "accumulo-user@incubator.apache.org" <
> accumulo-user@incubator.apache.org>
> Date: Fri, 9 Mar 2012 12:01:46 -0800
> To: "accumulo-user@incubator.apache.org" <
> accumulo-user@incubator.apache.org>
> Subject: Re: mutations in a combiner?
>
> In a word?  No.
>
> In the future, this sort of thing will be handled by coprocessors.
>
> I've seen people do this by marking the fields, and then using a periodic
> map/reduce job to reprocess:
>
> rowA A:needsReprocessing = "CF:CQ:VIS"
> rowA CF:CQ:VIS:T3 = "start end"
>
>
> -Eric
>
> On Fri, Mar 9, 2012 at 1:58 PM, Seidl, Ed <seidl2@llnl.gov> wrote:
>
>> I have a wacky question…is there any way to add data to a table from
>> within a Combiner running at compaction time?  Here's what I'm trying to
>> achieve…
>>
>> Let's say I have a table that stores some type of data that needs to be
>> processed in some way (binary, xml, it doesn't matter).  I may or may not
>> receive all the data in one shot, so as I populate the table, I do the
>> processing (at least to the extent possible), and insert a row with
>> timestamp T1.  Some time later, I get another chunk of data for a given row
>> and insert it.  So now the row looks like
>>
>> rowA CF:CQ:VIS:T1 = "start "
>> rowA CF:CQ:VIS:T2 = "end"
>>
>> I can set up a combiner that will emit the value "start end", but now I
>> want to re-process that row.  The easiest way I can think of to do this is
>> to have the combiner create an entry in a second table with the row id I
>> just merged, then a separate process can consume rows from the indicator
>> table and do the necessary processing.  Is this at all possible?  Or should
>> I just move all the combining logic to an external process?
>>
>> Thanks,
>> Ed Seidl
>>
>
>
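
The marking approach discussed in this thread can be modeled roughly as follows. This is a plain-Python sketch with no Accumulo dependency, and it is one possible reading of the suggestion, not code from either poster. The function names `merge_row` and `reprocess`, the tuple layout `(row, cf, cq, vis, ts, value)`, and the choice to keep the newest timestamp on the merged entry are all assumptions made for illustration. The marker column `A:needsReprocessing` and its value format `"CF:CQ:VIS"` come from the example in the thread.

```python
from itertools import groupby

# Marker column named in the thread; it sorts before "CF" within the row.
MARKER_CF, MARKER_CQ = "A", "needsReprocessing"


def column(entry):
    """Column part of an entry: (family, qualifier, visibility)."""
    return entry[1:4]


def merge_row(row, entries):
    """Merge all versions of each column in one row (iterator-like stage).

    entries: list of (row, cf, cq, vis, ts, value) tuples for a single row.
    Returns the entries to emit in sorted key order, plus one marker entry
    for every column that had more than one version merged.
    """
    emitted = []
    for col, versions in groupby(sorted(entries, key=column), key=column):
        versions = sorted(versions, key=lambda e: e[4])  # oldest first
        value = "".join(v[5] for v in versions)
        newest_ts = versions[-1][4]  # assumption: keep the newest timestamp
        emitted.append((row, *col, newest_ts, value))
        if len(versions) > 1:
            # Record which column was merged so a later job can find it.
            emitted.append(
                (row, MARKER_CF, MARKER_CQ, "", newest_ts, ":".join(col))
            )
    return sorted(emitted)


def reprocess(table, process):
    """Stand-in for the periodic job: rewrite marked columns, drop markers.

    Assumes column names contain no ':' so the marker value can be split.
    """
    markers = [e for e in table if e[1:3] == (MARKER_CF, MARKER_CQ)]
    result = [e for e in table if e not in markers]
    for row, _, _, _, _, target in markers:
        cf, cq, vis = target.split(":")
        result = [
            (r, f, q, v, ts, process(val))
            if (r, f, q, v) == (row, cf, cq, vis)
            else (r, f, q, v, ts, val)
            for (r, f, q, v, ts, val) in result
        ]
    return sorted(result)


if __name__ == "__main__":
    table = [
        ("rowA", "CF", "CQ", "VIS", 1, "start "),
        ("rowA", "CF", "CQ", "VIS", 2, "end"),
    ]
    merged = merge_row("rowA", table)
    print(merged)
    print(reprocess(merged, str.upper))  # str.upper is a placeholder job
```

The merge step buffers a whole row before emitting anything, because the marker key sorts ahead of the data column it describes. Whether that buffering is acceptable in a real iterator depends on row size, which this sketch does not address.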
