accumulo-user mailing list archives

From Peter Tillotson
Subject Iterators - updating other rows
Date Mon, 15 Jul 2013 10:38:35 GMT
I've got two tables of dependent data, which I was hoping to update efficiently during compaction.
This leads to the following requirements:
  - Changes to other rows
  - Changes in other tables

I've fought with iterators and embedding writers, but have had to fall back to MapReduce
jobs to complete the update.

Is there a recommended approach to this?

A bit more detail about the algorithm:

I have two tables with different sort orders, and I use ngram row ids to group elements and
split them over multiple tablets, so:

nm: key1: 000: newValueId2
nm: key2: type: valueId1
nm: key3: type: valueId1

ab: valueId1: 001: blob
ab: valueId1: key2: nm

Multiple keys point to the same value in the other table, but both keys and values are liable
to change. What I was trying to do was use special columns (column qualifier 000 above), which
I call care-of columns, to do redirects as data changes in real time. With iterators this would
become eventually consistent and be very efficient, whereas a MapReduce approach requires
multiple scans of each large table. I like the approach because the ngram row ids split and
group the data, and the two different sort orders give me different, useful query characteristics.
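
A minimal sketch of how a care-of redirect might be resolved at read time, to make the layout above concrete. It uses in-memory sorted maps in place of the real nm and ab tables and does not touch the Accumulo API. The class and method names are hypothetical, and it assumes that a 000 column, when present, overrides the value id held in the row's `type` column; the exact redirect semantics are not spelled out above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the two tables: row -> (qualifier -> value).
class CareOfSketch {
    static final String CARE_OF = "000"; // care-of qualifier, as in the nm example
    static final String BLOB = "001";    // blob qualifier, as in the ab example

    final TreeMap<String, TreeMap<String, String>> nm = new TreeMap<>();
    final TreeMap<String, TreeMap<String, String>> ab = new TreeMap<>();

    // Store one (row, qualifier, value) entry in the given table.
    static void put(TreeMap<String, TreeMap<String, String>> table,
                    String row, String qualifier, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>()).put(qualifier, value);
    }

    // Value id a key currently points at; the care-of column wins if present (assumption).
    String resolveValueId(String key) {
        Map<String, String> cols = nm.get(key);
        if (cols == null) {
            return null;
        }
        String redirect = cols.get(CARE_OF);
        return redirect != null ? redirect : cols.get("type");
    }

    // Follow the key into the ab table and return the blob, or null if nothing is found.
    String lookupBlob(String key) {
        String valueId = resolveValueId(key);
        if (valueId == null) {
            return null;
        }
        Map<String, String> cols = ab.get(valueId);
        return cols == null ? null : cols.get(BLOB);
    }
}
```

In this model a compaction-time pass would fold the 000 entry into the `type` column and then delete it, which is the step that needs writes to other rows and to the other table.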

For some reason the embedded writers were blocking - I may retry with a larger cluster. I
fought with it for a few days then resorted to MapReduce jobs until I get a chance to look
at the Accumulo code more closely. 

Would it be easy to add a special iterator that accepts (Text, Mutation) pairs, much as
AccumuloOutputFormat does?
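
One way to picture the request, as a plain-Java sketch rather than an existing Accumulo API: a sink that takes (table name, mutation) pairs and groups them by destination table, in the same shape that AccumuloOutputFormat takes its (Text, Mutation) records. `PairCollector`, `Update`, `emit` and `pending` are hypothetical names, and `Update` is only a stand-in for a real Mutation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sink that groups emitted updates by destination table name.
class PairCollector {
    // Stand-in for a Mutation: a single row/qualifier/value update.
    static final class Update {
        final String row;
        final String qualifier;
        final String value;

        Update(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    private final Map<String, List<Update>> byTable = new LinkedHashMap<>();

    // Accept one (table, update) pair, as an iterator would emit during compaction.
    void emit(String table, Update update) {
        byTable.computeIfAbsent(table, t -> new ArrayList<>()).add(update);
    }

    // Updates queued for a table so far, in emission order; empty if none.
    List<Update> pending(String table) {
        List<Update> updates = byTable.get(table);
        return updates == null ? new ArrayList<>() : updates;
    }
}
```

Whatever drains such a sink would still have to write to the other tablets without blocking the compaction that feeds it, which is where the embedded writers got stuck.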

Many thanks in advance
