accumulo-user mailing list archives

From David Medinets <david.medin...@gmail.com>
Subject Re: Using Iterator To Toss Unchanged Values
Date Thu, 12 Jul 2012 19:09:10 GMT
Billie, I think you've got it. Now I need to write it.

On Thu, Jul 12, 2012 at 11:47 AM, Billie J Rinaldi
<billie.j.rinaldi@ugov.gov> wrote:
> On Thursday, July 12, 2012 8:47:41 AM, "David Medinets" <david.medinets@gmail.com> wrote:
>> I'd like to track field level changes for a given record (say,
>> author). So I create a table without a VersioningIterator. And I
>> insert a few records:
>>
>> insert "JOHN" "ATTRIBUTE" "AGE" "34"
>> insert "JOHN" "ATTRIBUTE" "HEIGHT" "67"
>> insert "JOHN" "BOOKS" "TITLE" "THE RISE OF ACCUMULO"
>>
>> The next action is that some ingest process happens and does this:
>>
>> insert "JOHN" "ATTRIBUTE" "AGE" "34"
>>
>> Since there is no VersioningIterator, there are two AGES both with
>> "34" as the value.
>>
>> I would like a DropUnchangedValueIterator which removes the last
>> inserted record. Removing the last record lets me use the n-1
>> timestamp as a LastUpdated value for the key-value pair. But as soon
>> as a record is deleted, the previous records are not available
>> anymore? What if the timestamp is set to MAX-timestamp so the records
>> are sorted backwards? Does that avoid the blocking tombstones? I'd
>> look at the source code before asking but I don't have that luxury for
>> the next week or two and the question is rattling around my head.
>
> This is mixing the idea of a deletion entry, which removes all earlier entries, with the
> idea that iterators can arbitrarily filter out entries.  I don't think reversing the timestamp
> will help you much in this case; what you want is an iterator that does pairwise comparisons
> of entries, and if the values are the same keep one entry with the earlier timestamp (then
> keep comparing entries for that record), and if the values are different keep one entry with
> the later timestamp (then skip to the next record).  I think you'll have to write a custom
> iterator for that.
>
> Billie
>
>
>> Naturally, I could query the database before the ingest insert. But,
>> referring to slide 19 in Adam's presentation at
>> http://people.apache.org/~afuchs/slides/accumulo_table_design.pdf, the
>> read-modify-write design is not optimal.
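
The pairwise comparison Billie describes can be sketched in plain Java, without the Accumulo iterator API. This is only an illustration of the comparison logic, not a working Accumulo iterator: `Entry`, `sameColumn`, and `collapse` are hypothetical names, and the sketch assumes the entries arrive sorted by row, family, and qualifier, with the newest timestamp first within each column.

```java
import java.util.ArrayList;
import java.util.List;

class DropUnchangedValuesSketch {

    /** Hypothetical stand-in for an Accumulo key/value pair. */
    static final class Entry {
        final String row, family, qualifier, value;
        final long timestamp;

        Entry(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        /** True when both entries are versions of the same row/family/qualifier. */
        boolean sameColumn(Entry other) {
            return row.equals(other.row)
                    && family.equals(other.family)
                    && qualifier.equals(other.qualifier);
        }
    }

    /**
     * Assumes input is sorted by column, newest timestamp first within a column.
     * Emits one entry per column: the earliest-timestamped entry in the run of
     * versions that share the newest value.
     */
    static List<Entry> collapse(List<Entry> sorted) {
        List<Entry> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            Entry candidate = sorted.get(i);
            int j = i + 1;
            // Same value: keep the entry with the earlier timestamp, keep comparing.
            while (j < sorted.size()
                    && sorted.get(j).sameColumn(candidate)
                    && sorted.get(j).value.equals(candidate.value)) {
                candidate = sorted.get(j);
                j++;
            }
            // Different value (or no more versions): keep the later entry.
            out.add(candidate);
            // Skip any remaining older versions and move to the next column.
            while (j < sorted.size() && sorted.get(j).sameColumn(candidate)) {
                j++;
            }
            i = j;
        }
        return out;
    }
}
```

With the example data from the thread, two "AGE" entries holding "34" would collapse to the one with the earlier timestamp, which then serves as the LastUpdated time for that value. A real implementation would need to apply the same logic inside a custom server-side iterator.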
