From: Corey Nolet
To: user@accumulo.apache.org
Date: Thu, 12 Jul 2012 15:07:48 -0400
Subject: Re: Using Iterator To Toss Unchanged Values
In-Reply-To: <910092521.36342.1342108066081.JavaMail.root@linzimmb04o.imo.intelink.gov>

NICE!

On Thu, Jul 12, 2012 at 11:47 AM, Billie J Rinaldi
<billie.j.rinaldi@ugov.gov> wrote:

> On Thursday, July 12, 2012 8:47:41 AM, "David Medinets"
> <david.medinets@gmail.com> wrote:
> > I'd like to track field level changes for a given record (say,
> > author). So I create a table without a VersioningIterator. And I
> > insert a few records:
> >
> > insert "JOHN" "ATTRIBUTE" "AGE" "34"
> > insert "JOHN" "ATTRIBUTE" "HEIGHT" "67"
> > insert "JOHN" "BOOKS" "TITLE" "THE RISE OF ACCUMULO"
> >
> > The next action is that some ingest process happens and does this:
> >
> > insert "JOHN" "ATTRIBUTE" "AGE" "34"
> >
> > Since there is no VersioningIterator, there are two AGES both with
> > "34" as the value.
> >
> > I would like a DropUnchangedValueIterator which removes the last
> > inserted record. Removing the last record lets me use the n-1
> > timestamp as a LastUpdated value for the key-value pair. But as soon
> > as a record is deleted, the previous records are not available
> > anymore? What if the timestamp is set to MAX-timestamp so the records
> > are sorted backwards? Does that avoid the blocking tombstones? I'd
> > look at the source code before asking but I don't have that luxury for
> > the next week or two and the question is rattling around my head.
>
> This is mixing the idea of a deletion entry, which removes all earlier
> entries, and the idea that iterators can arbitrarily filter out
> entries. I don't think reversing the timestamp will help you much in this
> case; what you want is an iterator that does pairwise comparisons of
> entries, and if the values are the same keep one entry with the earlier
> timestamp (then keep comparing entries for that record), and if the values
> are different keep one entry with the later timestamp (then skip to the
> next record). I think you'll have to write a custom iterator for that.
>
> Billie
>
> > Naturally, I could query the database before the ingest insert. But,
> > referring to slide 19 in Adam's presentation at
> > http://people.apache.org/~afuchs/slides/accumulo_table_design.pdf, the
> > read-modify-write design is not optimal.

--
Corey Nolet
Senior Software Engineer
TexelTek, inc.
[Office] 301.880.7123
[Cell] 410-903-2110
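
For anyone following the thread, here is a rough sketch of the pairwise comparison Billie describes. It is plain Java over an in-memory list, not an Accumulo iterator: the Entry class, the collapse method, and the timestamps are all made up for illustration. It assumes the entries arrive sorted the way Accumulo sorts them, with the newest timestamp first within each row/family/qualifier. A real version would presumably implement Accumulo's SortedKeyValueIterator interface, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone model of the pairwise-comparison logic; not an Accumulo iterator. */
class DropUnchangedValues {

    /** Hypothetical stand-in for one Accumulo key/value pair. */
    static final class Entry {
        final String row, family, qualifier, value;
        final long timestamp;

        Entry(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        /** True when both entries belong to the same row/family/qualifier. */
        boolean sameColumn(Entry other) {
            return row.equals(other.row)
                && family.equals(other.family)
                && qualifier.equals(other.qualifier);
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " @" + timestamp + " = " + value;
        }
    }

    /**
     * Assumes the input is sorted by row, family, qualifier, then newest
     * timestamp first. Emits one entry per column: the oldest entry of the
     * most recent run of identical values.
     */
    static List<Entry> collapse(List<Entry> sorted) {
        List<Entry> kept = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            Entry first = sorted.get(i);
            Entry candidate = first;
            boolean stillComparing = true;
            int j = i + 1;
            // Walk the older versions of the same column.
            while (j < sorted.size() && sorted.get(j).sameColumn(first)) {
                Entry older = sorted.get(j);
                if (stillComparing && older.value.equals(candidate.value)) {
                    candidate = older;       // same value: prefer the earlier timestamp
                } else {
                    stillComparing = false;  // value changed: skip the remaining versions
                }
                j++;
            }
            kept.add(candidate);
            i = j;                           // move on to the next column
        }
        return kept;
    }

    public static void main(String[] args) {
        // David's example, with made-up timestamps; the re-ingested AGE has timestamp 3.
        List<Entry> sorted = Arrays.asList(
            new Entry("JOHN", "ATTRIBUTE", "AGE", 3L, "34"),
            new Entry("JOHN", "ATTRIBUTE", "AGE", 1L, "34"),
            new Entry("JOHN", "ATTRIBUTE", "HEIGHT", 1L, "67"),
            new Entry("JOHN", "BOOKS", "TITLE", 1L, "THE RISE OF ACCUMULO"));
        for (Entry e : collapse(sorted)) {
            System.out.println(e);
        }
    }
}
```

With that input the two AGE entries collapse into the one with timestamp 1, so the surviving timestamp behaves like the LastUpdated value David wants. Nothing is deleted here, so no tombstones are involved; the duplicates are only filtered from what the reader sees.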