From: Billie Rinaldi
To: user@accumulo.apache.org
Date: Tue, 28 Aug 2012 11:04:08 -0700
Subject: Re: TimeSpan Iterator

On Tue, Aug 28, 2012 at 9:51 AM, <Bob.Thorman@l-3com.com> wrote:

> Billie
>
> Your comment “Users should be aware that this is not an efficient
> operation, though.” may help me decide if my current use of a secondary
> time index is better then. Right now I maintain a table that has
> timestamps as the rowid whose values are the rowid in a metadata table.
> Therefore I do one range scan based on the timestamp. Then a second
> lookup of the metadata rowid. Is this more efficient?

It probably depends on what percentage of the data you're bringing back,
as compared to the amount you're scanning over (if that's not the whole
table). I would hypothesize if you're bringing more than N% of the data
back, you might as well just use the TimestampFilter on the main table.
If you're bringing a smaller percentage back, it could be better to
reduce the amount of the main table you have to scan over by maintaining
a secondary time index. I'm not sure what N would be. You should also
make sure that the secondary index is actually reducing the amount of
the main table you're scanning over, e.g. if each rowid had a full range
of timestamps, you could be pulling a list of all rowids back from the
index table and not reducing the scan over the main table.

Also, the TimestampFilter is not optimized. Filters evaluate each
key/value pair to see if it is accepted (in this case, if it is in a
timestamp range). If there are a lot of timestamps for each cell (keys
that are identical except for timestamp), it would be better to use
seeking instead. That would involve writing a new iterator. If there
aren't many timestamps for each cell, seeking won't help and the
TimestampFilter will be fine.

Billie

> From: Billie Rinaldi [mailto:billie@apache.org]
> Sent: Tuesday, August 28, 2012 11:46
> To: user@accumulo.apache.org; john.armstrong@ccri.com
> Subject: Re: TimeSpan Iterator
>
> On Tue, Aug 28, 2012 at 6:33 AM, John Armstrong <jrja@ccri.com> wrote:
>
> > On 08/28/2012 09:26 AM, Bob.Thorman@l-3com.com wrote:
> >
> > > Does anyone know of a TimeSpan Iterator that will fetch rows based
> > > on the accumulo timestamp?
> >
> > We actually wrote our own TimestampRangeIterator and
> > TimestampSetIterator classes. I don't know if 1.4 has any in the core
> > libraries. It's not very hard though.
>
> There's a TimestampFilter in org.apache.accumulo.core.iterators.user in
> 1.4. It uses a range of timestamps. Users should be aware that this is
> not an efficient operation, though.
>
> Billie
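
The filter-versus-seek distinction in the reply above can be illustrated with a small stand-alone simulation. This is only a sketch and does not use the Accumulo API: the `Key` class, the `TreeSet` standing in for a sorted table, and the method names `filterScan` and `seekScan` are all hypothetical. It assumes keys sort by cell and then by timestamp in descending order (newest first), which is how Accumulo orders versions of a cell, and it counts how many keys each approach has to examine.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Stand-alone simulation; none of these names are Accumulo classes.
final class TimestampSeekSketch {

    /** Hypothetical simplified key: a cell (row + column) and a timestamp. */
    static final class Key {
        final String cell;
        final long ts;
        Key(String cell, long ts) { this.cell = cell; this.ts = ts; }
        @Override public String toString() { return cell + "@" + ts; }
    }

    /** Sort by cell ascending, then by timestamp descending (newest first). */
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.cell.compareTo(b.cell);
        return c != 0 ? c : Long.compare(b.ts, a.ts);
    };

    /** Filter style: examine every key and keep those in [start, end]. */
    static List<Key> filterScan(TreeSet<Key> table, long start, long end, int[] touched) {
        List<Key> out = new ArrayList<>();
        for (Key k : table) {
            touched[0]++;
            if (k.ts >= start && k.ts <= end) out.add(k);
        }
        return out;
    }

    /** Seek style: jump over versions that cannot fall in [start, end]. */
    static List<Key> seekScan(TreeSet<Key> table, long start, long end, int[] touched) {
        List<Key> out = new ArrayList<>();
        Key cur = table.isEmpty() ? null : table.first();
        while (cur != null) {
            touched[0]++;
            if (cur.ts > end) {
                // Too new: seek to the first key of this cell with ts <= end.
                cur = table.ceiling(new Key(cur.cell, end));
            } else if (cur.ts >= start) {
                out.add(cur);
                cur = table.higher(cur);
            } else {
                // Too old: every later version of this cell is older still,
                // so seek past the cell's last possible key to the next cell.
                cur = table.higher(new Key(cur.cell, Long.MIN_VALUE));
            }
        }
        return out;
    }

    /** Three cells with 1000 versions each (timestamps 1..1000). */
    static TreeSet<Key> sampleTable() {
        TreeSet<Key> table = new TreeSet<>(ORDER);
        for (String cell : new String[] {"r1:cf:a", "r2:cf:a", "r3:cf:a"}) {
            for (long ts = 1; ts <= 1000; ts++) table.add(new Key(cell, ts));
        }
        return table;
    }

    public static void main(String[] args) {
        TreeSet<Key> table = sampleTable();
        int[] filterTouched = {0};
        int[] seekTouched = {0};
        List<Key> viaFilter = filterScan(table, 10, 12, filterTouched);
        List<Key> viaSeek = seekScan(table, 10, 12, seekTouched);
        System.out.println("filter: " + viaFilter.size() + " hits, "
                + filterTouched[0] + " keys examined");
        System.out.println("seek:   " + viaSeek.size() + " hits, "
                + seekTouched[0] + " keys examined");
    }
}
```

With this sample data (3 cells, 1000 versions each, a window of 3 timestamps), both scans return the same 9 keys, but the filter examines all 3000 keys while the seeking scan examines 15. If each cell had only one or two versions, the two counts would be nearly equal, which matches the point that seeking won't help when there aren't many timestamps per cell. The count also ignores that a real seek is typically more expensive than stepping to the next key, so the break-even point in practice would need to be measured.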
