Date: Fri, 29 Jun 2012 14:52:59 -0400
Subject: Re: querying for relevant rows
From: Adam Fuchs
To: user@accumulo.apache.org

You can't scan backwards in Accumulo, but you probably don't need to.
What you can do instead is use the last timestamp in each range as the key, like this:

    key=2  value={a.1 b.1 c.2 d.2}
    key=5  value={m.3 n.4 o.5}
    key=7  value={x.6 y.6 z.7}

As long as your ranges are non-overlapping, you can scan forward from the start of your time span and just stop when you get to the first key/value pair whose range starts after your given time span. If your ranges are overlapping then you will have to do a more complicated intersection between forward and reverse orderings to efficiently select ranges, or maybe use some type of hierarchical range intersection index akin to a binary space partitioning tree.
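
For what it's worth, here is a minimal sketch of that forward scan for the non-overlapping case, written in Python. It is only an illustration, not Accumulo code: the table is simulated with a sorted in-memory list instead of a real scanner, the names ROWS and query are made up, and it assumes every row's entries are non-empty and stored in timestamp order, so the first entry marks where that row's range starts.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for the table. Rows are sorted by key, and
# each key is the LAST timestamp covered by that row's value.
ROWS = [
    (2, [("a", 1), ("b", 1), ("c", 2), ("d", 2)]),
    (5, [("m", 3), ("n", 4), ("o", 5)]),
    (7, [("x", 6), ("y", 6), ("z", 7)]),
]


def query(rows, start, stop):
    """Return entries with start <= timestamp <= stop, scanning forward only."""
    keys = [key for key, _ in rows]
    # Seek to the first row whose last timestamp is >= start. Earlier rows
    # end before the time span begins, so they cannot contain matches.
    first = bisect_left(keys, start)
    result = []
    for _key, entries in rows[first:]:
        # Assumption: entries are in timestamp order, so entries[0] is the
        # start of this row's range. Once a row starts after the span, stop.
        if entries[0][1] > stop:
            break
        result.extend((name, ts) for name, ts in entries if start <= ts <= stop)
    return result


print(query(ROWS, 2, 4))
```

With the example data, a query for the time span [2 4] reads the rows keyed 2 and 5, drops a.1, b.1 and o.5, and stops as soon as it sees the row keyed 7, because that row starts at timestamp 6.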

Cheers,
Adam

On Fri, Jun 29, 2012 at 2:19 PM, Lam <dnaelam@gmail.com> wrote:
> I'm using a timestamp as a key and the value is all the relevant data
> starting at that timestamp up to the timestamp represented by the key
> of the next row.
>
> When querying, I'm given a time span, consisting of a start and stop
> time. I want to return all the relevant data within the time span, so
> I want to retrieve the appropriate rows (then filter the data for the
> given timespan).
>
> Example:
> In Accumulo: (the format of the value is <letter>.<timestamp>)
>     key=1  value={a.1 b.1 c.2 d.2}
>     key=3  value={m.3 n.4 o.5}
>     key=6  value={x.6 y.6 z.7}
>
> Query: timespan=[2 4] (get all data from timestamp 2 to 4 inclusive)
>
> Desired result: retrieve key=1 and key=3, then filter out a.1, b.1, and
> o.5, and return the rest
>
> Problem: How do I know to retrieve key=1 and key=3 without scanning
> all the keys?
>
> Can I create a scanner that looks for the given start key=2 and goes to
> the prior row (i.e. key=1)?
>
> --
> D. Lam
