accumulo-user mailing list archives

From John Vines <john.w.vi...@ugov.gov>
Subject Re: querying for relevant rows
Date Fri, 29 Jun 2012 19:14:13 GMT
If you set the end to null, it will go until the end of the table.

Scanners bring back results in batches; the default is 1000 key-value pairs.
If you know you're only looking for a specific number of keys, you can lower
the batch size to better match your needs. But if you end up fetching many
small batches, your scan time will be dominated by network overhead.

John
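
The forward scan discussed in this thread (rows keyed by the last timestamp they cover, an inclusive start, no end key, and an early stop) can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain java.util.TreeMap as a stand-in for the sorted table rather than the Accumulo client API, the class and method names (EndKeyedScan, sampleTable, query) are hypothetical, and it assumes the rows cover non-overlapping time ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class EndKeyedScan {

    // Stand-in for the table: each row key is the LAST timestamp the row covers.
    static NavigableMap<Long, List<String>> sampleTable() {
        NavigableMap<Long, List<String>> table = new TreeMap<>();
        table.put(2L, Arrays.asList("a.1", "b.1", "c.2", "d.2"));
        table.put(5L, Arrays.asList("m.3", "n.4", "o.5"));
        table.put(7L, Arrays.asList("x.6", "y.6", "z.7"));
        return table;
    }

    // Returns every <letter>.<timestamp> item whose timestamp lies in [begin, end].
    static List<String> query(NavigableMap<Long, List<String>> table, long begin, long end) {
        List<String> hits = new ArrayList<>();
        // tailMap(begin, true) plays the role of a range with an inclusive
        // start key and no end key: iteration moves forward in sorted order.
        for (Map.Entry<Long, List<String>> row : table.tailMap(begin, true).entrySet()) {
            for (String item : row.getValue()) {
                long ts = Long.parseLong(item.substring(item.indexOf('.') + 1));
                if (ts >= begin && ts <= end) {
                    hits.add(item);
                }
            }
            // Assumption: ranges do not overlap, so once a row's last timestamp
            // reaches the end of the query, every later row starts after it.
            if (row.getKey() >= end) {
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(query(sampleTable(), 2, 4)); // prints [c.2, d.2, m.3, n.4]
    }
}
```

With a real scanner the same loop shape would apply, but the stop condition has to be checked by the client inside the loop, so a smaller batch size may reduce how much unneeded data is fetched before the break.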

On Fri, Jun 29, 2012 at 3:02 PM, Lam <dnaelam@gmail.com> wrote:

> This sounds like a good idea.  But how do I scan forward -- do I set
> end=null in the following code?
>
>
>                        Scanner scan=conn.createScanner(tableName, auths);
>
>                        Text start=new Text(Value.longToBytes(beginTimestamp));
>                        Text end=new Text(Value.longToBytes(endTimestamp));
>                        scan.setRange(new Range(start, true, end, false));
>
>                        for(Entry<Key,Value> e:scan) ...
>
>
> And is it efficient?  i.e., the scanner won't move to the next entry
> until the next iteration through the for loop, right?
>
> I'll run a test right now.
>
> --
> D. Lam
>
>
> On Fri, Jun 29, 2012 at 1:52 PM, Adam Fuchs <afuchs@apache.org> wrote:
> > You can't scan backwards in Accumulo, but you probably don't need to. What
> > you can do instead is use the last timestamp in the range as the key, like
> > this:
> >
> >     key=2  value= {a.1 b.1 c.2 d.2}
> >     key=5  value= {m.3 n.4 o.5}
> >     key=7  value= {x.6 y.6 z.7}
> >
> > As long as your ranges are non-overlapping, you can just stop when you get
> > to the first key/value pair that starts after your given time range. If your
> > ranges are overlapping then you will have to do a more complicated
> > intersection between forward and reverse orderings to efficiently select
> > ranges, or maybe use some type of hierarchical range intersection index akin
> > to a binary space partitioning tree.
> >
> > Cheers,
> > Adam
> >
> >
> >
> > On Fri, Jun 29, 2012 at 2:19 PM, Lam <dnaelam@gmail.com> wrote:
> >>
> >> I'm using a timestamp as a key and the value is all the relevant data
> >> starting at that timestamp up to the timestamp represented by the key
> >> of the next row.
> >>
> >> When querying, I'm given a time span, consisting of a start and stop
> >> time.  I want to return all the relevant data within the time span, so
> >> I want to retrieve the appropriate rows (then filter the data for the
> >> given timespan).
> >>
> >> Example:
> >> In Accumulo:  (the format of the value is  <letter>.<timestamp>)
> >>     key=1  value= {a.1 b.1 c.2 d.2}
> >>     key=3  value= {m.3 n.4 o.5}
> >>     key=6  value= {x.6 y.6 z.7}
> >>
> >> Query:  timespan=[2 4]  (get all data from timestamp 2 to 4, inclusive)
> >>
> >> Desired result: retrieve key=1 and key=3, then filter out a.1, b.1, and
> >> o.5, and return the rest
> >>
> >> Problem: How do I know to retrieve key=1 and key=3 without scanning
> >> all the keys?
> >>
> >> Can I create a scanner that looks for the given start key=2 and goes to
> >> the prior row (i.e. key=1)?
> >>
> >> --
> >> D. Lam
> >
> >
>
