accumulo-user mailing list archives

From Lam <dnae...@gmail.com>
Subject Re: querying for relevant rows
Date Fri, 29 Jun 2012 19:28:27 GMT
Thanks to all!  I like this solution. I confirmed what you said and
will use scanner.setBatchSize() as appropriate.
--
D. Lam


On Fri, Jun 29, 2012 at 2:14 PM, John Vines <john.w.vines@ugov.gov> wrote:
> If you set the end to null, it will go until the end of the table.
>
> Scanners will bring back batches, default is 1000 key-value pairs. If you
> know you're only looking for a specific number of keys, you can drop the
> batch size to match your needs better. But if you end up grabbing multiple
> smaller batches, your performance will be dominated by network
> overhead costs.
>
> John
>
> On Fri, Jun 29, 2012 at 3:02 PM, Lam <dnaelam@gmail.com> wrote:
>>
>> This sounds like a good idea.  But how do I scan forward -- do I set
>> end=null in the following code?
>>
>>
>>     Scanner scan = conn.createScanner(tableName, auths);
>>
>>     Text start = new Text(Value.longToBytes(beginTimestamp));
>>     Text end = new Text(Value.longToBytes(endTimestamp));
>>     scan.setRange(new Range(start, true, end, false));
>>
>>     for (Entry<Key,Value> e : scan) ...
>>
>>
>> And is it efficient?  i.e., the scanner won't move to the next entry
>> until the next iteration through the for loop, right?
>>
>> I'll run a test right now.
>>
>> --
>> D. Lam
>>
>>
>> On Fri, Jun 29, 2012 at 1:52 PM, Adam Fuchs <afuchs@apache.org> wrote:
>> > You can't scan backwards in Accumulo, but you probably don't need to.
>> > What you can do instead is use the last timestamp in the range as the
>> > key, like this:
>> >
>> >     key=2  value= {a.1 b.1 c.2 d.2}
>> >     key=5  value= {m.3 n.4 o.5}
>> >     key=7  value={x.6 y.6 z.7}
>> >
>> > As long as your ranges are non-overlapping, you can just stop when you
>> > get to the first key/value pair that starts after your given time range.
>> > If your ranges are overlapping then you will have to do a more
>> > complicated intersection between forward and reverse orderings to
>> > efficiently select ranges, or maybe use some type of hierarchical range
>> > intersection index akin to a binary space partitioning tree.
>> >
>> > Cheers,
>> > Adam
>> >
>> >
>> >
>> > On Fri, Jun 29, 2012 at 2:19 PM, Lam <dnaelam@gmail.com> wrote:
>> >>
>> >> I'm using a timestamp as a key and the value is all the relevant data
>> >> starting at that timestamp up to the timestamp represented by the key
>> >> of the next row.
>> >>
>> >> When querying, I'm given a time span, consisting of a start and stop
>> >> time.  I want to return all the relevant data within the time span, so
>> >> I want to retrieve the appropriate rows (then filter the data for the
>> >> given timespan).
>> >>
>> >> Example:
>> >> In Accumulo:  (the format of the value is  <letter>.<timestamp>)
>> >>     key=1  value= {a.1 b.1 c.2 d.2}
>> >>     key=3  value= {m.3 n.4 o.5}
>> >>     key=6  value={x.6 y.6 z.7}
>> >>
>> >> Query:  timespan=[2 4]  (get all data from timestamp 2 to 4,
>> >> inclusive)
>> >>
>> >> Desired result: retrieve key=1 and key=3, then filter out a.1, b.1, and
>> >> o.5, and return the rest
>> >>
>> >> Problem: How do I know to retrieve key=1 and key=3 without scanning
>> >> all the keys?
>> >>
>> >> Can I create a scanner that looks for the given start key=2 and then
>> >> goes to the prior row (i.e. key=1)?
>> >>
>> >> --
>> >> D. Lam
>> >
>> >
>
>
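
For readers of the archive, here is a minimal sketch of the scheme Adam describes in the thread: key each row by its last timestamp, scan forward from the query's start time with no end key, and stop at the first row that starts after the query's stop time. It is not Accumulo code. A java.util.TreeMap stands in for the sorted table, and the names EndKeyScan, Datum, sampleTable and query are made up for illustration. It assumes rows do not overlap in time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a sorted table keyed by the LAST timestamp in each row. */
class EndKeyScan {

    /** One datum from the thread's example, e.g. "c.2" is letter "c" at timestamp 2. */
    static final class Datum {
        final String letter;
        final long ts;
        Datum(String letter, long ts) { this.letter = letter; this.ts = ts; }
        @Override public String toString() { return letter + "." + ts; }
    }

    /** The example rows from the thread, re-keyed by their last timestamp (2, 5, 7). */
    static TreeMap<Long, List<Datum>> sampleTable() {
        TreeMap<Long, List<Datum>> t = new TreeMap<>();
        t.put(2L, Arrays.asList(new Datum("a", 1), new Datum("b", 1),
                                new Datum("c", 2), new Datum("d", 2)));
        t.put(5L, Arrays.asList(new Datum("m", 3), new Datum("n", 4), new Datum("o", 5)));
        t.put(7L, Arrays.asList(new Datum("x", 6), new Datum("y", 6), new Datum("z", 7)));
        return t;
    }

    /** Returns every datum with start <= ts <= stop, reading as few rows as possible. */
    static List<String> query(TreeMap<Long, List<Datum>> table, long start, long stop) {
        List<String> out = new ArrayList<>();
        // tailMap(start, true) plays the role of a forward scan that begins at
        // `start` and has no end key (end=null in the thread).
        for (Map.Entry<Long, List<Datum>> row : table.tailMap(start, true).entrySet()) {
            boolean startsAfterStop = true;
            for (Datum d : row.getValue()) {
                if (d.ts <= stop) {
                    startsAfterStop = false;   // row holds data at or before `stop`
                }
                if (d.ts >= start && d.ts <= stop) {
                    out.add(d.toString());     // client-side filter to the time span
                }
            }
            // Rows are assumed non-overlapping, so the first row lying entirely
            // after `stop` means no later row can match either.
            if (startsAfterStop) {
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The query from the thread: timespan [2, 4], inclusive.
        System.out.println(query(sampleTable(), 2, 4)); // prints [c.2, d.2, m.3, n.4]
    }
}
```

With the thread's example, the loop reads the rows keyed 2 and 5, looks at the row keyed 7 only long enough to see that it starts after the range, and then stops. Against a real table the same early exit would be a break out of the scanner's for loop, which is where the batch size John mentions decides how much extra data has already been fetched.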
