cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@yakaz.com>
Subject Re: Range search on keys not working?
Date Wed, 09 Jun 2010 16:30:40 GMT
> I don't get what you're saying. If you want to loop over your entire range
> of keys, you can do it with a range query, and start and finish will both be
> "". Is there any scenario where you would want to do a range query where
> start and/or finish do not equal "", if you use random partitioning?

If you have 1 million rows and each of these rows is ~1kB (and you request
the rows fully), I guarantee you that your range query with start="" and
finish="" will not work.

More generally, in any non-toy cluster, a range query with start="" and
finish="" and a count large enough to retrieve all the keys will fail (it
will time out). To loop over your entire range of keys in any such cluster,
you start with a range query with start="" and finish="" but with a
reasonable value for count. Then you do the next range query with start
equal to the last key retrieved by the previous range query, and so on,
until you have seen all the keys.
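
To make the loop concrete, here is a minimal sketch in Java. It does not call
Cassandra at all: RangePagingSketch, ring(), rangeQuery() and scanAll() are
hypothetical names, and the "cluster" is just an in-memory list ordered by the
MD5 token of each key, which is my assumption of how a random partitioner
orders rows. It only illustrates the paging pattern: start from "", use the
last key of each page as the next start, and skip the repeated first element.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** In-memory stand-in for a column family (hypothetical, not a Cassandra API). */
final class RangePagingSketch {

    /** Token of a key: its MD5 digest read as a non-negative integer (assumed model). */
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the keys sorted by token, the order in which a range query walks them. */
    static List<String> ring(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(RangePagingSketch::token));
        return sorted;
    }

    /** Simulated range query: up to count keys from start (inclusive); "" means the beginning. */
    static List<String> rangeQuery(List<String> ring, String start, int count) {
        int from = 0;
        if (!start.isEmpty()) {
            BigInteger startToken = token(start);
            // Skip every key whose token sorts before the start key's token.
            while (from < ring.size() && token(ring.get(from)).compareTo(startToken) < 0) {
                from++;
            }
        }
        return new ArrayList<>(ring.subList(from, Math.min(ring.size(), from + count)));
    }

    /** Visits every key once by issuing bounded range queries one after another. */
    static List<String> scanAll(List<String> ring, int count) {
        if (count < 2) {
            // With count == 1 every page would only repeat the previous last key.
            throw new IllegalArgumentException("count must be at least 2");
        }
        List<String> seen = new ArrayList<>();
        String start = "";
        while (true) {
            List<String> page = rangeQuery(ring, start, count);
            // Every page after the first begins with the key we already saw.
            boolean firstPage = start.isEmpty() || page.isEmpty();
            seen.addAll(firstPage ? page : page.subList(1, page.size()));
            if (page.size() < count) {
                break; // a short page means we reached the end of the ring
            }
            start = page.get(page.size() - 1);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            keys.add("CATEGORY." + i);
        }
        List<String> all = scanAll(ring(keys), 10);
        System.out.println(all.size() + " keys seen, in token order: " + all);
    }
}
```

Note that the keys come back in token order, not in key order, so with a
random partitioner the only thing this loop is good for is seeing all the
keys. The count of 10 is arbitrary; in practice you would pick a value your
cluster can serve without timing out.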

--
Sylvain

>
> 2010/6/9 Philip Stanhope <pstanhope@wimba.com>
>>
>> I feel that there is a significant bit of confusion here.
>> You CAN use start/finish when using get_range_slices with random
>> partitioner. But you can't make any assumptions about what key will be next
>> in the range which is the whole point of "random". If you do know a specific
>> key that you care about, you can use that as a start, but again, you don't
>> know what will come next.
>> If you have a CF with 1M keys ... you can effectively do a full row scan
>> ... it is expensive and you'd have to ask yourself why you'd be wanting to
>> do this in the first place.
>> Ordering with columns for a particular key is completely dependent on the
>> CompareWith choice you make when you defined the column family. For example,
>> you can make assumptions about the sequencing of columns returned from
>> get_slice (NOT get_range_slices).
>> -phil
>> On Jun 9, 2010, at 7:29 AM, David Boxenhorn wrote:
>>
>> To use start and finish parameters at all, you need to use OPP. Start and
>> finish parameters don't work if you don't use OPP, i.e. the result set won't
>> be:  start <= resultSet < finish
>>
>> 2010/6/9 Ben Browning <ben324@gmail.com>
>>>
>>> OPP stands for Order-Preserving Partitioner. For more information on
>>> partitioners, look here:
>>>
>>> http://wiki.apache.org/cassandra/StorageConfiguration#Partitioner
>>>
>>> To do key range slices that use both start and finish parameters and
>>> retrieve keys in-order, you need to use an ordered partitioner -
>>> either the built-in OPP or your own custom one.
>>>
>>> Ben
>>>
>>> On Tue, Jun 8, 2010 at 10:26 PM, sina <ywf2008@sina.com> wrote:
>>> > What does OPP mean? And how can I make "start" and "finish" useful
>>> > and meaningful?
>>> >
>>> >
>>> > 2010-06-09
>>> > ________________________________
>>> > 9527
>>> > ________________________________
>>> > From: Ben Browning
>>> > Sent: 2010-06-02  21:08:57
>>> > To: user
>>> > Cc:
>>> > Subject: Re: Range search on keys not working?
>>> > They exist because when using OPP they are useful and make sense.
>>> > On Wed, Jun 2, 2010 at 8:59 AM, David Boxenhorn <david@lookin2.com>
>>> > wrote:
>>> >> So why do the "start" and "finish" range parameters exist?
>>> >>
>>> >> On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning <ben324@gmail.com> wrote:
>>> >>>
>>> >>> Martin,
>>> >>>
>>> >>> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller
>>> >>> <Martin.Grabmueller@eleven.de> wrote:
>>> >>> > I think you can specify an end key, but it should be a key which
>>> >>> > does exist in your column family.
>>> >>>
>>> >>>
>>> >>> Logically, it doesn't make sense to ever specify an end key with
>>> >>> random partitioner. If you specified a start key of "aaa" and an end
>>> >>> key of "aac" you might get back as results "aaa", "zfc", "hik", etc.
>>> >>> And, even if you have a key of "aab" it might not show up. Key ranges
>>> >>> only make sense with order-preserving partitioner. The only time to
>>> >>> ever use a key range with random partitioner is when you want to
>>> >>> iterate over all keys in the CF.
>>> >>>
>>> >>> Ben
>>> >>>
>>> >>>
>>> >>> > But maybe I'm off the track here and someone else here knows more
>>> >>> > about this key range stuff.
>>> >>> >
>>> >>> > Martin
>>> >>> >
>>> >>> > ________________________________
>>> >>> > From: David Boxenhorn [mailto:david@lookin2.com]
>>> >>> > Sent: Wednesday, June 02, 2010 2:30 PM
>>> >>> > To: user@cassandra.apache.org
>>> >>> > Subject: Re: Range search on keys not working?
>>> >>> >
>>> >>> > In other words, I should check the values as I iterate, and stop
>>> >>> > iterating when I get out of range?
>>> >>> >
>>> >>> > I'll try that!
>>> >>> >
>>> >>> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller
>>> >>> > <Martin.Grabmueller@eleven.de> wrote:
>>> >>> >>
>>> >>> >> When not using OPP, you should not use something like 'CATEGORY/'
>>> >>> >> as the end key. Use the empty string as the end key and limit the
>>> >>> >> number of returned keys, as you did with the 'max' value.
>>> >>> >>
>>> >>> >> If I understand correctly, the end key is used to generate an end
>>> >>> >> token by hashing it, and there is not the same correspondence
>>> >>> >> between 'CATEGORY' and 'CATEGORY/' as for hash('CATEGORY') and
>>> >>> >> hash('CATEGORY/').
>>> >>> >>
>>> >>> >> At least, this was the explanation I gave myself when I had the
>>> >>> >> same problem.
>>> >>> >>
>>> >>> >> The solution is to iterate through the keys by always using the
>>> >>> >> last key returned as the start key for the next call to
>>> >>> >> get_range_slices, and then to drop the first element from the
>>> >>> >> result.
>>> >>> >>
>>> >>> >> HTH,
>>> >>> >>   Martin
>>> >>> >>
>>> >>> >> ________________________________
>>> >>> >> From: David Boxenhorn [mailto:david@lookin2.com]
>>> >>> >> Sent: Wednesday, June 02, 2010 2:01 PM
>>> >>> >> To: user@cassandra.apache.org
>>> >>> >> Subject: Re: Range search on keys not working?
>>> >>> >>
>>> >>> >> The previous thread where we discussed this is called "key is
>>> >>> >> sorted?"
>>> >>> >>
>>> >>> >>
>>> >>> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn
>>> >>> >> <david@lookin2.com> wrote:
>>> >>> >>>
>>> >>> >>> I'm not using OPP. But I was assured on earlier threads (I asked
>>> >>> >>> several times to be sure) that it would work as stated below: the
>>> >>> >>> results would not be ordered, but they would be correct.
>>> >>> >>>
>>> >>> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt <tcurdt@vafer.org>
>>> >>> >>> wrote:
>>> >>> >>>>
>>> >>> >>>> Sounds like you are not using an order preserving partitioner?
>>> >>> >>>>
>>> >>> >>>> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn
>>> >>> >>>> <david@lookin2.com> wrote:
>>> >>> >>>> > Range search on keys is not working for me. I was assured in
>>> >>> >>>> > earlier threads that range search would work, but the results
>>> >>> >>>> > would not be ordered.
>>> >>> >>>> >
>>> >>> >>>> > I'm trying to get all the rows that start with "CATEGORY."
>>> >>> >>>> >
>>> >>> >>>> > I'm doing:
>>> >>> >>>> >
>>> >>> >>>> > String start = "CATEGORY.";
>>> >>> >>>> > .
>>> >>> >>>> > .
>>> >>> >>>> > .
>>> >>> >>>> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start,
>>> >>> >>>> >                             "CATEGORY/", max)
>>> >>> >>>> > .
>>> >>> >>>> > .
>>> >>> >>>> > .
>>> >>> >>>> >
>>> >>> >>>> > in a loop, setting start to the last key each time - but I'm
>>> >>> >>>> > getting rows that don't start with "CATEGORY."!!
>>> >>> >>>> >
>>> >>> >>>> > How do I get all rows that start with "CATEGORY."?
>>> >>> >>>
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>
>>> >>
>>
>>
>
>
