incubator-cassandra-user mailing list archives

From Boris Yen <yulin...@gmail.com>
Subject Re: Possibility of going OOM using get_count
Date Fri, 23 Sep 2011 06:01:58 GMT
On Fri, Sep 23, 2011 at 12:28 PM, aaron morton <aaron@thelastpickle.com> wrote:

> Offsets have been discussed previously. IIRC the main concerns were
> either:
>
> There is no way to reliably count to start the offset, i.e. we do not lock
> the row
>

In the new get_count function, Cassandra does internal paging in order
to get the total count. Without locking the row, the count could still be
unreliable (someone might be deleting columns while Cassandra is
counting them).
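
To illustrate, here is a minimal sketch of a paged count. It is not Cassandra's actual implementation: the row is modelled as a sorted Python list of column names, get_slice is a hypothetical stand-in for a slice query whose start column is assumed to be inclusive, and the on_page hook exists only to simulate another client modifying the row between pages.

```python
import bisect

def get_slice(row, start, count):
    """Hypothetical stand-in for a slice query: up to `count` column names >= start."""
    i = bisect.bisect_left(row, start)
    return row[i:i + count]

def paged_count(row, page_size=1000, on_page=None):
    """Count columns one page at a time; `on_page` simulates other clients."""
    total = 0
    last = None  # last column name counted so far
    while True:
        start = "" if last is None else last  # "" sorts before every name
        page = get_slice(row, start, page_size + 1)
        if last is not None and page and page[0] == last:
            page = page[1:]  # start is inclusive: drop the column already counted
        page = page[:page_size]
        if not page:
            return total
        total += len(page)
        last = page[-1]
        if on_page is not None:
            on_page(row)  # the row is not locked, so it may change here

def delete_first_column(row):
    # Simulated concurrent delete of a column that was already counted.
    if "a" in row:
        row.remove("a")

row = ["a", "b", "c", "d", "e", "f"]
print(paged_count(list(row), page_size=2))                         # 6
print(paged_count(row, page_size=2, on_page=delete_first_column))  # 6
print(len(row))                                                    # 5
```

In the second call the paged count reports 6 even though only 5 columns remain once it finishes, which is the kind of unreliability I mean.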


>
> Or performance related, as there is no reliable way to skip 10,000
> columns other than counting 10,000 columns. With a start col we can search.
>
>
I am just curious: "skip 10,000 columns to get the start column" could
be done the same way Cassandra implements the new get_count function
(internal paging). I cannot think of a reason why it is doable for
get_count but not for an offset.

I know the result might not be reliable and the performance might vary
depending on the offset, but if Cassandra can use internal paging to get
the count, it should be able to apply the same method to find the start
column for the offset.
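
A rough sketch of the idea, under the same caveats: this is not Cassandra code, the row is just a sorted Python list of column names, and get_slice and start_column_for_offset are hypothetical names made up for illustration. The slice start is assumed to be inclusive.

```python
import bisect

def get_slice(row, start, count):
    """Hypothetical stand-in for a slice query: up to `count` column names >= start."""
    i = bisect.bisect_left(row, start)
    return row[i:i + count]

def start_column_for_offset(row, offset, page_size=1000):
    """Return the column name at 0-based position `offset`, or None if the row is shorter."""
    skipped = 0
    last = None  # last column name skipped so far
    while True:
        start = "" if last is None else last  # "" sorts before every name
        page = get_slice(row, start, page_size + 1)
        if last is not None and page and page[0] == last:
            page = page[1:]  # start is inclusive: drop the column already skipped
        page = page[:page_size]
        if not page:
            return None  # ran out of columns before reaching the offset
        if skipped + len(page) > offset:
            return page[offset - skipped]
        skipped += len(page)
        last = page[-1]

row = ["col%03d" % i for i in range(10)]
start = start_column_for_offset(row, 7, page_size=3)
print(start)                     # col007
print(get_slice(row, start, 2))  # ['col007', 'col008']
```

The cost still grows with the offset, since every skipped column is read once, but only one page is held in memory at a time.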


> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/09/2011, at 8:50 PM, Boris Yen wrote:
>
> I was wondering if it is possible to use a similar approach to CASSANDRA-2894
> <https://issues.apache.org/jira/browse/CASSANDRA-2894> to have the
> SlicePredicate support an offset? With an offset, it would
> be much easier to implement paging on the client side.
>
> Boris
>
> On Mon, Sep 19, 2011 at 9:45 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
>> Unfortunately no, because you don't know what the actual
>> last-column-counted was.
>>
>> On Mon, Sep 19, 2011 at 4:25 AM, aaron morton <aaron@thelastpickle.com>
>> wrote:
>> > get_count() supports the same predicate as get_slice. So you can implement
>> > the paging yourself.
>> > Cheers
>> > -----------------
>> > Aaron Morton
>> > Freelance Cassandra Developer
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> > On 19/09/2011, at 8:45 PM, Tharindu Mathew wrote:
>> >
>> >
>> > On Mon, Sep 19, 2011 at 12:40 PM, Benoit Perroud <benoit@noisette.ch> wrote:
>> >>
>> >> The workaround for 0.7 is calling get_slice and count on client side.
>> >> It's heavier, sure, but you will then be able to set start column
>> >> accordingly.
>> >
>> > I was afraid of that :(
>> > Will follow that method. Thanks.
>> >>
>> >>
>> >> 2011/9/19 Tharindu Mathew <mccloud35@gmail.com>:
>> >> > Thanks Aaron and Jake for the replies.
>> >> > Any chance of a possible workaround to use for Cassandra 0.7?
>> >> >
>> >> > On Mon, Sep 19, 2011 at 3:48 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> >> >>
>> >> >> Cool
>> >> >> Thanks, A
>> >> >> -----------------
>> >> >> Aaron Morton
>> >> >> Freelance Cassandra Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >> On 19/09/2011, at 9:55 AM, Jake Luciani wrote:
>> >> >>
>> >> >> This is fixed in 1.0
>> >> >> https://issues.apache.org/jira/browse/CASSANDRA-2894
>> >> >>
>> >> >> On Sun, Sep 18, 2011 at 2:16 PM, Tharindu Mathew <mccloud35@gmail.com> wrote:
>> >> >>>
>> >> >>> Hi everyone,
>> >> >>> I noticed this line in the API docs,
>> >> >>>
>> >> >>> The method is not O(1). It takes all the columns from disk to calculate
>> >> >>> the answer. The only benefit of the method is that you do not need to pull
>> >> >>> all the columns over Thrift interface to count them.
>> >> >>>
>> >> >>> Does this mean if a row has a large number of columns calling this method
>> >> >>> might make it go OOM?
>> >> >>> Thanks in advance.
>> >> >>> --
>> >> >>> Regards,
>> >> >>>
>> >> >>> Tharindu
>> >> >>> blog: http://mackiemathew.com/
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> http://twitter.com/tjake
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Regards,
>> >> >
>> >> > Tharindu
>> >> > blog: http://mackiemathew.com/
>> >> >
>> >
>> >
>> >
>> > --
>> > Regards,
>> >
>> > Tharindu
>> > blog: http://mackiemathew.com/
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>
>
