incubator-cassandra-user mailing list archives

From sankalp kohli <kohlisank...@gmail.com>
Subject Re: column sort order and reversed sort performance question
Date Sun, 07 Jul 2013 21:01:20 GMT
One reason to use reverse order is to skip tombstones during a range query.
Here is an example.

Let's say we want to read all the data that is between 10 and 60 minutes
old, and that the older columns have been deleted. If the data is stored
from old to new in an SSTable, then we have to go over all the tombstones
before we reach any live column. All the lazy iterators over the columns
will start by returning columns that are 60 minutes old, then 59 minutes
old, and so on. They will all keep returning tombstones, and we will not
find any live column until we reach the 11- or 12-minute mark. So we have to
go over all the data and tombstones between 60 and 12 minutes (if
non-deleted columns are found at 12 minutes).

Whereas if we store the data from new to old, iterating over the columns
returns the newer columns first. These have not been deleted, so we find
live columns that we can return right away.

But if there are fewer live columns than we asked for, the storage order
does not matter, because we have to go over all the columns from 10 to 60
minutes anyway.
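The effect can be sketched with a small Python simulation. This is a toy model, not Cassandra's read path: each column is reduced to a hypothetical `(age_in_minutes, is_tombstone)` pair, the data (one column per minute from 10 to 60 minutes old, with everything older than 12 minutes deleted) is made up to match the example, and `slice_query` is a hypothetical helper that walks the columns in storage order, skipping tombstones, until it has collected `limit` live columns.

```python
from typing import List, Tuple

# Hypothetical model: each column is (age in minutes, is_tombstone).
Column = Tuple[int, bool]


def make_columns() -> List[Column]:
    # Made-up data matching the example: one column per minute from 10 to 60
    # minutes old; everything older than 12 minutes has been deleted.
    return [(age, age > 12) for age in range(10, 61)]


def slice_query(columns: List[Column], limit: int) -> Tuple[List[int], int]:
    """Walk columns in storage order, skipping tombstones, until `limit` live
    columns are collected. Returns (ages of live columns, cells scanned)."""
    live: List[int] = []
    scanned = 0
    for age, is_tombstone in columns:
        scanned += 1
        if is_tombstone:
            continue  # a tombstone is still read, then skipped
        live.append(age)
        if len(live) == limit:
            break
    return live, scanned


columns = make_columns()
old_to_new = sorted(columns, key=lambda c: c[0], reverse=True)  # oldest first
new_to_old = sorted(columns, key=lambda c: c[0])                # newest first

print(slice_query(old_to_new, limit=3))  # ([12, 11, 10], 51)
print(slice_query(new_to_old, limit=3))  # ([10, 11, 12], 3)
```

With a limit of 3, the newest-first layout touches 3 cells instead of 51. With a limit of 5, more than the 3 live columns that exist, both layouts scan all 51 cells, which is the case described in the last paragraph.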



On Wed, Jul 3, 2013 at 10:11 AM, Robert Coli <rcoli@eventbrite.com> wrote:

> On Wed, Jul 3, 2013 at 6:02 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
> >
> > We loaded 5 million columns into a single row and when accessing the
> first 30k and last 30k columns we saw no performance difference.  We tried
> just loading 2 rows from the beginning and end and saw no performance
> difference.  I am sure reverse sort is there for a reason though.  In what
> context do you actually see a performance difference with reverse sort???
>
>
> http://thelastpickle.com/2011/10/03/Reverse-Comparators/
> "
> When a query does not specify a start column (and does not specify
> reversed) the server can just start reading columns from the start without
> having to worry about finding the right place to start. This is exactly
> what we can do for the Descending CF.
>
> For the regular Ascending CF we need to specify reversed, so the server
> must read the row index and work out which column is column count from the
> end of the row.
>
> There is no comparison really.
> "
>
> =Rob
>
>
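The quoted argument can be illustrated with a toy cost model in Python. It is an assumption-laden sketch, not Cassandra's actual slice reader: `Row`, `block_size`, `slice_from_start`, and `slice_from_end` are hypothetical, column names stand in for timestamps, the "row index" is just the start position of each fixed-size block, and cost is counted as the number of index entries consulted. A descending row answers "newest N columns" by reading from the start without touching the index, while an ascending row queried with reversed must first read the index to locate the end of the row.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Row:
    """Toy row: column names kept in comparator order (hypothetical model)."""
    columns: List[int]
    block_size: int = 4  # hypothetical number of columns per index block

    def index(self) -> List[int]:
        # Toy "row index": the position where each block of columns starts.
        return list(range(0, len(self.columns), self.block_size))


def slice_from_start(row: Row, count: int) -> Tuple[List[int], int]:
    """No start column, not reversed: read from the beginning of the row.
    Returns (columns, index entries consulted); the index is never used."""
    return row.columns[:count], 0


def slice_from_end(row: Row, count: int) -> Tuple[List[int], int]:
    """Reversed with no start column: read the whole toy index to find the
    block holding the first of the last `count` columns, then return those
    columns last-to-first. Returns (columns, index entries consulted)."""
    idx = row.index()
    start = max(len(row.columns) - count, 0)
    # Block that contains position `start` (a real reader would seek here).
    block = max((p for p in idx if p <= start), default=0)
    window = row.columns[block:]  # columns from that block to the end
    return window[start - block:][::-1], len(idx)


asc = Row(list(range(10)))          # ascending names: oldest first
desc = Row(list(range(9, -1, -1)))  # descending names: newest first
print(slice_from_start(desc, 3))    # ([9, 8, 7], 0)
print(slice_from_end(asc, 3))       # ([9, 8, 7], 3)
```

Both calls return the same columns; in this toy model only the index work differs.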
