incubator-cassandra-user mailing list archives

From Tyler Hobbs <ty...@riptano.com>
Subject Re: Recommended sort mechanism and partitioner
Date Fri, 15 Oct 2010 19:18:26 GMT
a) 10 million columns sounds fine.  Just watch out for compaction: from my
understanding, huge rows can kill you there.

b) Use RandomPartitioner unless you absolutely have to use something else.

c) If you insert everything into one row and only move to another row when
you hit 10 million columns, you're only going to be writing to one node at a
time.  To spread the writes out, you might want to consider using the
TimeUUID as a row key instead.  There's not really a problem with having
tons of rows in a column family.
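
To make the point about write distribution concrete, here is a rough Python
sketch, not Cassandra code. It assumes a hypothetical 4-node ring with evenly
spaced tokens and approximates RandomPartitioner by reducing the MD5 digest
of the row key into the 0..2**127 token space. The names token_for, owner and
write_distribution are made up for illustration.

```python
import bisect
import hashlib
import uuid
from collections import Counter

RING_SIZE = 2 ** 127  # size of the token space used by RandomPartitioner


def token_for(key: bytes) -> int:
    # Simplification (assumption): fold the MD5 digest into the token range.
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE


def owner(node_tokens, key: bytes) -> int:
    # A key goes to the first node whose token is >= the key's token,
    # wrapping around to node 0 past the end of the ring.
    idx = bisect.bisect_left(node_tokens, token_for(key))
    return idx % len(node_tokens)


def write_distribution(row_keys, node_count=4):
    # Count how many writes each (hypothetical) node receives.
    node_tokens = [i * RING_SIZE // node_count for i in range(node_count)]
    return Counter(owner(node_tokens, k) for k in row_keys)


# 10,000 inserts into one wide row: every write uses the same row key.
single_row = write_distribution([b"events-bucket-0"] * 10_000)

# 10,000 inserts with a fresh TimeUUID as the row key for each one.
per_event = write_distribution([uuid.uuid1().bytes for _ in range(10_000)])
```

In this model single_row has an entry for only one node, while per_event
spreads the writes across all four. Replication is ignored here.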

If you want to be able to get a slice of time with this scheme, you can
either use an order preserving partitioner or have a second column family
with an index row (or rows) sorted by TimeUUID. (This sounds like what
you're suggesting.)
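
A minimal in-memory sketch of the index-row idea, in Python. It does not use
any Cassandra client API: data_cf and index_row are hypothetical stand-ins
for the data column family and for one index row whose column names are
TimeUUIDs, and insert and time_slice are made-up helper names. Ordering by
(timestamp, bytes) only approximates how a TimeUUID comparator sorts.

```python
import bisect
import uuid

data_cf = {}    # stand-in for the data column family: row key -> columns
index_row = []  # stand-in for one index row: sorted (timestamp, uuid bytes)


def insert(payload):
    # Write the data row under a fresh TimeUUID and record it in the index.
    key = uuid.uuid1()
    data_cf[key] = {"body": payload}
    bisect.insort(index_row, (key.time, key.bytes))
    return key


def time_slice(start, end):
    # Slice the index between two TimeUUIDs (inclusive), then fetch each row.
    lo = bisect.bisect_left(index_row, (start.time, b""))
    hi = bisect.bisect_right(index_row, (end.time, b"\xff" * 17))
    return [data_cf[uuid.UUID(bytes=raw)] for _, raw in index_row[lo:hi]]


keys = [insert("event %d" % i) for i in range(10)]
ordered = sorted(keys, key=lambda k: (k.time, k.bytes))
window = time_slice(ordered[2], ordered[5])
```

The data rows can then live anywhere on the ring under RandomPartitioner;
only the index row needs to be read in order. In a real cluster a single
index row would itself grow large, which is why splitting it into several
rows (for example one per time bucket) is worth considering.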

- Tyler

> I wrote some thoughts about this on my blog. I think it's still mostly
> correct:
>
>  * http://www.ayogo.com/techblog/2010/04/sorting-in-cassandra/
>
> On Fri, Oct 15, 2010 at 11:14 AM, Wicked J <wickedj2010@gmail.com> wrote:
> > Hi,
> > I'm using TimeUUID/Sort by column name mechanism. The column value can
> > contain text data (in future they may contain image data as well) leading
> > to the possibility of a row out-growing the RAM capacity. Given this
> > background my questions are:
> >
> > a] How many columns are recommended against one row? Based on my app.
> > needs, I can imagine having 10 million would be a good starting point for
> > the max_limit (based on text data). Also note that my app. will use search
> > in ranges of 100 or 200 columns when there are large number of
> > records (columnar data) without a caching solution in the front.
> > b] What partitioner is recommended? so that the load in the cluster nodes
> > is not largely uneven.
> > c] Would you recommend changing the TimeUUID/Columnar sort mechanism
> > (with a change in the data model) to sort using row key mechanism? If so
> > then what partitioner is recommended?  with load not being largely uneven.
> >
> > Thanks
> >
>
