From: DE VITO Dominique <dominique.dev...@thalesgroup.com>
Subject: RE: cost estimate about some Cassandra patchs
Date: Tue, 07 May 2013 09:57:41 GMT

> -----Original Message-----
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Tuesday, May 7, 2013 10:22
> To: user@cassandra.apache.org
> Subject: Re: cost estimate about some Cassandra patchs
>
> > Use case = rows with rowkey like (folder id, file id)
> > And operations read/write multiple rows with the same folder id => so, it could make sense to have a partitioner putting rows with the same "folder id" on the same replicas.
>
> The entire row key is the thing we use to make the token used to both locate the replicas and place the row in the node. I don't see that changing.

Well, we can't do that, because of the secondary indexes on rows.
Only C* v2 will allow the row design you mention together with secondary indexes.
So, this row design is a no-go for us with C* 1.1 or 1.2.
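For context, the core idea behind our FolderPartitioner is simply to derive the token from the "folder id" part of the composite row key instead of from the whole key, so that all rows of one folder land on the same replicas. Below is only a simplified, self-contained sketch of that token derivation (it is not our actual code and does not implement Cassandra's IPartitioner interface; the key layout, a 2-byte length prefix followed by the folder id and then the file id, is just an assumption for the example):

import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Simplified sketch of the FolderPartitioner idea: the token is computed
 * from the "folder id" prefix of the composite row key only, so every
 * (folder id, file id) row of the same folder maps to the same token,
 * hence to the same replicas. The key encoding used here is an assumption
 * made only for this example.
 */
public class FolderTokenSketch {

    /** Extract the folder-id component from a composite key. */
    static ByteBuffer folderIdPart(ByteBuffer compositeKey) {
        ByteBuffer key = compositeKey.duplicate();
        int folderIdLength = key.getShort() & 0xFFFF;  // 2-byte length prefix
        ByteBuffer folderId = key.slice();
        folderId.limit(folderIdLength);
        return folderId;
    }

    /** MD5-based token over the folder id only (RandomPartitioner-style 2^127 space). */
    static BigInteger token(ByteBuffer compositeKey) throws NoSuchAlgorithmException {
        ByteBuffer folderId = folderIdPart(compositeKey);
        byte[] bytes = new byte[folderId.remaining()];
        folderId.get(bytes);
        byte[] digest = MessageDigest.getInstance("MD5").digest(bytes);
        return new BigInteger(digest).abs().mod(BigInteger.valueOf(2).pow(127));
    }

    /** Build a toy composite key: [len(folderId)][folderId][fileId]. */
    static ByteBuffer key(String folderId, String fileId) {
        byte[] f = folderId.getBytes();
        byte[] g = fileId.getBytes();
        ByteBuffer b = ByteBuffer.allocate(2 + f.length + g.length);
        b.putShort((short) f.length).put(f).put(g);
        b.flip();
        return b;
    }

    public static void main(String[] args) throws Exception {
        // Two files of the same folder get the same token -> same replicas.
        System.out.println(token(key("folder-42", "file-1")));
        System.out.println(token(key("folder-42", "file-2")));
    }
}

With such a scheme, (folder-42, file-1) and (folder-42, file-2) hash to the same token and therefore to the same replicas; the open question remains whether the coordinator can then batch the per-replica messages.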

> Have you done any performance testing to see if this is a problem?

Unfortunately, today we only have some pieces in place for doing performance testing; we are just beginning.
But still, I am investigating whether alternative designs are (at least) possible, because if no alternative design is easy to develop, then there is no need to compare performance.

The lesson I have learnt here is that, if I were to restart our project from the beginning, I would run a more extensive performance-testing effort alongside the business project development.
It's a kind of must-have for a NoSQL database.

So far, the only tests we have done with our FolderPartitioner are on a one-machine cluster.
As expected, due to the extra work done by this FolderPartitioner, CPU usage is a bit higher (~10%), while memory and network consumption are the same as with the RP. But I have strange results for I/O (average hard-drive usage), for example for a write-only test: I don't know why I/O consumption should be much higher with our FolderPartitioner than with the RP. So, I am questioning my measurement methods, and my C* understanding.
Well, the use of such a FolderPartitioner is still quite a long way off...

Regards.
Dominique

> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com

On 7/05/2013, at 5:27 AM, DE VITO Dominique <dominique.devito@thalesgroup.com> wrote:

> > From: aaron morton [mailto:aaron@thelastpickle.com]
> > Sent: Sunday, April 28, 2013 22:54
> > To: user@cassandra.apache.org
> > Subject: Re: cost estimate about some Cassandra patchs
> > 
> > > Does anyone know enough of the inner workings of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batching?
> > 
>  
> > Assuming you mean "have the coordinator send multiple row read/write requests in a single message to replicas"
> > 
> > Pretty sure this has been raised as a ticket before but I cannot find one now. 
> > 
> > It would be a significant change and I'm not sure how big the benefit is. To send the messages, the coordinator places them in a queue, so there is little delay in sending; it then waits on them asynchronously. So there may be some saving on networking, but from the coordinator's point of view I think the impact is minimal.
> > 
> > What is your use case?
>  
> Use case = rows with rowkey like (folder id, file id)
> And operations read/write multiple rows with the same folder id => so, it could make sense to have a partitioner putting rows with the same "folder id" on the same replicas.
>  
> But so far, Cassandra is not able to exploit this locality, as the batch effect ends at the coordinator node.
>  
> So, my question about the cost estimate for patching Cassandra.
>  
> The closest (or exactly corresponding to my need?) JIRA entries I have found so far are:
>  
> CASSANDRA-166: Support batch inserts for more than one key at once
> https://issues.apache.org/jira/browse/CASSANDRA-166
> => "WON'T FIX" status
>  
> CASSANDRA-5034: Refactor to introduce Mutation Container in write path
> https://issues.apache.org/jira/browse/CASSANDRA-5034
> => I am not very sure whether it's related to my topic
>  
> Thanks.
>  
> Dominique
>  
>  
>  
> > 
> > Cheers
> > 
> > 
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Consultant
> > New Zealand
> > 
> > @aaronmorton
> > http://www.thelastpickle.com
>  
> On 27/04/2013, at 4:04 AM, DE VITO Dominique <dominique.devito@thalesgroup.com> wrote:
> 
> 
> Hi,
>  
> We have created a new partitioner that groups some rows with **different** row keys on the same replicas.
>  
> But neither batch_mutate nor multiget_slice is able to take advantage of this partitioner-defined placement to vectorize/batch communications between the coordinator and the replicas.
>  
> Does anyone know enough of the inner workings of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batching?
>  
> Thanks.
>  
> Regards,
> Dominique
>  
>  

