cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Poor write performance with secondary index
Date Tue, 17 Apr 2012 09:52:53 GMT
Secondary indexes require a read and a write (potentially two) for every update. Regular mutations
are no-look writes (nothing is read first), so they are much faster.
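
To make the cost difference concrete, here is a rough toy model in Python. It is only a sketch and not how Cassandra is implemented internally: the class ToyColumnFamily, the run() helper and the read/write counters are all made up for illustration, and the "reads" and "writes" are simple counters rather than real disk I/O.

```python
from collections import defaultdict


class ToyColumnFamily:
    """Toy in-memory model of a CF; NOT Cassandra's actual implementation."""

    def __init__(self, indexed=False):
        self.indexed = indexed
        self.rows = {}                 # row key -> value of the indexed column
        self.index = defaultdict(set)  # column value -> set of row keys
        self.reads = 0                 # hypothetical operation counters
        self.writes = 0

    def mutate(self, key, value):
        if self.indexed:
            # Read-before-write: the old value is needed to maintain the index.
            self.reads += 1
            old = self.rows.get(key)
            if old is not None and old != value:
                # Potential second index write: remove the stale entry.
                self.index[old].discard(key)
                self.writes += 1
            # Index write for the new value.
            self.index[value].add(key)
            self.writes += 1
        # The data write itself; without an index this is the only work done.
        self.rows[key] = value
        self.writes += 1


def run(indexed):
    """Insert 1000 rows, then update each one so the indexed value changes."""
    cf = ToyColumnFamily(indexed=indexed)
    for i in range(1000):
        cf.mutate("url%d" % i, "groupA")
    for i in range(1000):
        cf.mutate("url%d" % i, "groupB")
    return cf.reads, cf.writes
```

In this model run(False) returns (0, 2000) and run(True) returns (2000, 5000), so the indexed CF does a read per mutation plus one or two extra writes. The real ratio on a cluster depends on caching, compaction and disk layout, so treat the numbers as an illustration of the shape of the cost only.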

Just like in an RDBMS, it's more efficient to insert the data and then create the index than to
insert data with the index already present.

An alternative is to create SSTables in the Hadoop jobs and bulk load them into the cluster.


Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/04/2012, at 2:51 AM, Patrik Modesto wrote:

> Hi,
> 
> I have a 4-node test cluster running Cassandra 1.0.9, 32GB memory, 4x
> 1TB disks. I have two keyspaces, rfTest2 (RF=2) and rfTest3 (RF=3).
> There are two CFs, one with source data and one with a secondary index:
> 
> create column family UrlGroup
>    with column_type=Standard
>    and comparator=UTF8Type
>    and default_validation_class=UTF8Type
>    and key_validation_class=UTF8Type
>    and column_metadata=
>    [{
>        column_name: groupId,
>        validation_class: UTF8Type,
>        index_type: KEYS
>    }];
> 
> I'm running a Hadoop mapreduce job, reading the source CF and creating 3
> mutations for each row-key in the UrlGroup CF.
> 
> The mapreduce runs for 30 minutes. When I remove the secondary index,
> the mapreduce runs just 10 minutes. There are 26,273,544 mutations
> total.
> 
> Also, with the secondary index, the nodes show very high load (50+) and
> iowait (70%+). Without the secondary index the load is ~5 and iowait ~10%.
> 
> What may be the problem?
> 
> Regards,
> Patrik

