cassandra-commits mailing list archives

From "Pierre N. (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-9323) Bulk loading is slow
Date Thu, 07 May 2015 14:44:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre N. updated CASSANDRA-9323:
---------------------------------
    Description: 
When I bulk load SSTables created with CQLSSTableWriter, it's very slow. I tested on a fresh
Cassandra node (no data in any keyspace or table) with good hardware (8x2.8 GHz, 32 GB RAM), but
with a classic hard disk (I don't think an SSD would improve performance in this case).

When I upload an SSTable from a different server using sstableloader, I get an average of 3
MB/s; with the attached example I managed to get 5 MB/s, which is still slow.
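To put these rates in perspective, here is a rough load-time estimate at 1, 3 and 5 MB/s. It is only a sketch: the 10 GiB dataset size is a hypothetical example, not a figure from this report, and 1 MB is taken as 1024 * 1024 bytes.

```java
// Rough load-time estimate at the throughputs reported above.
// The 10 GiB dataset size is a hypothetical example, not measured data.
final class LoadTimeEstimate {
    /** Seconds needed to stream {@code bytes} at {@code mbPerSec}, with 1 MB = 1024 * 1024 bytes. */
    static double seconds(long bytes, double mbPerSec) {
        return bytes / (mbPerSec * 1024 * 1024);
    }

    public static void main(String[] args) {
        long dataset = 10L * 1024 * 1024 * 1024; // hypothetical 10 GiB of SSTables
        for (double rate : new double[] {1, 3, 5}) {
            System.out.printf("%.0f MB/s -> %.1f min%n", rate, seconds(dataset, rate) / 60);
        }
    }
}
```

Under that assumption, 10 GiB takes about 57 minutes at 3 MB/s and about 34 minutes at 5 MB/s.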

During streaming I noticed that one server core is at 100% CPU, so I think the operation is
CPU-bound on the server side. I quickly attached a sampling profiler to the Cassandra
instance and got the following output:

https://i.imgur.com/IfLc2Ip.png

So I think (though I may be wrong, since sampling is imprecise) that during streaming the
table is deserialized and reserialized into another SSTable, and that this
deserialize/serialize step takes a large amount of CPU, slowing down the insert speed.
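The suspected hot path can be modeled in a few lines. Every row is decoded into objects and then re-encoded, so the CPU and allocation cost grows with the number of rows even though the output bytes end up identical to the input. This is a simplified sketch, not Cassandra's streaming code. The length-prefixed row layout and the `ReserializingCopy` class are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical model of a "deserialize then reserialize" copy. The row layout
// [int keyLen][key][int valueLen][value] is invented for illustration; the real
// SSTable format is far more complex.
final class ReserializingCopy {
    static final class Row {
        final byte[] key;
        final byte[] value;

        Row(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Decodes one row, allocating new arrays for its key and value. */
    static Row readRow(ByteBuffer in) {
        byte[] key = new byte[in.getInt()];
        in.get(key);
        byte[] value = new byte[in.getInt()];
        in.get(value);
        return new Row(key, value);
    }

    /** Re-encodes one row in the same layout. */
    static void writeRow(ByteBuffer out, Row row) {
        out.putInt(row.key.length).put(row.key);
        out.putInt(row.value.length).put(row.value);
    }

    /** Copies every row by decoding and re-encoding it; the input buffer is left untouched. */
    static ByteBuffer copy(ByteBuffer sstable) {
        ByteBuffer in = sstable.duplicate();
        ByteBuffer out = ByteBuffer.allocate(in.remaining());
        while (in.hasRemaining()) {
            writeRow(out, readRow(in)); // per-row work, even though nothing changes
        }
        out.flip();
        return out;
    }
}
```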

Can someone confirm that bulk loading is slow? I also tested on my own computer and barely
reached 1 MB/s.

I don't understand the point of fully deserializing the table I just built with
CQLSSTableWriter (building and sorting it is already a long process). Couldn't the loader just
copy the table from offset X to offset Y (using the index information, for example) without
deserializing and reserializing it?
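The alternative suggested here, copying a byte range without decoding it, could look like the following sketch built on `FileChannel.transferTo`, which lets the OS move bytes directly where supported. The start and end offsets are assumed to come from an index lookup that is not shown. `RangeCopy.copyRange` is a hypothetical helper and not part of Cassandra.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: streams bytes [start, end) of a data file as-is, with no
// per-row decoding. The offsets are assumed to come from index information.
final class RangeCopy {
    static long copyRange(Path source, long start, long end, WritableByteChannel target)
            throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            if (start < 0 || start > end || end > in.size()) {
                throw new IllegalArgumentException("range [" + start + ", " + end + ") outside file");
            }
            long position = start;
            while (position < end) {
                // transferTo may move fewer bytes than requested, so loop until done.
                long moved = in.transferTo(position, end - position, target);
                if (moved <= 0) {
                    throw new IOException("no progress at offset " + position);
                }
                position += moved;
            }
            return end - start;
        }
    }
}
```

The loop matters because `transferTo` is allowed to return after a partial transfer. The CPU cost here does not depend on how many rows the range contains.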


  was:
When I bulk load SSTables created with CQLSSTableWriter, it's very slow. I tested on a fresh
Cassandra node (no data in any keyspace or table) with good hardware (8x2.8 GHz, 32 GB RAM), but
with a classic hard disk (I don't think an SSD would improve performance in this case).

When I upload an SSTable from a different server, I get an average of 3 MB/s; with the
attached example I managed to get 5 MB/s, which is still slow.

During streaming I noticed that one server core is at 100% CPU, so I think the operation is
CPU-bound on the server side. I quickly attached a sampling profiler to the Cassandra
instance and got the following output:

https://i.imgur.com/IfLc2Ip.png

So I think (though I may be wrong, since sampling is imprecise) that during streaming the
table is deserialized and reserialized into another SSTable, and that this
deserialize/serialize step takes a large amount of CPU, slowing down the insert speed.

Can someone confirm that bulk loading is slow? I also tested on my own computer and barely
reached 1 MB/s.

I don't understand the point of fully deserializing the table I just built with
CQLSSTableWriter (building and sorting it is already a long process). Couldn't the loader just
copy the table from offset X to offset Y (using the index information, for example) without
deserializing and reserializing it?



> Bulk loading is slow
> --------------------
>
>                 Key: CASSANDRA-9323
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9323
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Pierre N.
>         Attachments: App.java
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
