cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7860) csv2sstable - bulk load CSV data to SSTables similar to json2sstable
Date Tue, 02 Sep 2014 08:36:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118032#comment-14118032 ]

Sylvain Lebresne commented on CASSANDRA-7860:
---------------------------------------------

We agree that cqlsh COPY is too slow; it was recently improved by CASSANDRA-7405. There
may be other improvements that can be made to it, and we welcome contributions in that regard.

If you really prefer writing sstables directly, there is the CQLSSTableWriter, which allows
you to easily write your own whatever2sstable tool that fits your requirements. In fact, json2sstable
itself was never made for bulk loading in the first place (CQLSSTableWriter is), and it's
somewhat deprecated now (it's not part of the binary distribution in 2.1).
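
As a rough illustration of the "write your own whatever2sstable tool" suggestion, here is a minimal
sketch of the CSV-reading half of such a tool. It is not part of Cassandra: the class name, the
RowSink interface, the ColumnType enum and the sample data are all hypothetical. RowSink only marks
the place where a real tool would hand each converted row to CQLSSTableWriter.addRow(...). The
parser assumes comma-separated fields with optional double quotes and no line breaks inside a field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of the CSV side of a hypothetical csv2sstable tool (names are illustrative). */
final class Csv2SSTableSketch {

    /**
     * Hypothetical stand-in for the component that writes rows out. In a real tool the
     * implementation would forward to CQLSSTableWriter.addRow(...) and wrap its checked
     * exceptions.
     */
    interface RowSink {
        void addRow(List<Object> values);
    }

    /** Assumed, deliberately small set of column types for this sketch. */
    enum ColumnType { TEXT, INT, BIGINT, DOUBLE }

    /** Splits one CSV line on commas, honouring double-quoted fields and "" escapes. */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // "" inside quotes is a literal quote
                    i++;
                } else {
                    inQuotes = false;    // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Converts a raw CSV field to the Java type matching the declared column type. */
    static Object convert(String raw, ColumnType type) {
        switch (type) {
            case INT:    return Integer.valueOf(raw.trim());
            case BIGINT: return Long.valueOf(raw.trim());
            case DOUBLE: return Double.valueOf(raw.trim());
            default:     return raw;
        }
    }

    /** Parses every non-empty line, converts it per the schema, and passes it to the sink. */
    static int load(Iterable<String> lines, ColumnType[] schema, RowSink sink) {
        int count = 0;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            List<String> fields = splitCsvLine(line);
            if (fields.size() != schema.length) {
                throw new IllegalArgumentException(
                    "Expected " + schema.length + " fields but got " + fields.size() + ": " + line);
            }
            List<Object> row = new ArrayList<>(schema.length);
            for (int i = 0; i < schema.length; i++) {
                row.add(convert(fields.get(i), schema[i]));
            }
            sink.addRow(row);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The sink here just collects rows in memory so the sketch runs without Cassandra.
        List<List<Object>> collected = new ArrayList<>();
        ColumnType[] schema = { ColumnType.INT, ColumnType.TEXT, ColumnType.DOUBLE };
        int n = load(Arrays.asList("1,\"Smith, Jane\",3.5", "2,Bob,4.0"), schema, collected::add);
        System.out.println(n + " rows: " + collected);
    }
}
```

Keeping the sink behind an interface means the parsing and type conversion can be tested without
a Cassandra dependency; only the sink implementation would need the Cassandra jars on the classpath.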

> csv2sstable - bulk load CSV data to SSTables similar to json2sstable
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-7860
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7860
>             Project: Cassandra
>          Issue Type: New Feature
>         Environment: DataStax Community Edition 2.0.9
>            Reporter: Hari Sekhon
>            Priority: Minor
>
> Need a csv2sstable utility to bulk load billions of rows of CSV data - impractical to have to pre-convert to json before bulk loading to sstable.
> CQL COPY really is too slow - a test of mere 4 million row 6GB CSV directly took 28 minutes... while it only takes 60 secs to cat all that data off the hdfs source filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
