cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7860) csv2sstable - bulk load CSV data to SSTables similar to json2sstable
Date Wed, 10 Sep 2014 21:58:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129201#comment-14129201 ]

Tyler Hobbs commented on CASSANDRA-7860:
----------------------------------------

bq. Do we know when 2.1.1 will be released to try that COPY speed improvement in CQL?

We usually have a minor release about once a month, so I would guess 2.1.1 will be out
about a month from now.  However, you can always apply the patch from CASSANDRA-7405 to
2.1.0.  It only affects cqlsh.

With that patch applied, you should expect COPY to process roughly 5,000 to 6,000 rows per second.
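As a rough back-of-envelope check (not a benchmark), the figures quoted in this thread can be compared directly. The sketch below assumes the 5 to 6k rows/s estimate holds constant over the whole file and treats 6 GB as 6,000 MB; the variable and function names are illustrative only.

```python
"""Back-of-envelope throughput comparison using the figures quoted in this thread."""

ROWS = 4_000_000          # rows in the reporter's test CSV
SIZE_MB = 6_000           # ~6 GB CSV, treating 1 GB as 1,000 MB (assumption)
COPY_SECONDS = 28 * 60    # observed cqlsh COPY time: 28 minutes
CAT_SECONDS = 60          # time to cat the same data off HDFS


def rate(amount: float, seconds: float) -> float:
    """Units (rows or MB) processed per second."""
    return amount / seconds


def minutes_at(rows: int, rows_per_sec: float) -> float:
    """Projected wall-clock minutes, assuming a constant row rate (assumption)."""
    return rows / rows_per_sec / 60


observed_rows_per_sec = rate(ROWS, COPY_SECONDS)   # roughly 2,381 rows/s
copy_mb_per_sec = rate(SIZE_MB, COPY_SECONDS)      # roughly 3.6 MB/s
cat_mb_per_sec = rate(SIZE_MB, CAT_SECONDS)        # 100 MB/s

print(f"observed COPY: {observed_rows_per_sec:,.0f} rows/s, {copy_mb_per_sec:.1f} MB/s")
print(f"cat from HDFS: {cat_mb_per_sec:.0f} MB/s")

# The 5-6k rows/s range is the estimate from the comment, not a measurement.
for estimate in (5_000, 6_000):
    print(f"at {estimate:,} rows/s: ~{minutes_at(ROWS, estimate):.1f} min for {ROWS:,} rows")
```

Under these assumptions the reporter's 4-million-row file would take roughly 11 to 13 minutes instead of 28, which is faster but still well behind the ~60 seconds needed just to read the data off HDFS.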

> csv2sstable - bulk load CSV data to SSTables similar to json2sstable
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-7860
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7860
>             Project: Cassandra
>          Issue Type: New Feature
>         Environment: DataStax Community Edition 2.0.9
>            Reporter: Hari Sekhon
>            Priority: Minor
>
> Need a csv2sstable utility to bulk load billions of rows of CSV data; it is impractical to
> have to pre-convert to JSON before bulk loading to SSTables.
> CQL COPY really is too slow: a test loading a mere 4-million-row, 6 GB CSV directly took 28 minutes,
> while it takes only 60 seconds to cat all that data off the HDFS source filesystem.


