Date: Tue, 2 Sep 2014 08:36:20 +0000 (UTC)
From: "Sylvain Lebresne (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-7860) csv2sstable - bulk load CSV data to SSTables similar to json2sstable

    [ https://issues.apache.org/jira/browse/CASSANDRA-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118032#comment-14118032 ]

Sylvain Lebresne commented on CASSANDRA-7860:
---------------------------------------------

We agree that cqlsh COPY is too slow; it was recently improved by CASSANDRA-7405. There may be other improvements that can be made to it, and we welcome contributions in that regard. If you really prefer writing sstables directly, there is CQLSSTableWriter, which lets you easily write your own whatever2sstable tool that fits your requirements.
In fact, json2sstable itself was never made for bulk loading in the first place (CQLSSTableWriter is), and it is somewhat deprecated now (it is not part of the binary distribution in 2.1).

> csv2sstable - bulk load CSV data to SSTables similar to json2sstable
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-7860
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7860
>             Project: Cassandra
>          Issue Type: New Feature
>         Environment: DataStax Community Edition 2.0.9
>            Reporter: Hari Sekhon
>            Priority: Minor
>
> Need a csv2sstable utility to bulk load billions of rows of CSV data - impractical to have to pre-convert to json before bulk loading to sstable.
> CQL COPY really is too slow - a test of mere 4 million row 6GB CSV directly took 28 minutes... while it only takes 60 secs to cat all that data off the hdfs source filesystem.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
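
For readers wondering what a "whatever2sstable" tool along the lines suggested in the comment might look like for CSV, here is a minimal sketch of the parsing half only. It is not from the ticket. The class name Csv2SSTableSketch, the RowSink interface and the three-column schema (id int, name text, score double) are all hypothetical. In a real tool the sink would presumably wrap a CQLSSTableWriter and pass each row to it; the exact builder and row-adding calls should be checked against the Cassandra version in use.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Csv2SSTableSketch {

    /** Hypothetical sink. A real tool might forward each row to CQLSSTableWriter. */
    public interface RowSink {
        void addRow(List<Object> values);
    }

    /** Splits one CSV line, honouring double-quoted fields and "" escapes. */
    public static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    /**
     * Streams rows from the reader into the sink and returns the row count.
     * Assumes one record per line (no newlines inside quoted fields) and the
     * hypothetical schema (id int, name text, score double).
     */
    public static long load(Reader in, RowSink sink) {
        long rows = 0;
        Iterator<String> lines = new BufferedReader(in).lines().iterator();
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            List<String> f = splitCsvLine(line);
            List<Object> values = new ArrayList<>();
            values.add(Integer.parseInt(f.get(0).trim()));
            values.add(f.get(1));
            values.add(Double.parseDouble(f.get(2).trim()));
            sink.addRow(values);
            rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "1,alice,3.5\n2,\"smith, bob\",4.0\n";
        List<List<Object>> collected = new ArrayList<>();
        long n = load(new StringReader(csv), collected::add);
        System.out.println(n + " rows: " + collected);
    }
}
```

The sink is kept as an interface so the CSV handling can be exercised without a Cassandra dependency. Rows are streamed one at a time rather than buffered, which matters at the "billions of rows" scale the reporter mentions.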