Date: Wed, 1 Apr 2015 06:15:53 +0000 (UTC)
From: "Manish (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8404) CQLSSTableLoader can not create SSTable for csv file of 10M rows.

    [ https://issues.apache.org/jira/browse/CASSANDRA-8404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390048#comment-14390048 ]

Manish commented on CASSANDRA-8404:
-----------------------------------

I will check and let you know. I hit this issue only on 32-bit Ubuntu, not on 64-bit Ubuntu. Since our QA and production machines run 64-bit Ubuntu, I did not see any problem during dev testing on the QA machine.

> CQLSSTableLoader can not create SSTable for csv file of 10M rows.
> -----------------------------------------------------------------
>
> Key: CASSANDRA-8404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8404
> Project: Cassandra
> Issue Type: Bug
> Environment: I am using Cassandra 2.1.1 on 32-bit Ubuntu 12.04. I am running the program with -Xmx1000M
> manish@manish[~]:> uname -a
> Linux manish 3.2.0-72-generic-pae #107-Ubuntu SMP Thu Nov 6 14:44:10 UTC 2014 i686 i686 i386 GNU/Linux
> Reporter: Manish
> Fix For: 2.1.4
>
> Attachments: Test1.java, cassandra.yaml
>
>
> I am able to create an SSTable for one file of 10M rows but not for the other. The data file that works is subscribers1.gz and the data file that does not work is subscriber2.gz. Both files have the same values in the first column but different values in the second column. I wonder why CQLSSTableLoader does not work for a different set of data.
> The program expects unzipped txt files, so please unzip the files before running it. What I have observed is heavy GC activity once the program has processed around 5.2M lines of subscriber2.gz. It gets as far as 5.8M lines with very frequent full GC runs, and it cannot go beyond 5.8M rows because it runs out of memory.
> I have attached the Test1.java and cassandra.yaml I used for creating the SSTable. In the classpath I am specifying all jars from the lib folder of the extracted apache-cassandra-2.1.1-bin.tar.gz.
> Jira does not allow files larger than 10 MB, so I am sharing the data files on Google Drive.
> link to download subscribers1.gz
> https://drive.google.com/file/d/0B6_-ugKWlrfoOTRTa2FCNTFWU2c/view?usp=sharing
> link to download subscribers2.gz
> https://drive.google.com/file/d/0B6_-ugKWlrfocndycm9yM21rN0E/view?usp=sharing

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)