From: Jack Krupansky
To: user@cassandra.apache.org
Subject: Re: CQLSSTableWriter memory leak
Date: Thu, 5 Jun 2014 12:16:13 -0400
In-Reply-To: <543849fc.9b8e.1466af381cf.Coremail.xu_zhong_xing@163.com>

How many rows (primary key values) are you writing for each partition of the primary key? I mean, are there relatively few, or are these very wide partitions?
 
Oh, I see! You're writing 50,000,000 rows to a single partition! My, that IS ambitious.
 
-- Jack Krupansky
 
From: Xu Zhongxing
Sent: Thursday, June 5, 2014 3:34 AM
To: user@cassandra.apache.org
Subject: CQLSSTableWriter memory leak
 
I am using Cassandra's CQLSSTableWriter to import a large amount of data into Cassandra. When I use CQLSSTableWriter to write to a table with a compound primary key, memory consumption keeps growing, and the JVM garbage collector cannot reclaim any of the used memory. When writing to tables without a compound primary key, the JVM GC works fine.

My Cassandra version is 2.0.5. The OS is Ubuntu 14.04 x86-64. JVM parameters are -Xms1g -Xmx2g. This is sufficient for all other non-compound primary key cases.

The problem can be reproduced by the following test case:

import org.apache.cassandra.io.sstable.CQLSSTableWriter;
import org.apache.cassandra.exceptions.InvalidRequestException;

import java.io.IOException;
import java.util.UUID;

class SS {
    public static void main(String[] args) {
        // Compound primary key: x is the partition key, y is the clustering column.
        String schema = "create table test.t (x uuid, y uuid, primary key (x, y))";
        String insert = "insert into test.t (x, y) values (?, ?)";
        CQLSSTableWriter writer = CQLSSTableWriter.builder()
            .inDirectory("/tmp/test/t")
            .forTable(schema).withBufferSizeInMB(32)
            .using(insert).build();

        // The same partition key is reused for every row, so all
        // 50,000,000 rows land in a single partition.
        UUID id = UUID.randomUUID();
        try {
            for (int i = 0; i < 50000000; i++) {
                UUID id2 = UUID.randomUUID();
                writer.addRow(id, id2);
            }

            writer.close();
        } catch (Exception e) {
            // Report the actual failure instead of a fixed message.
            e.printStackTrace();
        }
    }
}
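
To put the "50,000,000 rows to a single partition" remark in numbers, here is a minimal sketch of how the same rows could be spread over bounded partitions by adding a bucket to the partition key. Everything in it is an assumption for illustration: the PartitionBuckets class, the bucket column, and the 100,000-row cap are hypothetical and not part of Cassandra or of the original test case, and whether bucketing avoids the memory growth reported for 2.0.5 is not verified here. The byte counts cover only the raw 16-byte uuid values for y and ignore all per-cell overhead.

```java
// Hypothetical helper, not part of Cassandra: maps a running row index to a
// bucket number so that no partition receives more than rowsPerPartition rows.
class PartitionBuckets {
    // A uuid value occupies 16 bytes.
    static final int UUID_BYTES = 16;

    /** Bucket (0-based) that the row with the given 0-based index falls into. */
    static long bucketFor(long rowIndex, long rowsPerPartition) {
        if (rowIndex < 0 || rowsPerPartition <= 0) {
            throw new IllegalArgumentException("rowIndex must be >= 0 and rowsPerPartition > 0");
        }
        return rowIndex / rowsPerPartition;
    }

    /** Number of partitions needed for totalRows rows (ceiling division). */
    static long partitionCount(long totalRows, long rowsPerPartition) {
        if (totalRows < 0 || rowsPerPartition <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and rowsPerPartition > 0");
        }
        return (totalRows + rowsPerPartition - 1) / rowsPerPartition;
    }

    /** Raw bytes of the clustering uuid values alone, ignoring per-cell overhead. */
    static long rawClusteringBytes(long rows) {
        return rows * UUID_BYTES;
    }

    public static void main(String[] args) {
        long totalRows = 50_000_000L;      // row count from the test case
        long rowsPerPartition = 100_000L;  // assumed cap; choose to suit the data model

        System.out.println("partitions needed: " + partitionCount(totalRows, rowsPerPartition));
        System.out.println("raw y bytes, single partition: " + rawClusteringBytes(totalRows));
        System.out.println("raw y bytes, one bucketed partition: " + rawClusteringBytes(rowsPerPartition));

        // In the test case's loop the bucket would become part of the partition
        // key, assuming a schema such as
        //   create table test.t (x uuid, bucket int, y uuid, primary key ((x, bucket), y))
        // and a three-placeholder insert:
        //   writer.addRow(id, (int) bucketFor(i, rowsPerPartition), id2);
    }
}
```

With these assumed numbers the single partition carries 800,000,000 bytes of raw y values against a 2 GB heap, while each bucketed partition carries 1,600,000. The trade-off is that reads must then supply the bucket as well as x.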