From: Rajesh Radhakrishnan
To: "user@cassandra.apache.org"
Subject: Cassandra Python Driver : execute_async consumes lots of memory?
Date: Mon, 7 Nov 2016 16:51:05 +0000

Hi

We are trying to insert millions of rows into a table by executing batches of prepared statements.

We found that 'session.execute(batch)' writes more data, but it is very slow.
However, 'session.execute_async(batch)' is relatively fast, but once it reaches a certain limit it fills up the memory of the Python process.

Our setup:
Cassandra 3.7.0 cluster ring with 3 nodes (RedHat, 150GB disk, 8GB of RAM each)

Python 2.7.12

Does anybody know how to reduce the memory use of the Cassandra Python driver API, specifically for execute_async? Thank you!



===CODE ======================================
sqlQuery = "INSERT INTO tableV (id, sample_name, pos, ref_base, var_base) values (?,?,?,?,?)"
random_numbers_for_strains = random.sample(xrange(1,300), 200)
random_numbers = random.sample(xrange(1,2000000), 200000)

totalCounter = 0
c = 0
time_init = time.time()
for random_number_strain in random_numbers_for_strains:

    sample_name = None
    sample_name = 'sample'+str(random_number_strain)

    cassandraCluster = CassandraCluster.CassandraCluster()
    cluster = cassandraCluster.create_cluster_with_protocol2()
    session = cluster.connect();
    #session.default_timeout = 1800
    session.set_keyspace(self.KEYSPACE_NAME)

    preparedStatement = session.prepare(sqlQuery)

    counter = 0
    c = c + 1

    for random_number in random_numbers:

        totalCounter += 1
        if counter == 0 :
            batch = BatchStatement()

        counter += 1
        if totalCounter % 10000 == 0 :
            print "Total Count "+ str(totalCounter)

        batch.add(preparedStatement.bind([ uuid.uuid1(), sample_name, random_number, random.choice('GT'), random.choice('AC')]))
        if counter % 50 == 0:
            session.execute_async(batch)
            #session.execute(batch)
            batch = None
            del batch
            counter = 0

    time.sleep(2);
    session.cluster.shutdown()
    random_number= None
    del random_number
    preparedStatement = None
    session = None
    del session
    cluster = None
    del cluster
    cassandraCluster = None
    del cassandraCluster
    gc.collect()

===CODE ======================================
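
One direction that might help, sketched here under assumptions and not tested against our cluster: if the memory growth comes from requests being queued faster than the cluster completes them (not something we have confirmed), the number of outstanding execute_async calls could be capped. The sketch assumes the object returned by execute_async exposes add_callbacks(callback, errback), as the driver's ResponseFuture does. BoundedAsyncWriter, _FakeSession and _FakeFuture are hypothetical names; the two fake classes only stand in for a real session so the sketch runs without a cluster.

```python
import threading


class BoundedAsyncWriter(object):
    """Blocks submit() while max_in_flight requests are still outstanding."""

    def __init__(self, execute_async, max_in_flight=100):
        self._execute_async = execute_async  # e.g. session.execute_async
        self._max = max_in_flight
        self._cond = threading.Condition()
        self._pending = 0
        self.errors = []

    def submit(self, statement):
        # Wait for a free slot before sending another request.
        with self._cond:
            while self._pending >= self._max:
                self._cond.wait()
            self._pending += 1
        try:
            future = self._execute_async(statement)
            future.add_callbacks(self._done, self._failed)
        except Exception:
            self._release()
            raise

    def _done(self, _result):
        self._release()

    def _failed(self, exc):
        with self._cond:
            self.errors.append(exc)
        self._release()

    def _release(self):
        with self._cond:
            self._pending -= 1
            self._cond.notify_all()

    def wait(self):
        # Block until every submitted request has completed or failed.
        with self._cond:
            while self._pending > 0:
                self._cond.wait()


class _FakeFuture(object):
    """Hypothetical stand-in for a response future; completes on a timer."""

    def __init__(self, on_complete):
        self._on_complete = on_complete

    def add_callbacks(self, callback, errback):
        def finish():
            self._on_complete()
            callback(None)
        threading.Timer(0.001, finish).start()


class _FakeSession(object):
    """Hypothetical stand-in for a session; records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0
        self.completed = 0

    def execute_async(self, statement):
        with self._lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        return _FakeFuture(self._complete)

    def _complete(self):
        with self._lock:
            self.in_flight -= 1
            self.completed += 1


def demo(total=200, max_in_flight=10):
    session = _FakeSession()
    writer = BoundedAsyncWriter(session.execute_async, max_in_flight)
    for i in range(total):
        writer.submit("batch-%d" % i)
    writer.wait()
    return session.peak, session.completed, writer.errors
```

With a real session this would presumably be used as writer = BoundedAsyncWriter(session.execute_async, max_in_flight=...), calling writer.submit(batch) in place of session.execute_async(batch) and writer.wait() before shutting the cluster down. A suitable limit would have to be found by experiment.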



Kind regards,
Rajesh Radhakrishnan