Subject: Re: problem of inserting columns of a great amount
From: Tyler Hobbs <tyler@datastax.com>
To: user@cassandra.apache.org
Date: Sat, 11 Aug 2012 17:11:18 -0500

There is a fair amount of overhead in the Thrift structures for columns and
mutations, so that's a pretty large mutation. In general, you'll see better
performance inserting many small batch mutations in parallel.

On Fri, Aug 10, 2012 at 2:04 AM, Jin Lei wrote:
> Sorry, something was wrong with my previous problem description. The fact
> is that Cassandra denies my requests when I try to insert 50k rows
> (rather than 50k columns) into a column family at one time, each row with
> one column.
>
> 2012/8/10 Jin Lei
>> Hello everyone,
>> I'm new to Cassandra and recently ran into a problem.
>> I want to insert over 50k columns into Cassandra at one time, the total
>> size of which doesn't exceed 16 MB, but the database returns an exception
>> as follows.
>>
>> [E 120809 15:37:31 service:1251] error in write to database
>> Traceback (most recent call last):
>>   File "/home/stoneiii/mycode/src/user/service.py", line 1248, in flush_mutator
>>     self.mutator.send()
>>   File "/home/stoneiii/mycode/pylib/pycassa/batch.py", line 127, in send
>>     conn.batch_mutate(mutations, write_consistency_level)
>>   File "/home/stoneiii/gaia2/pylib/pycassa/pool.py", line 145, in new_f
>>     return new_f(self, *args, **kwargs)
>>   File "/home/stoneiii/mycode/pylib/pycassa/pool.py", line 145, in new_f
>>     return new_f(self, *args, **kwargs)
>>   File "/home/stoneiii/mycode/pylib/pycassa/pool.py", line 145, in new_f
>>     return new_f(self, *args, **kwargs)
>>   File "/home/stoneiii/mycode/pylib/pycassa/pool.py", line 145, in new_f
>>     return new_f(self, *args, **kwargs)
>>   File "/home/stoneiii/mycode/pylib/pycassa/pool.py", line 145, in new_f
>>     return new_f(self, *args, **kwargs)
>>   File "/home/stoneiii/mycode/pylib/pycassa/pool.py", line 140, in new_f
>>     (self._retry_count, exc.__class__.__name__, exc))
>> MaximumRetryException: Retried 6 times. Last failure was error:
>> [Errno 104] Connection reset by peer
>>
>> Since Cassandra supports up to 2 billion columns in a single row, why
>> can't I insert 50k columns this way? Or what settings should I adjust to
>> lift this limit?
>> Thanks in advance for any hints!

--
Tyler Hobbs
DataStax
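[On the "what settings should I adjust" part of the question: in Thrift-era
Cassandra the server-side message-size limits lived in cassandra.yaml, and a
request larger than the framed-transport limit can make the server drop the
connection, which the client sees as "Connection reset by peer". The snippet
below is a hedged sketch using that era's typical default values; option names
and defaults should be verified against the specific Cassandra version in use.]

```yaml
# cassandra.yaml (Thrift-era options; verify names/defaults for your version)

# Frame size for Thrift's framed transport. A single batch_mutate larger
# than this can cause the server to close the connection.
thrift_framed_transport_size_in_mb: 15

# Hard cap on a single Thrift message; should be >= the frame size.
thrift_max_message_length_in_mb: 16
```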
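[The advice above, many small batch mutations sent in parallel instead of one
huge mutation, can be sketched in plain Python. `send_batch` is a hypothetical
stand-in for a real client call such as pycassa's batch_mutate; the chunk size
and worker count are illustrative, not recommendations from the thread.]

```python
# Sketch: split 50k single-column rows into many small batches and send
# them concurrently, rather than as one giant mutation.
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def send_batch(batch):
    # Placeholder for a real client call (e.g. a small batch_mutate).
    # Here it just reports how many rows the batch contained.
    return len(batch)

# 50k rows, one column each, as in the original question.
rows = [("row%d" % i, {"col": "value"}) for i in range(50000)]

# 100 batches of 500 rows, sent by a small pool of workers.
with ThreadPoolExecutor(max_workers=8) as pool:
    sent = list(pool.map(send_batch, chunked(rows, 500)))

print(sum(sent))  # prints 50000: every row sent, no oversized request
```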