From: Oleg Anastasjev
To: user@cassandra.apache.org
Subject: Re: Load balancing
Date: Fri, 18 Jun 2010 08:03:06 +0000 (UTC)

Mubarak Seyed <...@apple.com> writes:

> - How does the client (application) connect to the Cassandra cluster? Does it always connect to one node (Thrift can get the ring info) and send the request to that node?

This depends on the client library you use. Any Cassandra node can accept client connections and forward a request to the node that owns the requested data.

> - If we send 300k records from each node, it is overkill for the node that accepts the client connection. Does that node get choked?

Of course, in your situation no single node can handle the whole load, so you have to connect to several nodes. The best way, I believe, is to connect directly to the node that owns the data you need. Take a look at org/apache/cassandra/client/RingCache.java for an example of how to read the ring state and forward requests to the right node.

> - How do we design a Cassandra cluster to make sure that inserts get distributed to more than one node?
> - If I prefer OrderPreservingPartitioner as the partitioner, how does a single node handle all the 200k records?

If you prefer OPP, you have two options, manual and automatic:

1. If you know the distribution of keys in your data, assign token values to your nodes so that keys are spread uniformly across them. Imagine you have single-byte keys ranging from 0 to 255 and 64 nodes (assuming, for simplicity, that data is distributed uniformly across all keys). Each node should then cover 256 / 64 = 4 keys. You set the token manually in storage-conf.xml (InitialToken): 0 on the 1st node, 4 on the 2nd, 8 on the 3rd, 12 on the 4th, and so on.

2. The automatic way is to start the Cassandra cluster with a small number of nodes and import your data. Then bootstrap the rest of the nodes with bootstrap set to true (AutoBootstrap) and an empty token (InitialToken) in storage-conf.xml. This way Cassandra will try to balance the data by itself.

200k records are not a big deal for Cassandra, IMHO, but of course this depends on your hardware and the size of your records. Either way, it is a good idea to test your configuration with real data first.
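The token arithmetic for the manual option (item 1 above) can be sketched as follows. This is a minimal Python illustration, not Cassandra code. It models tokens as small integers to match the single-byte-key example; real OPP tokens are key strings. It also assumes a node owns the keys from just after its predecessor's token up to and including its own token. The helpers `even_tokens` and `owner` are hypothetical names.

```python
import bisect
from collections import Counter

# Assumptions from the example: single-byte keys 0..255, uniformly
# distributed, spread over 64 nodes.
KEY_SPACE = 256
NODE_COUNT = 64


def even_tokens(key_space: int, node_count: int) -> list[int]:
    """Hypothetical helper: one token per node, spaced key_space // node_count apart."""
    step = key_space // node_count  # 256 // 64 = 4
    return [i * step for i in range(node_count)]


def owner(tokens: list[int], key: int) -> int:
    """Index of the node assumed to own `key`.

    Assumed rule: a node owns (previous token, its token], so the owner is
    the first token >= key; keys past the last token wrap to node 0.
    """
    i = bisect.bisect_left(tokens, key)
    return i % len(tokens)


tokens = even_tokens(KEY_SPACE, NODE_COUNT)
print(tokens[:4])  # [0, 4, 8, 12]

# Count how many of the 256 keys each node would own.
load = Counter(owner(tokens, k) for k in range(KEY_SPACE))
print(min(load.values()), max(load.values()))  # 4 4
```

Under these assumptions every node owns exactly four keys. If your real keys are not uniform, the same idea applies, but the tokens come from your known key distribution instead of being evenly spaced.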
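The "connect directly to the owning node" approach that RingCache.java is mentioned for can be sketched in a few lines. This is a hypothetical Python illustration of the idea, not RingCache's actual API. The names `RingRouter`, `endpoint_for` and `group_by_endpoint`, the tokens and the node addresses are all made up, and a real client would read the ring from the cluster rather than hard-coding it. The sketch assumes order-preserving string tokens and that each node owns the range (previous token, its token]. It ignores replicas.

```python
import bisect
from collections import defaultdict


class RingRouter:
    """Hypothetical client-side router: maps a key to the node that owns it."""

    def __init__(self, ring: dict[str, str]):
        # ring maps token -> node address; keep tokens sorted for binary search.
        self._tokens = sorted(ring)
        self._nodes = [ring[t] for t in self._tokens]

    def endpoint_for(self, key: str) -> str:
        # Assumed ownership rule: a node owns (previous token, its token],
        # so the owner is the first token >= key. Keys past the last token
        # wrap around to the first node.
        i = bisect.bisect_left(self._tokens, key)
        return self._nodes[i % len(self._nodes)]


def group_by_endpoint(router: RingRouter, keys: list[str]) -> dict[str, list[str]]:
    """Split a batch of keys into per-node batches, one per owning node."""
    batches: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        batches[router.endpoint_for(key)].append(key)
    return dict(batches)


# Made-up four-node ring with order-preserving (string) tokens.
router = RingRouter({"g": "10.0.0.1", "n": "10.0.0.2", "t": "10.0.0.3", "z": "10.0.0.4"})
print(router.endpoint_for("melon"))  # 10.0.0.2
print(group_by_endpoint(router, ["apple", "melon", "peach", "zebra"]))
# {'10.0.0.1': ['apple', 'zebra'], '10.0.0.2': ['melon'], '10.0.0.3': ['peach']}
```

Grouping writes this way lets each batch go straight to its owning node, so a large import is not funnelled through the single node that accepted the connection.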