From: Jonathan Ellis
Date: Fri, 11 Jun 2010 10:50:30 -0700
Subject: Re: read operation is slow
To: user@cassandra.apache.org

You need to look at cfstats to see what the latency is internal to
Cassandra, versus what your client is introducing.

Then you should probably read the comments in the configuration file
about caching.

On Fri, Jun 11, 2010 at 9:38 AM, Caribbean410 wrote:
>
> Thanks Riyad.
>
> Right now I am just testing Cassandra on a single node. The server and client
> are running on the same machine. I tried the read test again on two
> machines; on one machine the CPU usage is around 30% most of the time and
> on the other it is 90%.
>
> Pelops is one way to access Cassandra; there are also other Java clients like
> Hector and Jassandra. Will these Java clients have significantly different
> performance?
>
> Also, I once tried changing the storage configuration file: moving
> CommitLogDirectory and DataFileDirectory to different disks, changing
> DiskAccessMode to mmap for a 64-bit machine, and changing ConcurrentReads from
> 8 to 2. None of these changed performance much.
>
> For other users who use a different client, such as PHP, C++,
> Python, etc., if you have any experience in boosting read performance,
> you are more than welcome to share it with me.
> Thanks,
>
> On Fri, Jun 11, 2010 at 8:19 AM, Riyad Kalla wrote:
>>
>> Caribbean410,
>>
>> This comes up on the Redis list a lot as well -- what you are actually
>> measuring is the client sending a network request to the Cassandra server and
>> it replying -- so the performance numbers you are getting can easily be 70%
>> network wait time and not necessarily hardcore read/write server
>> performance.
>>
>> One way to see if this is the case: run your read test, then watch the CPU
>> on the server for the Cassandra process and see if it's pegging the CPU --
>> if it's just sitting there bouncing between 0-10%, then you are spending most
>> of your time waiting on network I/O (opening/closing sockets, etc.).
>>
>> If you can parallelize your test to spawn, say, 5 threads that all do the
>> same thing, see if the aggregate throughput increases linearly with the
>> number of threads -- which would indicate Cassandra is plenty fast in your
>> setup, you just need to utilize more client threads over the network.
>>
>> That new Java library, Pelops by Dominic
>> (http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/)
>> has a nice intrinsic node-balancing design that could be handy IF you are
>> using multiple nodes. If you are just testing against 1 node, then spawn
>> multiple threads of your code above and see how each thread's performance
>> scales.
>>
>> -R
>>
>> On Thu, Jun 10, 2010 at 2:39 PM, Caribbean410 wrote:
>>>
>>> Hello,
>>>
>>> I am testing the performance of Cassandra. We write 200k records to the
>>> database and each record is 1k in size. Then we read these 200k records.
>>> It takes more than 400s to finish the read, which is much slower than
>>> MySQL (around 20s). I read some discussion online and someone suggested
>>> making multiple connections to make it faster. But I am not sure how
>>> to do it: do I need to change my storage settings file or just change
>>> the Java client code?
>>>
>>> Here is my read code,
>>>
>>>     Properties info = new Properties();
>>>     info.put(DriverManager.CONSISTENCY_LEVEL,
>>>             ConsistencyLevel.ONE.toString());
>>>
>>>     IConnection connection = DriverManager.getConnection(
>>>             "thrift://localhost:9160", info);
>>>
>>>     // 2. Get a KeySpace by name
>>>     IKeySpace keySpace = connection.getKeySpace("Keyspace1");
>>>
>>>     // 3. Get a ColumnFamily by name
>>>     IColumnFamily cf = keySpace.getColumnFamily("Standard2");
>>>
>>>     ByteArray nameFirst = ByteArray.ofASCII("first");
>>>     ICriteria criteria = cf.createCriteria();
>>>     long readBytes = 0;
>>>     long start = System.currentTimeMillis();
>>>     for (int i = 0; i < numOfRecords; i++) {
>>>         int n = random.nextInt(numOfRecords);
>>>         userName = keySet[n];
>>>
>>>         criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);
>>>         Map<String, List<IColumn>> map = criteria.select();
>>>         List<IColumn> list = map.get(userName);
>>>         ByteArray bloc = list.get(0).getValue();
>>>         byte[] byteArrayloc = bloc.toByteArray();
>>>         loc = new String(byteArrayloc);
>>>         // System.out.println(userName+" "+loc);
>>>         readBytes = readBytes + loc.length();
>>>     }
>>>
>>>     long finish = System.currentTimeMillis();
>>>
>>> I once commented these lines
>>>
>>>         ByteArray bloc = list.get(0).getValue();
>>>         byte[] byteArrayloc = bloc.toByteArray();
>>>         loc = new String(byteArrayloc);
>>>         // System.out.println(userName+" "+loc);
>>>         readBytes = readBytes + loc.length();
>>>
>>> And the performance doesn't improve much.
>>>
>>> Any suggestion is welcome. Thanks,
>
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
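
The multi-threaded test suggested in the thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name ParallelReadBenchmark, the run method, and the reader function are hypothetical, and an in-memory map stands in for the Cassandra read so the harness runs on its own. In a real test, the reader would wrap the client call from the original code, and each worker thread would likely need its own connection, since a Thrift client connection is generally not safe to share between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

public class ParallelReadBenchmark {

    /**
     * Runs readsPerThread random-key reads on each of `threads` workers and
     * returns the total number of bytes read across all workers.
     * `reader` is a hypothetical stand-in for a single-key client read.
     */
    public static long run(int threads, int readsPerThread, String[] keys,
                           Function<String, byte[]> reader) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Long> task = () -> {
                    long bytes = 0;
                    for (int i = 0; i < readsPerThread; i++) {
                        // Pick a random key, as the original read loop does.
                        int n = ThreadLocalRandom.current().nextInt(keys.length);
                        bytes += reader.apply(keys[n]).length;
                    }
                    return bytes;
                };
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that worker has finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an in-memory map with 1k values replaces the database.
        int numOfRecords = 1000;
        int readsPerThread = 10_000;
        String[] keys = new String[numOfRecords];
        Map<String, byte[]> store = new ConcurrentHashMap<>();
        for (int i = 0; i < numOfRecords; i++) {
            keys[i] = "user" + i;
            store.put(keys[i], new byte[1024]);
        }

        // Compare reads per second as the number of client threads grows.
        for (int threads : new int[] {1, 2, 5}) {
            long start = System.nanoTime();
            long bytes = run(threads, readsPerThread, keys, store::get);
            double seconds = (System.nanoTime() - start) / 1e9;
            long reads = (long) threads * readsPerThread;
            System.out.printf("%d thread(s): %d reads, %d bytes, %.0f reads/s%n",
                    threads, reads, bytes, reads / seconds);
        }
    }
}
```

If reads per second grow roughly in proportion to the thread count, the single-threaded number was probably dominated by per-request round-trip time and not by the server. If they stay flat, the server-side latency reported by cfstats is the more likely place to look.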