Subject: Re: Cassandra search performance
From: Jason Tang
To: user@cassandra.apache.org
Date: Wed, 25 Apr 2012 22:45:33 +0800

1.0.8

On 25 April 2012 at 10:38 PM, Philip Shon <philip.shon@gmail.com> wrote:
What version of Cassandra are you using? I found a big performance hit when querying on the secondary index.

I came across this bug in versions prior to 1.1:

https://issues.apache.org/jira/browse/CASSANDRA-3545
Hope that helps.

2012/4/25 Jason Tang <ares.tang@gmail.com>
I also found that if the only search condition is "status", it scans just 200 records.

But if I add a second condition, "partition", it scans all records, because the "partition" condition matches every record.

Yet when I combine "status" with another condition such as "userName", it still scans only 200 records, even though "userName" is the same in all 1,000,000 records.

So the cost depends on the scan execution plan. When a query has several search conditions, how does Cassandra execute it? Is there something like an execution plan in Cassandra?
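
One way to picture the behaviour described above is a toy model in which a single indexed condition drives the scan and the remaining conditions are only applied as filters to the rows that driver returns. The sketch below is plain Python, not Cassandra code: the `build_index` and `scan` helpers, the in-memory indexes, the "ready"/"done" and "p0" values, and the explicit `driver` argument are all illustrative assumptions, and the data set is scaled down from 1,000,000 rows to 10,000 to keep it quick.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the list of row keys holding it."""
    index = defaultdict(list)
    for key, row in rows.items():
        index[row[column]].append(key)
    return index


def scan(rows, indexes, conditions, driver):
    """Walk the index of the `driver` column, then filter on every condition.

    Returns (matching_keys, rows_examined).
    """
    candidates = indexes[driver].get(conditions[driver], [])
    matches = []
    for key in candidates:
        row = rows[key]
        if all(row[col] == val for col, val in conditions.items()):
            matches.append(key)
    return matches, len(candidates)


# Hypothetical data: 10,000 rows, 200 with the wanted status,
# and a single "partition" value shared by every row.
rows = {
    i: {"status": "ready" if i < 200 else "done", "partition": "p0"}
    for i in range(10_000)
}
indexes = {col: build_index(rows, col) for col in ("status", "partition")}
conditions = {"status": "ready", "partition": "p0"}

by_status = scan(rows, indexes, conditions, driver="status")
by_partition = scan(rows, indexes, conditions, driver="partition")

print(len(by_status[0]), by_status[1])        # 200 matches, 200 rows examined
print(len(by_partition[0]), by_partition[1])  # 200 matches, 10000 rows examined
```

Both calls return the same 200 rows; only the number of rows examined differs, which is the pattern reported for "status" alone versus "status" combined with "partition". Whether Cassandra 1.0.8 actually chooses its driving condition this way is not established by this sketch.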


On 25 April 2012 at 9:18 PM, Jason Tang <ares.tang@gmail.com> wrote:

Hi,

We have the CF shown below and use a secondary index to search on a simple value, "status". Among 1,000,000 rows, 200 have the status we want.

But when we run the search, performance is very poor. Checking with the command "./bin/nodetool -h localhost -p 8199 cfstats" shows that Cassandra read 1,000,000 records with a "Read Latency" of 0.2 ms, so in total it took 200 seconds.

It uses a lot of CPU, and the stack dump shows that all Cassandra threads are reading from sockets.

So I wonder how to really use the index to find the 200 records instead of scanning all rows. (Super Column?)

ColumnFamily: queue
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period in seconds / keys to save : 0.0/0/all
      Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
      Key cache size / save period in seconds: 0.0/0
      GC grace seconds: 0
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.0
      Replicate on write: false
      Bloom Filter FP chance: default
      Built indexes: [queue.idxStatus]
      Column Metadata:
        Column Name: status (737461747573)
          Validation Class: org.apache.cassandra.db.marshal.AsciiType
          Index Name: idxStatus
          Index Type: KEYS

BRs
//Jason
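
The 200-second figure in the original message is rows read multiplied by per-read latency. A minimal sketch of that arithmetic, using the numbers reported above; `total_seconds` is a hypothetical helper name, and the second call assumes, as an idealisation, that an index lookup would touch only the 200 matching rows at the same per-read latency.

```python
def total_seconds(rows_read, latency_ms):
    """Total read time in seconds for `rows_read` reads at `latency_ms` each."""
    return rows_read * latency_ms / 1000.0


full_scan = total_seconds(1_000_000, 0.2)  # every row read, as cfstats reported
index_only = total_seconds(200, 0.2)       # idealised: only the matching rows

print(f"full scan:  {full_scan:.2f} s")
print(f"index only: {index_only:.2f} s")
```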


