Message-ID: <4C3642C7.1080506@nutanix.com>
Date: Thu, 08 Jul 2010 14:27:35 -0700
From: "Brent N. Chun"
Reply-To: bnc@nutanix.com
Organization: Nutanix Inc.
To: Thomas Heller
CC: user@cassandra.apache.org, Brent Chun
Subject: Re: Reading all rows in a column family in parallel
References: <4C357C88.4030706@nutanix.com>

Thomas Heller wrote:
> Hey,
>
>> .... Is
>> this possible in 0.6.0? (Note: for the next startToken, I was just planning
>> on computing the MD5 digest of the last key directly since I'm accessing
>> Cassandra through Thrift.)
>
> Can't speak for 0.6.0 but it works for 0.6.3.
>
> Just implemented this in ruby (minus the parallel part).
>
> Cheers,
> /thomas

Hm, I must be doing something fundamentally wrong then. I just tried
0.6.3 and got the same result. In this example, I have a 1-node system
with 100 rows in a single CF. When I try to read them back using
token-based range queries and a RandomPartitioner, I get the output
below (only 33 of 100 rows returned).

The 100 rows have keys that hash to random points on the ring. In the
example below, I'm reading rows in chunks of 20. In the first range
query, the initial range is the entire ring. The 20 rows returned have
MD5 hashes in no particular order, it seems, and could be anywhere on
the ring. Taking the MD5 hash of the last row's key, I start the second
range query.

In the second range query
( 292996472659622939455744264432842142924, 34571752641348786448680284622901156834 ],
what's being returned seems like exactly what the range suggests: rows
whose MD5 hashes fall in that range. But some of the remaining 80 rows
we want may be outside that range. Hence, only 33 rows below.

If the rows returned by the token-based range queries were in MD5 hash
order (and ideally handled wraps), then it seems like this interface
could work. But others seem to be using this functionality
successfully, so that suggests this is somehow unnecessary.
Can someone help me out here?

Thanks,
bnc

--------------------------------------------------------------------------------
Scanning range 0 ( 34571752641348786448680284622901156834, 34571752641348786448680284622901156834 ]
Scanning chunk ( 34571752641348786448680284622901156834, 34571752641348786448680284622901156834 ] in range 0
Read 20 rows
Read row 0, token 336932469034906281211924193433194809371, key 0_my_key62
Read row 1, token 5919946189209861803345840641668714978, key G_my_key16
Read row 2, token 6676056754427192599913432294390467082, key N_my_key85
Read row 3, token 330974738873996707017206868970060026330, key 6_my_key6
Read row 4, token 9595097897929687061907189837471352784, key E_my_key14
Read row 5, token 16575788966172751729835323651471549632, key a_my_key98
Read row 6, token 20927090112620661198733690835293074593, key 5_my_key67
Read row 7, token 28411545431179372696834683157677733478, key B_my_key73
Read row 8, token 29636277939148773659952116897998650776, key Q_my_key26
Read row 9, token 31186550159320208451777665196866508345, key j_my_key45
Read row 10, token 309081729348188654502493750295907191249, key D_my_key75
Read row 11, token 308480936859450293438865473928962136114, key W_my_key32
Read row 12, token 33060929359846763792204741553927689627, key Q_my_key88
Read row 13, token 36834373239213294576855495985365240744, key D_my_key13
Read row 14, token 302818545694924710056493830778421143168, key C_my_key12
Read row 15, token 39723252966237722984897584840501933181, key I_my_key18
Read row 16, token 297899763604776667052026292305780186395, key 2_my_key2
Read row 17, token 45994786947573748381278100108617428931, key U_my_key92
Read row 18, token 294076607175826631726358986726954934589, key T_my_key29
Read row 19, token 292996472659622939455744264432842142924, key M_my_key84
Scanning chunk ( 292996472659622939455744264432842142924, 34571752641348786448680284622901156834 ] in range 0
Read 13 rows
Read row 20, token 336932469034906281211924193433194809371, key 0_my_key62
Read row 21, token 5919946189209861803345840641668714978, key G_my_key16
Read row 22, token 6676056754427192599913432294390467082, key N_my_key85
Read row 23, token 330974738873996707017206868970060026330, key 6_my_key6
Read row 24, token 9595097897929687061907189837471352784, key E_my_key14
Read row 25, token 16575788966172751729835323651471549632, key a_my_key98
Read row 26, token 20927090112620661198733690835293074593, key 5_my_key67
Read row 27, token 28411545431179372696834683157677733478, key B_my_key73
Read row 28, token 29636277939148773659952116897998650776, key Q_my_key26
Read row 29, token 31186550159320208451777665196866508345, key j_my_key45
Read row 30, token 309081729348188654502493750295907191249, key D_my_key75
Read row 31, token 308480936859450293438865473928962136114, key W_my_key32
Read row 32, token 33060929359846763792204741553927689627, key Q_my_key88
Scanning chunk ( 33060929359846763792204741553927689627, 34571752641348786448680284622901156834 ] in range 0
Read 0 rows
--------------------------------------------------------------------------------
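
For reference, here is a minimal Python sketch of one possible reading of the log above. It is not a confirmed diagnosis. It assumes that RandomPartitioner derives a token by reading the 16-byte MD5 digest as a signed big-endian integer and taking its absolute value (as Java's `new BigInteger(digest).abs()` would), while the tokens printed in the log are the same digests read as unsigned integers. The names `fold`, `partitioner_token`, `range_slice`, and `scan_all` are hypothetical helpers made up for this illustration; `range_slice` is an in-memory stand-in for a token range query and is not the Thrift API.

```python
import hashlib


def unsigned_md5(key: bytes) -> int:
    """MD5 digest read as an unsigned 128-bit integer."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def fold(unsigned_token: int) -> int:
    """abs() of the same 128 bits read as a signed two's-complement integer."""
    if unsigned_token < 2**127:
        return unsigned_token
    return 2**128 - unsigned_token


def partitioner_token(key: bytes) -> int:
    """ASSUMPTION: token = abs(signed big-endian MD5 digest)."""
    return fold(unsigned_md5(key))


# Tokens of rows 0..19, copied from the first chunk of the log.
FIRST_CHUNK = [
    336932469034906281211924193433194809371,
    5919946189209861803345840641668714978,
    6676056754427192599913432294390467082,
    330974738873996707017206868970060026330,
    9595097897929687061907189837471352784,
    16575788966172751729835323651471549632,
    20927090112620661198733690835293074593,
    28411545431179372696834683157677733478,
    29636277939148773659952116897998650776,
    31186550159320208451777665196866508345,
    309081729348188654502493750295907191249,
    308480936859450293438865473928962136114,
    33060929359846763792204741553927689627,
    36834373239213294576855495985365240744,
    302818545694924710056493830778421143168,
    39723252966237722984897584840501933181,
    297899763604776667052026292305780186395,
    45994786947573748381278100108617428931,
    294076607175826631726358986726954934589,
    292996472659622939455744264432842142924,
]


def range_slice(keys, start, end, count):
    """Hypothetical stand-in for a token range query (not the Thrift API).

    Returns at most `count` keys whose token t satisfies start < t <= end,
    ordered by token.
    """
    hits = sorted(
        (k for k in keys if start < partitioner_token(k) <= end),
        key=partitioner_token,
    )
    return hits[:count]


def scan_all(keys, chunk=20):
    """Page through all keys, restarting each query after the last token seen."""
    # -1 is a sentinel below every token; 2**127 is the largest folded value.
    start, end = -1, 2**127
    seen = []
    while True:
        page = range_slice(keys, start, end, chunk)
        if not page:
            return seen
        seen.extend(page)
        # Next start token: the folded token of the last key, not the raw digest.
        start = partitioner_token(page[-1])
```

Under that assumption, `[fold(t) for t in FIRST_CHUNK]` is in ascending order, which would mean the first 20 rows did come back in token order. It would also explain the second chunk: its start token, 292996472659622939455744264432842142924, is larger than 2**127, and rows 20 to 32 in the log repeat rows 0 to 12. Computing the next start token with `partitioner_token` instead of the raw unsigned digest is the design choice `scan_all` illustrates.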