incubator-cassandra-user mailing list archives

From "Brent N. Chun" <...@nutanix.com>
Subject Re: Reading all rows in a column family in parallel
Date Thu, 08 Jul 2010 21:27:35 GMT
Thomas Heller wrote:
> Hey,
> 
>> .... Is
>> this possible in 0.6.0? (Note: for the next startToken, I was just planning
>> on computing the MD5 digest of the last key directly since I'm accessing
>> Cassandra through Thrift.)
> 
> Can't speak for 0.6.0 but it works for 0.6.3.
> 
> Just implemented this in ruby (minus the parallel part).
> 
> Cheers,
> /thomas

Hm, I must be doing something fundamentally wrong then. I just tried 0.6.3 and
got the same result. In this example, I have a one-node system with 100 rows in
a single CF. When I try to read them back using token-based range queries with
the RandomPartitioner, I get the output below (only 33 of 100 rows returned).

Now the 100 rows have keys that hash to random points on the ring. In the 
example below, I'm reading rows in chunks of 20.

In the first range query, the initial range is the entire ring. The MD5 hashes
of the 20 rows returned appear to be in no particular order and could be
anywhere on the ring. Taking the MD5 hash of the last row's key as the next
start token, I start the second range query.
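
For reference, a minimal Python sketch of two ways a 16-byte MD5 digest can be
turned into an integer when the next start token is computed by hand. The
function names are hypothetical, and which reading the server-side
RandomPartitioner uses is an assumption that should be checked against the
Cassandra source for the version in use. The only difference between the two
is whether the digest is treated as unsigned or as a signed two's-complement
value whose absolute value is taken.

```python
import hashlib

def md5_unsigned(key: bytes) -> int:
    """Digest read as an unsigned big-endian integer, in [0, 2**128)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")

def md5_signed_abs(key: bytes) -> int:
    """Digest read as a signed two's-complement integer, then abs().

    Assumption: this mirrors what a Java `new BigInteger(digest).abs()`
    would produce; the result lies in [0, 2**127].
    """
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))

# The two readings agree whenever the top bit of the digest is clear and
# differ otherwise, so a client-side token can silently fall off the ring
# the server is using if the wrong reading is chosen.
key = b"M_my_key84"
print(md5_unsigned(key), md5_signed_abs(key))
```

The unsigned reading can produce values above 2^127, while the signed reading
never does, so comparing a computed token against 2^127 is a quick way to tell
which one a client is using.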

In the second range query ( 292996472659622939455744264432842142924,
34571752641348786448680284622901156834 ], what is returned seems to be exactly
what the range suggests: rows whose MD5 hashes fall in that range. But some of
the remaining 80 rows we want may lie outside that range. Hence, only 33 rows
below.

If the rows returned by the token-based range queries were in MD5 hash order
(and wraps were handled properly), then it seems like this interface could
work. But others seem to be using this functionality successfully, which
suggests that this is somehow unnecessary. Can someone help me out here?
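
To make the paging idea concrete, here is a small in-memory Python sketch of
the scan described above, under the assumption that each range query returns
rows in token order starting just after the start token and wraps around the
ring. It does not talk to Cassandra or Thrift; `get_range` and `scan_all` are
hypothetical stand-ins that operate on a sorted list of integer tokens.

```python
from bisect import bisect_right

def get_range(sorted_tokens, start, end, count):
    """Return up to `count` tokens in ( start, end ], in ring order.

    A range with start >= end is treated as wrapping; start == end
    therefore covers the whole ring.
    """
    i = bisect_right(sorted_tokens, start)
    # Walk the ring once, beginning just after `start`.
    ordered = sorted_tokens[i:] + sorted_tokens[:i]
    if start < end:
        in_range = [t for t in ordered if start < t <= end]
    else:
        in_range = [t for t in ordered if t > start or t <= end]
    return in_range[:count]

def scan_all(sorted_tokens, ring_start, chunk):
    """Page through the whole ring in chunks, restarting each query
    from the token of the last row seen."""
    seen = []
    start = ring_start
    while True:
        page = get_range(sorted_tokens, start, ring_start, chunk)
        seen.extend(page)
        # A short page means the range is exhausted. A last token equal to
        # ring_start means the ring is complete, and querying again would
        # ask for the entire ring a second time.
        if len(page) < chunk or page[-1] == ring_start:
            return seen
        start = page[-1]

print(scan_all([10, 20, 30, 40, 50], ring_start=25, chunk=2))
# prints [30, 40, 50, 10, 20]
```

With ordered results, every row is visited exactly once and the scan ends on
the first short page. If results were not ordered by token, restarting from the
last row's token would skip or repeat rows.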

Thanks,
bnc

--------------------------------------------------------------------------------

Scanning range 0 ( 34571752641348786448680284622901156834, 34571752641348786448680284622901156834 ]
Scanning chunk ( 34571752641348786448680284622901156834, 34571752641348786448680284622901156834 ] in range 0
Read 20 rows
Read row 0, token 336932469034906281211924193433194809371, key 0_my_key62
Read row 1, token 5919946189209861803345840641668714978, key G_my_key16
Read row 2, token 6676056754427192599913432294390467082, key N_my_key85
Read row 3, token 330974738873996707017206868970060026330, key 6_my_key6
Read row 4, token 9595097897929687061907189837471352784, key E_my_key14
Read row 5, token 16575788966172751729835323651471549632, key a_my_key98
Read row 6, token 20927090112620661198733690835293074593, key 5_my_key67
Read row 7, token 28411545431179372696834683157677733478, key B_my_key73
Read row 8, token 29636277939148773659952116897998650776, key Q_my_key26
Read row 9, token 31186550159320208451777665196866508345, key j_my_key45
Read row 10, token 309081729348188654502493750295907191249, key D_my_key75
Read row 11, token 308480936859450293438865473928962136114, key W_my_key32
Read row 12, token 33060929359846763792204741553927689627, key Q_my_key88
Read row 13, token 36834373239213294576855495985365240744, key D_my_key13
Read row 14, token 302818545694924710056493830778421143168, key C_my_key12
Read row 15, token 39723252966237722984897584840501933181, key I_my_key18
Read row 16, token 297899763604776667052026292305780186395, key 2_my_key2
Read row 17, token 45994786947573748381278100108617428931, key U_my_key92
Read row 18, token 294076607175826631726358986726954934589, key T_my_key29
Read row 19, token 292996472659622939455744264432842142924, key M_my_key84
Scanning chunk ( 292996472659622939455744264432842142924, 34571752641348786448680284622901156834 ] in range 0
Read 13 rows
Read row 20, token 336932469034906281211924193433194809371, key 0_my_key62
Read row 21, token 5919946189209861803345840641668714978, key G_my_key16
Read row 22, token 6676056754427192599913432294390467082, key N_my_key85
Read row 23, token 330974738873996707017206868970060026330, key 6_my_key6
Read row 24, token 9595097897929687061907189837471352784, key E_my_key14
Read row 25, token 16575788966172751729835323651471549632, key a_my_key98
Read row 26, token 20927090112620661198733690835293074593, key 5_my_key67
Read row 27, token 28411545431179372696834683157677733478, key B_my_key73
Read row 28, token 29636277939148773659952116897998650776, key Q_my_key26
Read row 29, token 31186550159320208451777665196866508345, key j_my_key45
Read row 30, token 309081729348188654502493750295907191249, key D_my_key75
Read row 31, token 308480936859450293438865473928962136114, key W_my_key32
Read row 32, token 33060929359846763792204741553927689627, key Q_my_key88
Scanning chunk ( 33060929359846763792204741553927689627, 34571752641348786448680284622901156834 ] in range 0
Read 0 rows
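
One way to probe the ordering question with the numbers above is the following
Python sketch. It rests on an assumption that is not confirmed anywhere in this
thread: that the server derives a token by reading the MD5 digest as a signed
128-bit integer and taking its absolute value, whereas the tokens printed above
are the unsigned reading. `fold` is a hypothetical helper that converts one to
the other.

```python
# Tokens exactly as printed for rows 0-19 in the first chunk above.
LOGGED = [
    336932469034906281211924193433194809371,
    5919946189209861803345840641668714978,
    6676056754427192599913432294390467082,
    330974738873996707017206868970060026330,
    9595097897929687061907189837471352784,
    16575788966172751729835323651471549632,
    20927090112620661198733690835293074593,
    28411545431179372696834683157677733478,
    29636277939148773659952116897998650776,
    31186550159320208451777665196866508345,
    309081729348188654502493750295907191249,
    308480936859450293438865473928962136114,
    33060929359846763792204741553927689627,
    36834373239213294576855495985365240744,
    302818545694924710056493830778421143168,
    39723252966237722984897584840501933181,
    297899763604776667052026292305780186395,
    45994786947573748381278100108617428931,
    294076607175826631726358986726954934589,
    292996472659622939455744264432842142924,
]
# End token of the second chunk above.
END = 34571752641348786448680284622901156834

def fold(token: int) -> int:
    """Map an unsigned 128-bit value to abs() of its signed reading."""
    return token if token < 2**127 else 2**128 - token

folded = [fold(t) for t in LOGGED]
is_ascending = all(a < b for a, b in zip(folded, folded[1:]))
rows_up_to_end = sum(1 for t in folded if t <= END)
print(is_ascending, rows_up_to_end)
```

Under that reading the 20 logged tokens come out in strictly ascending order,
and 13 of them fall at or below the end token, which matches the 13 rows
returned by the second chunk. This is consistent with the rows being returned
in token order after all, but it is only a check on these 20 values, not a
confirmation of how the partitioner is implemented.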

--------------------------------------------------------------------------------
