incubator-accumulo-user mailing list archives

From Keith Massey <keith.mas...@digitalreasoning.com>
Subject Re: Scanning for rows using columnfamily only
Date Wed, 02 Nov 2011 20:28:38 GMT
On 11/1/11 3:11 PM, Keith Massey wrote:
> Thanks for the tips. We tried using one locality group per column family
> (I think there are 20-25). It has definitely sped up queries for all
> data in a single column family. The first batch comes back in about 5
> seconds rather than 120 seconds without the locality groups. Our data
> load time doubled though from 7 hours to 14 hours. I don't have any
> evidence at this point that it is related to the locality groups. But
> there were very few differences between the 7-hour load and the 14-hour
> load. Any thoughts about whether this could be a side effect of loading
> data into 25 locality groups? Or am I looking in the wrong place?
> Thanks again.
>
> Keith
Actually I might have spoken too soon. While many queries now come back 
in around 5 seconds that previously took more than 100, some still take 
a really long time. Specifically they seem to be queries for two column 
families that only appear in about 50 rows total (across billions in the 
table). I've lumped these two metadata-type column families into a 
single locality group. I've confirmed that they are recognized as being 
in a locality group. But if I run "scan -c
<column_family_that_is_in_this_locality_group>" in the cloudbase shell,
it takes hundreds of seconds to return the fewer than 50 matching rows.
Was this a bad use of locality groups? Should we just put this metadata
into its own table?
Thanks again.

Keith
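
For reference, a minimal sketch of the setup described above, written as
Accumulo-style shell commands. The table name (mytable), group name
(meta), and column family names (cf_meta1, cf_meta2) are hypothetical
placeholders, not the names used in this deployment, and the cloudbase
shell may differ in the commands or options it accepts. One assumption
worth checking: locality group settings generally apply only to files
written after the change, so data loaded earlier may need a compaction
before a column-family scan can skip the other groups.

```
# Hypothetical names throughout: mytable, meta, cf_meta1, cf_meta2.
# Put the two rarely used metadata families into one locality group.
setgroups meta=cf_meta1,cf_meta2 -t mytable

# Confirm the table reports the group.
getgroups -t mytable

# Rewrite existing files so older data is also split by group.
compact -t mytable

# Scan a single column family from the group.
table mytable
scan -c cf_meta1
```

If the scan is still slow after the files have been rewritten, that
would point away from stale file layout and toward something else, such
as the scan still having to visit every tablet in the table.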
