incubator-accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Scanning for rows using columnfamily only
Date Thu, 03 Nov 2011 16:06:34 GMT
On Wed, Nov 2, 2011 at 4:28 PM, Keith Massey
<keith.massey@digitalreasoning.com> wrote:
> On 11/1/11 3:11 PM, Keith Massey wrote:
>>
>> Thanks for the tips. We tried using one locality group per column family
>> (I think there are 20-25). It has definitely sped up queries for all
>> data in a single column family. The first batch comes back in about 5
>> seconds rather than 120 seconds without the locality groups. Our data
>> load time doubled though from 7 hours to 14 hours. I don't have any
>> evidence at this point that it is related to the locality groups. But
>> there were very few differences between the 7-hour load and the 14-hour
>> load. Any thoughts about whether this could be a side effect of loading
>> data into 25 locality groups? Or am I looking in the wrong place?
>> Thanks again.
>>
>> Keith
>
> Actually I might have spoken too soon. While many queries now come back in
> around 5 seconds that previously took more than 100, some still take a
> really long time. Specifically they seem to be queries for two column
> families that only appear in about 50 rows total (across billions in the
> table). I've lumped these two metadata-type column families into a single
> locality group. I've confirmed that they are recognized as being in a
> locality group. But if I "scan -c
> <column_family_that_is_in_this_locality_group>" in cloudbase shell, it takes
> hundreds of seconds to return all < 50 rows. Was this a bad use of locality
> groups? Should we just put this metadata into its own table? Thanks again.
>
> Keith
>

One other thing to consider: since the in-memory map is not
partitioned by locality group, everything in memory has to be
scanned in this case.  You can look at the monitor page to see whether
the table has entries in memory.  If so, you can flush the table from
the shell.  When the entries in memory drop to zero on the monitor
page, try the scan again.
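
To illustrate the point above, here is a rough toy model in Python. It is not Accumulo or cloudbase code, and every name in it (scan_in_memory_map, flush_to_locality_groups, scan_flushed, the "meta" and "data" families, the group names) is made up for the sketch. It only assumes what is said above: the in-memory map is a single structure, while flushed data is partitioned by locality group.

```python
# Toy model only: not Accumulo code. All names here are hypothetical.

def scan_in_memory_map(entries, wanted_family):
    """The in-memory map is one sorted structure, so every entry is examined."""
    examined = 0
    hits = []
    for row, family, value in entries:
        examined += 1
        if family == wanted_family:
            hits.append((row, family, value))
    return hits, examined


def flush_to_locality_groups(entries, family_to_group):
    """Model a flush: entries are written out partitioned by locality group."""
    groups = {}
    for row, family, value in entries:
        group = family_to_group.get(family, "default")
        groups.setdefault(group, []).append((row, family, value))
    return groups


def scan_flushed(groups, family_to_group, wanted_family):
    """After the flush, only the group holding the wanted family is read."""
    group = family_to_group.get(wanted_family, "default")
    examined = 0
    hits = []
    for row, family, value in groups.get(group, []):
        examined += 1
        if family == wanted_family:
            hits.append((row, family, value))
    return hits, examined


# 10,000 ordinary entries plus 5 rare metadata entries (made-up sizes).
entries = [("row%05d" % i, "data", "v") for i in range(10000)]
entries += [("row%05d" % (i * 2000), "meta", "m") for i in range(5)]
entries.sort()
family_to_group = {"meta": "metadata_group", "data": "data_group"}

mem_hits, mem_examined = scan_in_memory_map(entries, "meta")
groups = flush_to_locality_groups(entries, family_to_group)
disk_hits, disk_examined = scan_flushed(groups, family_to_group, "meta")

print("examined in memory:", mem_examined)
print("examined after flush:", disk_examined)
```

Both scans return the same five entries, but the in-memory scan has to walk all 10,005 entries while the post-flush scan only reads the five in the metadata group. The real system is more involved than this, so treat it as an illustration of why flushing first may help, not as a measurement.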