incubator-accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Scanning for rows using columnfamily only
Date Tue, 01 Nov 2011 20:43:28 GMT
On Tue, Nov 1, 2011 at 4:11 PM, Keith Massey
<keith.massey@digitalreasoning.com> wrote:
> Thanks for the tips. We tried using one locality group per column family (I
> think there are 20-25). It has definitely sped up queries for all data in a
> single column family. The first batch comes back in about 5 seconds rather
> than 120 seconds without the locality groups. Our data load time doubled
> though from 7 hours to 14 hours. I don't have any evidence at this point
> that it is related to the locality groups. But there were very few
> differences between the 7-hour load and the 14-hour load. Any thoughts about
> whether this could be a side effect of loading data into 25 locality groups?
> Or am I looking in the wrong place?
> Thanks again.
>
> Keith
>

One possible factor, though it may not be the cause here, is that we
do not segment the in-memory map by locality group.  When we minor
compact, we scan the entire in-memory map once for each locality group
and write out only that group's data.  We have discussed segmenting
the in-memory map per locality group.  One drawback we thought of is
that it would increase the insert cost when a mutation spans multiple
locality groups.
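
To make the trade-off concrete, here is a rough toy model in Python. It
is not Accumulo code: the function names, the tuple layout, and the
one-family-per-group setup are all assumptions made for illustration,
and it ignores the default locality group. It only counts how many
in-memory entries each approach examines during a minor compaction and
how many segments one insert touches.

```python
from collections import defaultdict

# Toy model only; none of these names come from Accumulo.
# An entry is (row, family, qualifier, value).

def compact_unsegmented(memory_map, groups):
    """One full pass over the whole in-memory map per locality group."""
    examined = 0
    files = {}
    for name, families in groups.items():
        out = []
        for entry in memory_map:
            examined += 1              # every entry is looked at for every group
            if entry[1] in families:
                out.append(entry)
        files[name] = out
    return files, examined

def insert_segmented(segments, family_to_group, mutation):
    """Route each column update to its group's segment.

    Returns how many distinct segments this one mutation touched."""
    touched = set()
    for entry in mutation:
        group = family_to_group[entry[1]]
        segments[group].append(entry)
        touched.add(group)
    return len(touched)

def compact_segmented(segments):
    """Each segment is read once; no entry is examined for another group."""
    examined = 0
    files = {}
    for name, entries in segments.items():
        examined += len(entries)
        files[name] = sorted(entries)
    return files, examined

# Hypothetical setup: 25 locality groups, one column family each.
groups = {f"lg{i}": {f"cf{i}"} for i in range(25)}
family_to_group = {fam: name for name, fams in groups.items() for fam in fams}

# 100 mutations, each spanning two families (and so two locality groups).
mutations = [
    [(f"row{i:04d}", f"cf{i % 25}", "q", f"v{i}"),
     (f"row{i:04d}", f"cf{(i + 1) % 25}", "q", f"v{i}")]
    for i in range(100)
]

memory_map = sorted(entry for m in mutations for entry in m)
unseg_files, unseg_examined = compact_unsegmented(memory_map, groups)

segments = defaultdict(list)
touched_per_insert = [insert_segmented(segments, family_to_group, m)
                      for m in mutations]
seg_files, seg_examined = compact_segmented(segments)
```

In this model the unsegmented compaction examines 5,000 entries (25
groups x 200 entries) while the segmented one examines 200, and both
produce the same per-group output.  The price is that each insert here
touches two segments instead of one map.  How that compares with real
tablet server costs is not something this sketch can say.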
