accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Time based locality groups
Date Wed, 07 Mar 2012 23:54:40 GMT
Replying to myself :)

The more I think about this, it seems that locality groups could be
handled by plugins that can partition the data and select locality
groups in any way they like. Want locality groups based on row suffix?
Go ahead and write the plugin.

The plugin would be used for compaction time partitioning and scan
time locality group selection. Users could pass options to the
locality group plugin at scan time just like options are passed to
iterators. Maybe this is an extension or further generalization of
the existing iterator framework; I have not thought that through far
enough.
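
To make the two roles concrete, here is a rough sketch of what such a
plugin could look like. Everything in it is hypothetical: the
LocalityGroupPlugin interface, its method names, the "suffix" scan
option and the RowSuffixPlugin class are invented for illustration and
are not part of Accumulo. Plain strings stand in for Accumulo's Key
type so the sketch stays self-contained.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical interface, not an Accumulo API.
interface LocalityGroupPlugin {
    // Compaction time: name the locality group an entry is written to.
    String groupFor(String row, String columnFamily, long timestamp);

    // Scan time: choose which of a file's groups need to be read,
    // based on options passed by the user (as with iterator options).
    Set<String> groupsToScan(Map<String, String> scanOptions,
                             Set<String> availableGroups);
}

// Example plugin: one locality group per row suffix, where the suffix
// is the text after the last '_' in the row (an assumed convention).
class RowSuffixPlugin implements LocalityGroupPlugin {
    @Override
    public String groupFor(String row, String columnFamily, long timestamp) {
        int idx = row.lastIndexOf('_');
        // Rows with no suffix, or an empty one, go to a default group.
        if (idx < 0 || idx == row.length() - 1) {
            return "default";
        }
        return row.substring(idx + 1);
    }

    @Override
    public Set<String> groupsToScan(Map<String, String> scanOptions,
                                    Set<String> availableGroups) {
        String wanted = scanOptions.get("suffix");
        // Without a "suffix" option, every group has to be read.
        if (wanted == null) {
            return new LinkedHashSet<>(availableGroups);
        }
        Set<String> selected = new LinkedHashSet<>();
        if (availableGroups.contains(wanted)) {
            selected.add(wanted);
        }
        return selected;
    }
}
```

A scan that passes suffix=log would then only touch the "log" group,
while a scan with no options falls back to reading every group.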

Keith

On Wed, Mar 7, 2012 at 6:22 PM, Keith Turner <keith@deenlo.com> wrote:
> We regularly have questions from users about querying new data and
> aging off old data.  I was thinking about how we could better support
> this need in 1.5.  One thing that occurred to me is having locality
> groups that are based on timestamp instead of column family.  For
> example, a locality group for each month.  Alternatively, we could have
> groups for < day old, < week old, < month old, < year old.  We would
> need a way for users to define these.
>
> This would make scanning a table for recent data much faster.  Also
> dropping old data could be made much faster by just dropping entire
> locality groups at compaction time.
>
> One thing that irks me about this is: should column family and time
> based locality groups be mutually exclusive (i.e. an RFile has one or
> the other, not both)?  If they are not, then the order in which the
> two are partitioned matters for query performance and would
> probably need to be user configurable.
>
> Thoughts?
>
> Keith
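
As an illustration of the age-based grouping in the quoted message,
the < day / < week / < month / < year buckets could be computed along
these lines. This is only a sketch: the AgeBuckets class, the bucket
names, the extra "older" bucket and the 30-day month / 365-day year
are all assumptions, not anything defined in Accumulo.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not an Accumulo API.
class AgeBuckets {
    static final long DAY = TimeUnit.DAYS.toMillis(1);

    // Map an entry's timestamp to an age bucket relative to "now".
    // Both arguments are milliseconds since the epoch.
    static String bucketFor(long timestampMillis, long nowMillis) {
        long age = nowMillis - timestampMillis;
        if (age < DAY) return "day";          // also covers future timestamps
        if (age < 7 * DAY) return "week";
        if (age < 30 * DAY) return "month";   // assumed 30-day month
        if (age < 365 * DAY) return "year";   // assumed 365-day year
        return "older";                       // candidate for age-off
    }
}
```

Note that with relative buckets an entry's group changes as it ages,
so the assignment is presumably only accurate as of the last
compaction. Fixed calendar buckets, such as one group per month, would
not have that property.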
