accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Time based locality groups
Date Wed, 07 Mar 2012 23:22:36 GMT
We regularly have questions from users about querying new data and
aging off old data.  I was thinking about how we could better support
this need in 1.5.  One thing that occurred to me is having locality
groups that are based on timestamp instead of column family.  For
example, a locality group for each month.  Alternatively, we could have
groups for < day old, < week old, < month old, < year old.  We would
need a way for users to define these.
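
As a rough illustration of the age-based variant, the sketch below maps an
entry's timestamp to a group name.  The bucket names, the thresholds, and the
function `time_group` are hypothetical and are not an existing Accumulo API;
a month is approximated as 30 days and a year as 365 days.

```python
# Hypothetical age buckets: (group name, exclusive upper bound on age in ms).
DAY_MS = 24 * 60 * 60 * 1000
AGE_BUCKETS = [
    ("lt_day", DAY_MS),
    ("lt_week", 7 * DAY_MS),
    ("lt_month", 30 * DAY_MS),
    ("lt_year", 365 * DAY_MS),
]


def time_group(timestamp_ms, now_ms, buckets=AGE_BUCKETS, default="older"):
    """Return the name of the first bucket whose age limit exceeds the entry's age."""
    age = now_ms - timestamp_ms
    for name, limit in buckets:
        if age < limit:
            return name
    # Anything older than the last bucket falls into a catch-all group.
    return default
```

With age-relative buckets like these, an entry's group changes as it gets
older, so it would presumably have to be reassigned whenever files are
rewritten.  Fixed calendar groups (one per month) would not have that property.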

This would make scanning a table for recent data much faster, since a
scan would only need to read the groups covering the requested time
range.  Dropping old data could also be made much faster by just
dropping entire locality groups at compaction time.
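
A minimal sketch of the drop-at-compaction idea, assuming each file is modeled
as a dict from group name to a list of (key, value) entries.  The `compact`
function and this file model are hypothetical simplifications, not how RFile
or the compaction code is actually structured.

```python
def compact(files, expired_groups):
    """Merge files, skipping every group listed in expired_groups.

    Each file is a dict: group name -> list of (key, value) entries.
    """
    merged = {}
    for f in files:
        for group, entries in f.items():
            if group in expired_groups:
                # The whole group is skipped; its entries are never examined.
                continue
            merged.setdefault(group, []).extend(entries)
    # Keep the entries of each surviving group in sorted order.
    for entries in merged.values():
        entries.sort()
    return merged
```

The point of the sketch is that expiry is decided once per group rather than
once per entry, which is where the speedup over a per-entry age-off check
would come from.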

One thing that irks me about this: should column family and time
based locality groups be mutually exclusive (i.e. an RFile has one or
the other, not both)?  If they are not, then the order in which the
data is partitioned (by column family first or by time first) is
important for query performance and would probably need to be user
configurable.
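
To make the ordering question concrete, here is a small sketch of a two-level
partition where the caller chooses which dimension comes first.  The
`partition` helper, the (family, month, value) entry layout, and the sample
data are all hypothetical.

```python
def partition(entries, first, second):
    """Group entries into a two-level dict keyed by first(entry), then second(entry)."""
    out = {}
    for e in entries:
        out.setdefault(first(e), {}).setdefault(second(e), []).append(e)
    return out


def by_family(entry):
    return entry[0]


def by_month(entry):
    return entry[1]


# Hypothetical entries: (column family, month of timestamp, value).
entries = [
    ("meta", "2012-02", "a"),
    ("meta", "2012-03", "b"),
    ("content", "2012-03", "c"),
]

# Time first: all data for a month sits under one top-level partition.
time_first = partition(entries, by_month, by_family)

# Family first: data for a month is spread across every family partition.
family_first = partition(entries, by_family, by_month)
```

With `time_first`, a query for recent data in all families reads one top-level
partition, while a query for one family across all time has to visit every
month.  With `family_first` the trade-off is reversed.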

Thoughts?

Keith