accumulo-dev mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-452) Generalize locality groups
Date Thu, 08 Mar 2012 19:47:57 GMT


Todd Lipcon commented on ACCUMULO-452:

bq. If they want to scan the last 6 months of data, for example, and the largest file overlaps
this time range but only 10% of the data in the file matches the range, then a lot of data
needs to be filtered. Does HBase do anything special to deal with this case?

We have a setting for "max file size" beyond which a file won't be included in compactions.
Setting that to a few GB would be prudent in a case where most of your queries are time-bound.
Of course, there's an associated cost against scanners which aren't time-bound, as they'll
have to merge all files, but in some cases it's fine.
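
To make the trade-off concrete, here is a rough Java sketch. It is not HBase code: the class, field, and method names are made up for illustration, and it assumes each file records its size and the minimum and maximum timestamp of its cells. Files above the size cap are left out of compaction, so a time-bound scan can skip any file whose timestamp range does not overlap the query, while an unbounded scan still has to read every file.

```java
import java.util.ArrayList;
import java.util.List;

class TimeBoundedFiles {
    /** Hypothetical per-file summary: size plus min/max cell timestamps. */
    static final class StoreFile {
        final String name;
        final long sizeBytes;
        final long minTs;
        final long maxTs;

        StoreFile(String name, long sizeBytes, long minTs, long maxTs) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** Files larger than maxFileSize are never picked for compaction. */
    static List<StoreFile> compactionCandidates(List<StoreFile> files, long maxFileSize) {
        List<StoreFile> out = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.sizeBytes <= maxFileSize) {
                out.add(f);
            }
        }
        return out;
    }

    /** A time-bound scan only needs files whose [minTs, maxTs] overlaps [start, end]. */
    static List<StoreFile> filesForScan(List<StoreFile> files, long start, long end) {
        List<StoreFile> out = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.maxTs >= start && f.minTs <= end) {
                out.add(f);
            }
        }
        return out;
    }
}
```

With a cap of a few GB, old data ends up in files that are never rewritten, so their timestamp ranges stay narrow and a "last 6 months" scan touches only the recent ones.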

You can see more discussion about this in HBASE-4717.

> Generalize locality groups
> --------------------------
>                 Key: ACCUMULO-452
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Keith Turner
>             Fix For: 1.5.0
>         Attachments: PartitionerDesign.txt
> Locality groups are a neat feature, but there is no reason to limit partitioning to column
> families. Data could be partitioned based on any criteria. For example, if a user is interested
> in querying recent data and ageing off old data, partitioning locality groups based on timestamp
> would be useful. This could be accomplished by letting users specify a partitioner plugin
> that is used at compaction and scan time. Scans would need the ability to pass options to
> the partitioner.
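
As an illustration only, the partitioner plugin described above might look something like the following Java sketch. Everything here is hypothetical: the `Partitioner` interface, its method names, and the `startTime`/`endTime` option keys are assumptions, not part of Accumulo or of the attached design document. One method assigns an entry to a partition at compaction time, and the other uses scan options to decide which partitions a scan must read.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PartitionerSketch {
    /** Hypothetical plugin interface; not an Accumulo API. */
    interface Partitioner {
        /** Compaction time: name of the partition an entry belongs to. */
        String partitionFor(String row, String family, long timestamp);

        /** Scan time: subset of existing partitions the scan has to read. */
        Set<String> partitionsToRead(Set<String> existing, Map<String, String> scanOptions);
    }

    /** Example plugin: partitions are fixed-width timestamp buckets. */
    static final class TimeBucketPartitioner implements Partitioner {
        private final long bucketMillis;

        TimeBucketPartitioner(long bucketMillis) {
            this.bucketMillis = bucketMillis;
        }

        @Override
        public String partitionFor(String row, String family, long timestamp) {
            return Long.toString(Math.floorDiv(timestamp, bucketMillis));
        }

        @Override
        public Set<String> partitionsToRead(Set<String> existing, Map<String, String> scanOptions) {
            // Missing options mean the scan is unbounded on that side.
            long start = Long.parseLong(
                scanOptions.getOrDefault("startTime", Long.toString(Long.MIN_VALUE)));
            long end = Long.parseLong(
                scanOptions.getOrDefault("endTime", Long.toString(Long.MAX_VALUE)));
            long first = Math.floorDiv(start, bucketMillis);
            long last = Math.floorDiv(end, bucketMillis);
            Set<String> out = new TreeSet<>();
            for (String p : existing) {
                long bucket = Long.parseLong(p);
                if (bucket >= first && bucket <= last) {
                    out.add(p);
                }
            }
            return out;
        }
    }
}
```

A scan for recent data would pass a `startTime` option and read only the newest buckets, and ageing off old data would amount to dropping whole buckets.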
