accumulo-dev mailing list archives

From "Aaron Cordova (Commented) (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-452) Generalize locality groups
Date Thu, 08 Mar 2012 21:13:56 GMT


Aaron Cordova commented on ACCUMULO-452:

Whether users rely on server-provided timestamps or supply their own, the timestamp still sorts
after the row and column, and is used the same way: to limit values after the rows and columns
have been identified.

To me it seems as if this is what happened, told as a little play:

BigTable Guys: look you can physically partition your data automatically using the rows!

Users: Great! That works, but maybe I want an additional, secondary partitioning?

BG: hmm. ok, how about you can also partition on the column family? It's the next item in
the hierarchy, doesn't add too much complexity, pretty straightforward. Just assign them
to groups called locality groups and I think we can keep this under control.

Users: Yay! You guys rock!

BG: You're welcome.

Other users: Hey, locality groups are cool, but can I partition on column qualifiers?

BG: why are rows and column families insufficient?

OU: well, I don't know, I just really like to slice things every way possible.

BG: sigh ..

Yet other users: Wait, what about timestamps? You know what's more general than partitioning
on a few elements of the data model? Partitioning on ALL the elements of the data model! So
sweet. More general means more better! 

BG: I'm quitting to go work at Facebook.

> Generalize locality groups
> --------------------------
>                 Key: ACCUMULO-452
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Keith Turner
>             Fix For: 1.5.0
>         Attachments: PartitionerDesign.txt
> Locality groups are a neat feature, but there is no reason to limit partitioning to column
> families.  Data could be partitioned based on any criteria.  For example, if a user is interested
> in querying recent data and ageing off old data, partitioning locality groups based on timestamp
> would be useful.  This could be accomplished by letting users specify a partitioner plugin
> that is used at compaction and scan time.  Scans would need the ability to pass options to
> the partitioner.
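
For readers trying to picture the proposal quoted above, here is a minimal sketch of what such a partitioner plugin might look like. It is an illustration only, not the design in PartitionerDesign.txt and not an existing Accumulo API: the `Partitioner` interface, its two methods, the `TimestampPartitioner` class, the partition names "recent" and "old", and the `recentOnly` scan option are all hypothetical names made up for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical plugin contract (an assumption for illustration, NOT an Accumulo interface).
interface Partitioner {
    // Compaction time: name the partition that a key with these elements belongs to.
    String partitionFor(String row, String family, String qualifier, long timestamp);

    // Scan time: given options passed along with the scan, name the partitions to read.
    List<String> partitionsToScan(Map<String, String> scanOptions);
}

// Example from the issue description: separate recent data from old data by timestamp.
final class TimestampPartitioner implements Partitioner {
    private final long cutoffMillis;

    TimestampPartitioner(long cutoffMillis) {
        this.cutoffMillis = cutoffMillis;
    }

    @Override
    public String partitionFor(String row, String family, String qualifier, long timestamp) {
        // Only the timestamp is consulted; row and column are ignored in this sketch.
        return timestamp >= cutoffMillis ? "recent" : "old";
    }

    @Override
    public List<String> partitionsToScan(Map<String, String> scanOptions) {
        // "recentOnly" is a made-up option name; it defaults to false when absent.
        boolean recentOnly =
            Boolean.parseBoolean(scanOptions.getOrDefault("recentOnly", "false"));
        return recentOnly
            ? Collections.singletonList("recent")
            : Arrays.asList("recent", "old");
    }
}
```

Under these assumptions, a compaction would call `partitionFor` on each key to decide where it is written, and a scan that passes `recentOnly=true` would skip the "old" partition entirely, which is the use case the description gives for querying recent data.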
