accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ACCUMULO-1802) Create a compaction strategy for aging off data
Date Wed, 23 Oct 2013 18:31:43 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-1802:
-----------------------------------

    Description: 
The default compaction strategy has a tendency to put the oldest data in the largest files.
 This leads to a lot of work when it is time to age off data.

One could imagine a compaction strategy that would split data into separate files based on
the timestamp.  Additionally, if the min/max timestamps for a file were known, old data could
be aged off by deleting whole files.

To accomplish this, we will need to augment the configurable compaction strategy to support
multiple output files and to save and use extra metadata in each file.
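
As a rough illustration of the idea, the sketch below groups entries into fixed-width time windows (one output file per window) and then selects files that can be dropped whole because their maximum timestamp is older than a cutoff. Every name here (AgeOffSketch, FileMeta, splitByWindow, wholeFilesToDelete, windowMillis, cutoffTs) is hypothetical and is not part of any Accumulo API. It also assumes that age-off means "discard entries whose timestamp is below the cutoff".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; none of these names are Accumulo APIs. */
final class AgeOffSketch {

    /** Assumed per-file metadata: smallest and largest entry timestamp in the file. */
    static final class FileMeta {
        final String name;
        final long minTs;
        final long maxTs;

        FileMeta(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /**
     * Groups entry timestamps by the start of their fixed-width window.
     * Each map value stands in for the contents of one output file.
     */
    static Map<Long, List<Long>> splitByWindow(List<Long> timestamps, long windowMillis) {
        Map<Long, List<Long>> buckets = new TreeMap<>();
        for (long ts : timestamps) {
            // floorDiv keeps negative timestamps in the correct window.
            long windowStart = Math.floorDiv(ts, windowMillis) * windowMillis;
            buckets.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(ts);
        }
        return buckets;
    }

    /**
     * Returns the names of files whose newest entry is older than the cutoff.
     * Such files hold only expired data, so they can be deleted without being read.
     */
    static List<String> wholeFilesToDelete(List<FileMeta> files, long cutoffTs) {
        List<String> deletable = new ArrayList<>();
        for (FileMeta f : files) {
            if (f.maxTs < cutoffTs) {
                deletable.add(f.name);
            }
        }
        return deletable;
    }
}
```

A file that straddles the cutoff (minTs below it, maxTs at or above it) would still have to be rewritten, which is why splitting output by time window matters: it keeps such mixed files rare.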

  was:
The default compaction strategy has a tendency to put the oldest data in the largest files.
 This leads to a lot of work when it is time to age off data.

One could imagine a compaction strategy that would split data into separate files based on
the timestamp.  Additionally, if the min/max timestamps for a file were known, old data could
be aged off by deleting whole files.

Augment the configurable compaction strategy to support multiple output files, and saving/using
extra metadata in each file.


> Create a compaction strategy for aging off data
> -----------------------------------------------
>
>                 Key: ACCUMULO-1802
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1802
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: tserver
>            Reporter: Eric Newton
>             Fix For: 1.7.0
>
>
> The default compaction strategy has a tendency to put the oldest data in the largest
> files.  This leads to a lot of work when it is time to age off data.
> One could imagine a compaction strategy that would split data into separate files based
> on the timestamp.  Additionally, if the min/max timestamps for a file were known, old data
> could be aged off by deleting whole files.
> To accomplish this, we will need to augment the configurable compaction strategy to support
> multiple output files and to save and use extra metadata in each file.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
