accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-1281) flush the METADATA table after GC
Date Tue, 16 Apr 2013 23:51:15 GMT


Eric Newton commented on ACCUMULO-1281:

The gc runs only every 15 minutes by default.  We've had users flush their !METADATA table
as often as every 5 minutes.  I have seen hundreds of thousands of files removed during a
GC cycle.  Unless the table is flushed, the entries for those deletions just keep building up in memory.

If the cluster is using live ingest, the !METADATA table tends to flush because of the number
of WAL entries it has.

The use of in-memory compactions (ACCUMULO-519) using a ratio of delete or update records
to trigger a flush would be a more intelligent approach, but that takes more than 4 lines
of code.
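
To make the ratio idea concrete, here is a minimal sketch of such a trigger. It is not code from ACCUMULO-519 or from Accumulo itself; the class name, the threshold, and the minimum entry count are all hypothetical, chosen only to illustrate the heuristic.

```java
/**
 * Hypothetical sketch: decide when an in-memory map should be flushed,
 * based on the fraction of entries that are deletes or updates.
 * Not part of the Accumulo API.
 */
final class DeleteRatioFlushPolicy {
    private final double threshold; // flush once this fraction is reached
    private final long minEntries;  // ignore the ratio for very small maps
    private long total;             // all entries written since the last flush
    private long superseding;       // entries that delete or overwrite older data

    DeleteRatioFlushPolicy(double threshold, long minEntries) {
        if (threshold <= 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in (0, 1]");
        }
        if (minEntries < 0) {
            throw new IllegalArgumentException("minEntries must be >= 0");
        }
        this.threshold = threshold;
        this.minEntries = minEntries;
    }

    /** Record a write that adds new data. */
    void recordInsert() {
        total++;
    }

    /** Record a write that deletes or replaces existing data. */
    void recordDeleteOrUpdate() {
        total++;
        superseding++;
    }

    /** Fraction of in-memory entries that are deletes or updates. */
    double ratio() {
        return total == 0 ? 0.0 : (double) superseding / total;
    }

    /** True when enough entries exist and enough of them supersede older data. */
    boolean shouldFlush() {
        return total >= minEntries && ratio() >= threshold;
    }

    /** Call after a flush to start counting again. */
    void reset() {
        total = 0;
        superseding = 0;
    }
}
```

A caller would record each mutation as it is applied, check shouldFlush() afterwards, and call reset() once the flush completes. The minimum entry count is there so that a nearly empty map with a single delete does not trigger a flush.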

> flush the METADATA table after GC
> ---------------------------------
>                 Key: ACCUMULO-1281
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: gc
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Trivial
>             Fix For: 1.5.0
> The METADATA table is often small, with many in-memory writes.  Because it is small,
> it does not normally get flushed; a flush would prune data with the versioning/delete
> iterators.  Over time, the many in-memory versions can cause poor performance.
>
> The file garbage collector (gc) will make lots of updates as it runs.  That would be
> a perfect time to flush the table and prune the versions.
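
The build-up described in the issue can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class and its methods are hypothetical, and "flush" here just keeps the newest entry per key (a delete marker is represented as a null value), whereas what actually survives a flush in Accumulo depends on the table's iterator configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical toy model of an in-memory map that accumulates versions.
 * Not the Accumulo implementation.
 */
final class VersionedMemTable {
    // Every write is appended; a null value stands for a delete marker.
    private final Map<String, List<String>> versions = new TreeMap<>();

    void put(String key, String value) {
        versions.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    void delete(String key) {
        versions.computeIfAbsent(key, k -> new ArrayList<>()).add(null);
    }

    /** Number of entries held in memory, counting every version. */
    int entriesInMemory() {
        int count = 0;
        for (List<String> v : versions.values()) {
            count += v.size();
        }
        return count;
    }

    /**
     * Simplified flush: keep only the newest entry per key and clear memory.
     * A null value in the result means the newest entry was a delete marker.
     */
    Map<String, String> flush() {
        Map<String, String> flushed = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : versions.entrySet()) {
            List<String> v = e.getValue();
            flushed.put(e.getKey(), v.get(v.size() - 1));
        }
        versions.clear();
        return flushed;
    }
}
```

In this model, a run of updates and deletes like the ones the gc produces makes entriesInMemory() grow with every write, while the flushed result holds one entry per key.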
