accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4021) bulk imports slow file garbage collection
Date Tue, 20 Oct 2015 19:19:27 GMT


Eric Newton commented on ACCUMULO-4021:

GC becomes inefficient when bulk importing.

GC grabs many files as candidates: up to an amount limited by memory.
GC then verifies no tablet is using those files: a full metadata table scan.
GC repeats with any remaining candidates.

However, the bulk loader is adding delete markers, which may appear at the end of the candidate list.

The result is that only a few files are checked against the metadata table in each scan, and each
scan takes some time: time enough for a few more markers to be added to the end of the delete section.
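
The slow-down described above can be illustrated with a toy model. This is only a sketch, not Accumulo's actual collector code: the class name `GcBatchSketch`, the method `simulate`, and all of the numbers (backlog, batch limit, arrivals per scan) are hypothetical and chosen for illustration. It assumes each pass takes up to a memory-limited batch of candidates, pays for one full metadata scan, and that the bulk loader appends a few new markers while that scan runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the candidate/scan loop; NOT Accumulo's real implementation. */
class GcBatchSketch {

    /**
     * Returns how many candidates each pass handles. Every pass is assumed to
     * cost one full metadata table scan, however small the batch is.
     *
     * @param backlog         delete markers present when the GC starts (assumed)
     * @param batchLimit      most candidates that fit in memory per pass (assumed)
     * @param arrivalsPerScan markers the bulk loader appends during one scan (assumed)
     * @param passes          number of passes to simulate
     */
    static List<Integer> simulate(int backlog, int batchLimit, int arrivalsPerScan, int passes) {
        List<Integer> handledPerPass = new ArrayList<>();
        int pending = backlog; // markers not yet picked up as candidates
        for (int i = 0; i < passes; i++) {
            int batch = Math.min(pending, batchLimit);
            if (batch == 0) {
                break; // no remaining candidates: the GC cycle ends
            }
            handledPerPass.add(batch);
            pending -= batch;
            // While the full metadata scan runs, new markers land at the end.
            pending += arrivalsPerScan;
        }
        return handledPerPass;
    }

    public static void main(String[] args) {
        // With ongoing bulk imports, later passes shrink to a handful of files.
        System.out.println(simulate(1000, 400, 3, 6)); // prints [400, 400, 206, 3, 3, 3]
        // Without new markers, the cycle finishes after the backlog is drained.
        System.out.println(simulate(1000, 400, 0, 6)); // prints [400, 400, 200]
    }
}
```

In this model the first passes amortize a scan over hundreds of files, but once the backlog is drained every further pass spends a full scan on only the few markers that arrived during the previous one, and the loop does not end while imports continue.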

> bulk imports slow file garbage collection
> -----------------------------------------
>                 Key: ACCUMULO-4021
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: gc
>    Affects Versions: 1.6.3
>         Environment: large production system
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.6.5, 1.7.1, 1.8.0
> On a large system, bulk imports slow file garbage collection to a crawl. The total number
> of files to be deleted was about 14 million. Initially, it would run quickly, but then slow
> down to the point where only a few files would be deleted every few minutes. The JVM was
> only using 50% of the CPU (and was therefore probably not GC thrashing). jstack dumps showed
> the collector scanning the metadata table to remove referenced files from the delete list.
> If the bulk ingest requests were stopped, the GC completed quickly.
