accumulo-user mailing list archives

From Josh Elser <>
Subject Re: Accumulo GC and Hadoop trash settings
Date Mon, 17 Aug 2015 21:02:25 GMT
Some advanced recovery steps are documented[1], but there is no sort of 
"fix it for you" tool.

It's probably a good idea either to set "fs.trash.interval" and/or 
"fs.trash.checkpoint.interval" in core-site.xml to values that reflect 
the HDFS space you have available, or to turn off trash entirely and 
take the necessary steps to make sure your data is backed up (if that's 
a priority for you).
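For example, a core-site.xml fragment bounding how long trash is retained might look like the following (the values here are purely illustrative, not recommendations; tune them to your cluster's free space):

```xml
<!-- core-site.xml: illustrative values only -->
<property>
  <name>fs.trash.interval</name>
  <!-- minutes a trash checkpoint is kept before deletion; 0 disables trash -->
  <value>1440</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <!-- minutes between checkpoints of the current trash directory;
       should be less than or equal to fs.trash.interval -->
  <value>60</value>
</property>
```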

HDFS (and Accumulo, for that matter) is only as reliable as the hardware 
and configuration you run it on. Both are built to be robust and 
reliable systems, but neither is without flaws given enough time.
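For reference, the gc.trash.ignore property discussed below is an Accumulo setting; a sketch of flipping it in accumulo-site.xml (property name from this thread, value illustrative) would be:

```xml
<!-- accumulo-site.xml: sketch only. Setting gc.trash.ignore to true makes
     the Accumulo GC delete files outright rather than moving them to the
     HDFS .Trash directory, trading recoverability for disk space. -->
<property>
  <name>gc.trash.ignore</name>
  <value>true</value>
</property>
```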


James Hughes wrote:
> Ok, I can see the benefit of being able to recover data.  Is this
> process documented?  And is there any kind of user-friendly tool for it?
> On Mon, Aug 17, 2015 at 4:11 PM, <
> <>> wrote:
>       It's not temporary files, it's any file that has been compacted
>     away. If you keep files around longer than
>     {dfs.namenode.checkpoint.period}, then you have a chance to recover
>     in case your most recent checkpoint is corrupt.
>     ------------------------------------------------------------------------
>     *From: *"James Hughes" < <>>
>     *To: * <>
>     *Sent: *Monday, August 17, 2015 3:57:57 PM
>     *Subject: *Accumulo GC and Hadoop trash settings
>     Hi all,
>      From reading about the Accumulo GC, it sounds like temporary files
>     are routinely deleted during GC cycles.  In a small testing
>     environment, I've seen the HDFS Accumulo user's .Trash folder hold
>     10s of gigabytes of data.
>     Is there any reason that the default value for gc.trash.ignore is
>     false?  Is there any downside to deleting GC'ed files completely?
>     Thanks in advance,
>     Jim
