accumulo-user mailing list archives

From <>
Subject RE: Accumulo GC and Hadoop trash settings
Date Mon, 17 Aug 2015 23:22:07 GMT
  All of the components that you need to perform point-in-time recovery of an Accumulo instance
already exist. I have been working on a tool[1] in my copious amounts of free time to integrate
them into something usable, but it doesn’t actually use the files in the trash. My approach
is to let you determine your MTTR and then schedule your backups accordingly; the backup is a
fallback in case you are not able to recover your database using the techniques in the current
documentation.




From: James Hughes [] 
Sent: Monday, August 17, 2015 4:28 PM
Subject: Re: Accumulo GC and Hadoop trash settings


Ok, I can see the benefit of being able to recover data.  Is this process documented?
 And is there any kind of user-friendly tool for it?


On Mon, Aug 17, 2015 at 4:11 PM, <> wrote:


 It's not just temporary files; it's any file that has been compacted away. If you keep files
around longer than {dfs.namenode.checkpoint.period}, then you have a chance to recover in case
your most recent checkpoint is corrupt.
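
As a sketch of how those two knobs relate (property names are stock Hadoop; the values here are illustrative assumptions, not recommendations), note that trash retention is set in minutes in core-site.xml while the checkpoint period is set in seconds in hdfs-site.xml, so keeping trash longer than the checkpoint period might look like:

```xml
<!-- core-site.xml: keep trashed files for 24 hours (value is in MINUTES) -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>

<!-- hdfs-site.xml: checkpoint the namespace every hour (value is in SECONDS) -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>
```

With settings like these, a file compacted away by the Accumulo GC lingers in .Trash across many checkpoints, which is exactly the recovery window being described.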



From: "James Hughes" <>
Sent: Monday, August 17, 2015 3:57:57 PM
Subject: Accumulo GC and Hadoop trash settings



Hi all,


From reading about the Accumulo GC, it sounds like temporary files are routinely deleted
during GC cycles.  In a small testing environment, I've seen the HDFS Accumulo user's .Trash
folder hold tens of gigabytes of data.

Is there any reason that the default value for gc.trash.ignore is false?  Is there any downside
to deleting GC'ed files completely?
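
For reference, a minimal sketch of flipping that default (this assumes you are comfortable giving up the trash-based recovery window discussed above) is to set the property in accumulo-site.xml:

```xml
<!-- accumulo-site.xml: let the Accumulo GC delete files directly,
     bypassing the HDFS trash; this forfeits the ability to recover
     compacted-away files from .Trash -->
<property>
  <name>gc.trash.ignore</name>
  <value>true</value>
</property>
```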


Thanks in advance,




