hbase-dev mailing list archives

From Enis Söztutar <enis....@gmail.com>
Subject Re: Changing it so we do NOT archive hfiles by default
Date Thu, 20 Nov 2014 20:05:42 GMT
Snapshots rely on this feature, right? Will we check whether there are any
links back to the hfiles before deleting them?

The region replicas also depend on the TTL'ed deletion of the hfiles. We do
not explicitly create hfilelinks (on the file system), but we use hfile
links to refer to primary region's files (either in data dir, or in


On Thu, Nov 20, 2014 at 11:08 AM, Stack <stack@duboce.net> wrote:

> I think we should swap the default that has us archive hfiles rather than
> just outright delete them when we are done with them. The current
> configuration works for the minority of us who are running backup tools.
> For the rest of us, our clusters are doing unnecessary extra work.
> Background:
> Since 0.94 (https://issues.apache.org/jira/browse/HBASE-5547), when we are
> done with an hfile, it is moved to the 'archive' (hbase/.archive)
> directory. A thread in the master then removes hfiles older than some
> configured time. We do this rather than just delete hfiles to facilitate
> backup tools -- let backup tools have a say in when an hfile is safe to
> remove.
> The subject on HBASE-5547 has it that the archiving behavior only happens
> when the cluster is in 'backup mode', but as it turns out, later in the
> issue discussion, the implementation becomes significantly easier if we
> just always archive and that is what we ended up implementing and
> committing.
> These last few days, a few of us have been helping a user on a large
> cluster who is (temporarily) doing loads of compactions with the replaced
> hfiles being moved to hbase/.archive. The cleaning thread in master is not
> working fast enough deleting the hfiles so there is buildup going on -- so
> much so, it's slowing the whole cluster down (NN operations over tens of
> millions of files).
> Any problem swapping the default and having users opt-in for archiving?
> (I'd leave it as is in released software).  I will also take a look at
> having the cleaner thread do more work per cycle.
> Thanks,
> St.Ack
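
For readers following the thread, the archive-then-clean flow described in the quoted message (move finished hfiles to an archive, then have a cleaner remove those older than a configured time, unless something still needs them) can be sketched roughly as below. This is only an illustration under stated assumptions, not HBase's actual cleaner code: the class name ArchiveCleanerSketch, the method deletable, and the in-memory map and set standing in for the archive directory and for snapshot/link references are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: models the archive as a map of file name -> time archived.
public class ArchiveCleanerSketch {

    /**
     * Returns the archived files that are safe to delete: older than ttlMillis
     * and not in the set of files still referenced (for example by a snapshot
     * or a link, which is the concern raised in the reply).
     */
    static List<String> deletable(Map<String, Long> archivedAtMillis,
                                  Set<String> stillReferenced,
                                  long nowMillis,
                                  long ttlMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : archivedAtMillis.entrySet()) {
            boolean expired = nowMillis - entry.getValue() > ttlMillis;
            boolean referenced = stillReferenced.contains(entry.getKey());
            if (expired && !referenced) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> archive = new LinkedHashMap<>();
        archive.put("hfile-a", 0L);        // old, unreferenced
        archive.put("hfile-b", 0L);        // old, but still referenced
        archive.put("hfile-c", 400_000L);  // too recent to delete

        Set<String> referenced = new LinkedHashSet<>();
        referenced.add("hfile-b");

        // With a 300-second TTL at t = 500 s, only hfile-a qualifies.
        System.out.println(deletable(archive, referenced, 500_000L, 300_000L));
    }
}
```

If archiving became opt-in as proposed, a check like the stillReferenced test would have to run before the outright delete, since there would be no archive window in which a snapshot or replica could still read the file.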
