hbase-user mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Re: Can manually remove HFiles (similar to bulk import, but bulk remove)?
Date Tue, 10 Jul 2012 12:10:25 GMT
On Mon, Jul 9, 2012 at 1:05 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:

> Hey, this is closer!
> However, I think I'd want to avoid major compaction. In fact I was thinking
> about avoiding any compactions & splitting.
> ...

> So, you are saying that major compaction will look at max/min ts metainfo
> of the HFile and will remove the whole file based on ttl if necessary
> (without going through the file)? Can I tell it not to actually compact
> other HFiles (i.e. leave them as is, otherwise it would be not as easy to
> remove HFiles again in an hour)? I.e. looks like "delete only whole HFiles
> based on TTL" functionality is what I need here.

Off the top of my head, I don't know how "smart" the major compaction code
is wrt TTLs.  I'm pretty sure it isn't smart enough to explicitly ignore
specific files.
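
To make the "delete only whole HFiles based on TTL" idea concrete, here is a
minimal sketch of the selection rule being discussed: a file can be dropped as
a whole only if its newest cell is already past the TTL cutoff. This is a model
of the idea, not HBase's compaction code. HFileInfo, fully_expired and the
sample values are hypothetical names made up for illustration; the only input
assumed is the per-file min/max timestamp metadata mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HFileInfo:
    """Hypothetical stand-in for one HFile's timestamp metadata (not an HBase class)."""
    name: str
    min_ts_ms: int  # oldest cell timestamp in the file
    max_ts_ms: int  # newest cell timestamp in the file


def fully_expired(files, now_ms, ttl_ms):
    """Return the files whose newest cell is older than the TTL cutoff.

    Only these files could be removed whole without reading their contents;
    a file that straddles the cutoff still holds live cells and must stay.
    """
    cutoff = now_ms - ttl_ms
    return [f for f in files if f.max_ts_ms < cutoff]


HOUR_MS = 60 * 60 * 1000
WEEK_MS = 7 * 24 * HOUR_MS

now = 10 * WEEK_MS  # arbitrary "current time" for the example
files = [
    HFileInfo("old", now - 3 * WEEK_MS, now - 2 * WEEK_MS),
    HFileInfo("straddling", now - 2 * WEEK_MS, now - HOUR_MS),
    HFileInfo("fresh", now - HOUR_MS, now),
]

expired = fully_expired(files, now, WEEK_MS)
print([f.name for f in expired])
```

With a one-week TTL only the file named "old" qualifies; "straddling" is kept
because part of its data is still within the TTL.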

> I fear that complexity with removing HFiles can be caused by the (block)
> cache, which may still hold their data. Is that right? I'm actually OK with
> HBase returning data from files I "deleted" by removing HFiles: I will
> specify a timerange on scans anyway (in this example, to omit things older
> than 1 week).
I'm not sure what the block cache eviction policy is when a single region
is closed, but it sounds like you are ok if stale data remains.
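
As an illustration of why stale data is tolerable here, the sketch below models
a scan restricted to a time range: cells older than the lower bound are simply
never returned, whether or not they are still cached. It is a plain Python
model, not HBase client code; scan_with_time_range and the sample cells are
hypothetical. In the Java client the equivalent restriction would be set with
Scan.setTimeRange(minStamp, maxStamp), where minStamp is inclusive and maxStamp
is exclusive, which is the convention mirrored below.

```python
def in_time_range(ts_ms, min_ts_ms, max_ts_ms):
    """True if ts_ms falls in [min_ts_ms, max_ts_ms): min inclusive, max exclusive."""
    return min_ts_ms <= ts_ms < max_ts_ms


def scan_with_time_range(cells, min_ts_ms, max_ts_ms):
    """Return only the (row, timestamp) cells inside the requested time range."""
    return [(row, ts) for row, ts in cells if in_time_range(ts, min_ts_ms, max_ts_ms)]


HOUR_MS = 60 * 60 * 1000
WEEK_MS = 7 * 24 * HOUR_MS

now = 10 * WEEK_MS  # arbitrary "current time" for the example
cells = [
    ("stale-row", now - 2 * WEEK_MS),   # from a file that was "deleted"
    ("edge-row", now - WEEK_MS),        # exactly one week old, still included
    ("recent-row", now - HOUR_MS),
]

# Omit anything older than one week; the upper bound is exclusive, hence now + 1.
visible = scan_with_time_range(cells, now - WEEK_MS, now + 1)
print([row for row, _ in visible])
```

The stale row is filtered out by the time range alone, so it does not matter
to the reader whether the region server still serves it from cache.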

Sounds like you might want to try the close/delete/open advanced approach
on a test cluster to see if it meets your needs.


// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
