hbase-user mailing list archives

From: Stack <st...@duboce.net>
Subject: Re: Can manually remove HFiles (similar to bulk import, but bulk remove)?
Date: Wed, 11 Jul 2012 12:51:07 GMT
On Mon, Jul 9, 2012 at 10:05 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> I fear that the complexity of removing HFiles may come from the (block)
> cache, which may still hold their data. Is that right? I'm actually OK
> with HBase returning data from files I "deleted" by removing their
> HFiles: I will specify a timerange on my scans anyway (in this example,
> to omit anything older than one week).
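
A timerange-limited scan like the one described above is just
Scan.setTimeRange on the client side.  Here is a minimal sketch against
the 0.94-era client API; the table name and the one-week window are
made-up values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class TimeRangeScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");  // hypothetical table
    try {
      long now = System.currentTimeMillis();
      long oneWeekAgo = now - 7L * 24 * 60 * 60 * 1000;
      Scan scan = new Scan();
      // Only cells written in the last week are returned; anything
      // older is filtered out whether or not its hfile still exists.
      scan.setTimeRange(oneWeekAgo, now);
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          System.out.println(r);
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}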

I think this is a use case we should support natively.  Someone around
the corner from us was looking to do this.  They load a complete
dataset each night, and on the weekends they want to drop the old
stuff by removing any hfiles more than N days old.

You could script it now.  Look at the hfiles in hdfs -- they carry
sufficient metadata, IIRC -- and then follow the prescription Jon
suggests above: close the region, remove the files, and reopen.  We
could add an API for this, i.e. reread hdfs for hfiles (it would be
nice to do it 'atomically', telling the new API which files to drop).
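
A rough sketch of such a script, written against the plain HDFS
FileSystem API plus HBaseAdmin unassign/assign for the close/reopen
step.  The paths, region name, and age cutoff are all assumptions;
files are moved aside rather than deleted outright as a safety choice,
and hdfs modification time is used as a cheap stand-in for reading each
hfile's own timerange metadata:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class DropOldHFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;

    // Hypothetical layout: /hbase/<table>/<encoded-region>/<family>/
    Path familyDir = new Path("/hbase/mytable/REGION_ENCODED_NAME/cf");
    String regionName = "mytable,,1234567890.REGION_ENCODED_NAME.";  // made up

    HBaseAdmin admin = new HBaseAdmin(conf);
    // Close the region so no store files are open while we move them.
    admin.unassign(Bytes.toBytes(regionName), true);

    Path trash = new Path("/hbase-trash");  // park files, don't delete
    fs.mkdirs(trash);
    for (FileStatus hfile : fs.listStatus(familyDir)) {
      // hdfs mod time as a proxy for the data's age; reading the
      // hfile's own metadata would be more precise.
      if (hfile.getModificationTime() < cutoff) {
        fs.rename(hfile.getPath(), new Path(trash, hfile.getPath().getName()));
      }
    }

    // Reopen: the region rereads its directory and sees only what's left.
    admin.assign(Bytes.toBytes(regionName));
    admin.close();
  }
}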

You bring up the block cache.  That should be fine: we shouldn't be
reading blocks for files that are no longer open, and old blocks will
get aged out of the cache.

On compaction dropping complete hfiles when they fall entirely outside
the TTL, I'm not sure we have that yet (I didn't look too closely).
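
Setting the per-family TTL itself is straightforward; whether
compaction short-circuits whole hfiles that fall entirely outside it is
the open question above.  A sketch of configuring a one-week TTL (the
table and family names are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SetFamilyTtl {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    String table = "mytable";  // hypothetical
    admin.disableTable(table);
    HColumnDescriptor cf = new HColumnDescriptor("cf");
    // TTL is in seconds; cells older than a week become eligible
    // for removal at compaction time.
    cf.setTimeToLive(7 * 24 * 60 * 60);
    admin.modifyColumn(table, cf);
    admin.enableTable(table);
    admin.close();
  }
}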

