hbase-user mailing list archives

From Alex Baranau <alex.barano...@gmail.com>
Subject Re: Can manually remove HFiles (similar to bulk import, but bulk remove)?
Date Wed, 11 Jul 2012 14:09:57 GMT
Thank you guys for the pointers/info! I'll try to make use of it. If it
turns into something re-usable (a script, etc.), I will open a JIRA issue
and add it for others to use.

Thanx again,
Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase

On Wed, Jul 11, 2012 at 8:51 AM, Stack <stack@duboce.net> wrote:

> On Mon, Jul 9, 2012 at 10:05 PM, Alex Baranau <alex.baranov.v@gmail.com>
> wrote:
> > I fear that complexity with removing HFiles can be caused by (block)
> > cache that may hold its information. Is that right? I'm actually OK with
> > HBase to return me the data of files I "deleted" by removing HFiles: I
> > will specify timerange on scans anyways (in this example to omit things
> > older than 1 week).
> >
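(For reference, the "specify timerange on scans" bit above translates to
something like the sketch below against the regular client Scan API; the
table and column family names are made-up placeholders, not anything from
this thread.)

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class LastWeekScan {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "mytable"); // placeholder table
    long oneWeekAgo = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("d"));                                // placeholder family
    // Only return cells written within the last week; anything older is
    // skipped even if its hfile is still sitting in HDFS.
    scan.setTimeRange(oneWeekAgo, Long.MAX_VALUE);

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r ...
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}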
>
> I think this is a use case we should support natively.  Someone around
> the corner from us was looking to do this.  They load a complete
> dataset each night and on the weekends they want to just drop the old
> stuff by removing the hfiles > N days.
>
> You could script it now.  Look at the hfiles in hdfs -- they have
> sufficient metadata IIRC -- and then do the prescription Jon suggests
> above of close, remove, and reopen.  We could add an API to do this;
> i.e. reread hdfs for hfiles (would be nice to do it 'atomically'
> telling the new API which to drop).
>
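(A rough sketch of that close / remove / reopen routine against the
0.92/0.94-era Java client. The placeholder table/family names, the
/hbase/&lt;table&gt;/&lt;encoded-region&gt;/&lt;family&gt; path layout, and using the HDFS
file modification time as a proxy for an hfile's age are my assumptions,
not something spelled out in this thread; verify them against your own
cluster before deleting anything.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class DropOldHFiles {
  public static void main(String[] args) throws Exception {
    // args[0]: full region name (as listed in .META.), used for admin calls
    // args[1]: encoded region name (the hash), used for the HDFS path
    String regionName = args[0];
    String encodedName = args[1];
    String table = "mytable";   // placeholder
    String family = "d";        // placeholder

    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    HBaseAdmin admin = new HBaseAdmin(conf);

    long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;
    Path familyDir = new Path("/hbase/" + table + "/" + encodedName + "/" + family);

    // 1. Close the region so none of its store files are open.
    admin.closeRegion(regionName, null);

    // 2. Remove hfiles older than the cutoff (file mtime used as the age).
    for (FileStatus hfile : fs.listStatus(familyDir)) {
      if (hfile.getModificationTime() < cutoff) {
        fs.delete(hfile.getPath(), false);
      }
    }

    // 3. Reassign the region; on reopen it rereads the remaining hfiles.
    admin.assign(Bytes.toBytes(regionName));
  }
}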
> You bring up block cache.  That should be fine.  We shouldn't be
> reading blocks for files that are no longer open.  Old blocks should
> get aged out.
>
> On compaction dropping complete hfiles if they are outside TTL, I'm
> not sure we have that (didn't look too closely).
>
> St.Ack
>
