hbase-dev mailing list archives

From Ophir Cohen <oph...@gmail.com>
Subject Re: Thoughts about partitioning retention and other stuff...
Date Sat, 14 May 2011 19:25:18 GMT
About the partitioning: I was talking about something more automatic.

My use case: I have data that comes from different customers, each with
different retention policies and different behavior.
For example, if I have keys such as:
*customerA-date-other_parts_of_key*
*customerB-date-other_parts_of_key*
*customerC-date-other_parts_of_key*
*customerD-date-other_parts_of_key*
I would like to have some kind of option to tell HBase that each distinct first
part of the key (say, from the start up to the '-' sign) *has* to be in a different
region, and that from then on, even for a new customer, the partitioning
will happen automatically rather than manually as it does right now.
I'm not sure how it should be implemented, but this is my use case...
And yes, I can do it manually...
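
To make the idea concrete, here is a minimal sketch of the manual version: deriving one split point per customer prefix from a sample of row keys. The function names (`split_keys_for_prefixes`, `region_index`) are hypothetical and not part of HBase; the sketch assumes ASCII keys and a '-' delimiter. The resulting keys are the kind of thing one would hand to table creation as pre-split points.

```python
import bisect


def split_keys_for_prefixes(row_keys, delimiter="-"):
    """Return sorted split points (bytes), one per distinct key prefix.

    The prefix is everything up to and including the first delimiter.
    Keys without the delimiter are ignored (an assumption of this sketch).
    """
    prefixes = set()
    for key in row_keys:
        head, sep, _ = key.partition(delimiter)
        if sep:
            prefixes.add((head + delimiter).encode("utf-8"))
    # Sort as bytes, since region boundaries are compared byte by byte.
    return sorted(prefixes)


def region_index(split_keys, row_key):
    """Index of the region a row key falls into, given sorted split keys.

    Region 0 covers everything before the first split key; a region's
    start key is inclusive, hence bisect_right.
    """
    return bisect.bisect_right(split_keys, row_key.encode("utf-8"))


sample = [
    "customerB-20110514-x",
    "customerA-20110513-y",
    "customerA-20110514-z",
    "customerC-20110101-q",
]
splits = split_keys_for_prefixes(sample)
```

With these split points every customer lands in its own region. What I am asking for is that HBase would add the next split point by itself when a key with a new prefix shows up.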

About the region deletion: exactly what you say: a tool to which I provide a
region (or even better, a *start* and *end* key) and which deletes it in
bulk.
It should do something like the following:

   1. Split the region (or regions) at the start/end key.
   2. Close the resulting region or regions.
   3. Remove their directories from HDFS.
   4. Remove those regions from .META.
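
The planning part of the steps above can be sketched without touching a cluster. This is only an illustration under stated assumptions: the table is modelled as a sorted list of existing region boundary keys, both keys are non-empty, and `plan_bulk_delete` is a hypothetical name, not an existing HBase tool. It works out which splits step 1 needs and which regions steps 2 to 4 would then act on.

```python
import bisect


def plan_bulk_delete(boundaries, start_key, end_key):
    """Plan a bulk delete of all rows in [start_key, end_key).

    boundaries: sorted list of existing region boundary keys (bytes).
    Returns (new_splits, doomed_regions), where doomed_regions is a list
    of (region_start, region_end) pairs lying entirely inside the range.
    """
    if not start_key < end_key:
        raise ValueError("start_key must sort before end_key")

    # Step 1: make sure both keys are region boundaries, splitting if needed.
    new_splits = [k for k in (start_key, end_key) if k not in boundaries]
    all_bounds = sorted(set(boundaries) | {start_key, end_key})

    # Steps 2-4 apply to every region between the two boundaries:
    # close it, remove its directory, remove its row from .META.
    lo = bisect.bisect_left(all_bounds, start_key)
    hi = bisect.bisect_left(all_bounds, end_key)
    doomed = [(all_bounds[i], all_bounds[i + 1]) for i in range(lo, hi)]
    return new_splits, doomed


existing = [b"customerB-", b"customerC-"]
new_splits, doomed = plan_bulk_delete(
    existing, b"customerA-20110101", b"customerB-20110201"
)
```

In this example neither key is an existing boundary, so two splits are needed, and two regions end up fully inside the range. Rows outside the range stay in regions that are not touched.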

It sounds to me like a useful tool to have.
As you suggested, I'm going to open an issue and maybe will even try
to implement it...

Ophir

On Fri, May 13, 2011 at 10:00 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> I haven't read the whole thread, but I'll try some answers anyway.
>
> J-D
>
> > What do I mean by partitioning? - an option to state where the regions
> > are split.
>
> You can already do that: either at creation time, or when doing a split
> via the shell or HBA, you can tell it on which row it should try to split.
>
> >
> > This is a standard capability of databases and can be used for various
> > things:
> >
> >   - Load balancing - I can split overloaded read/write region into two or
> >   more regions.
> >   - Retention - (say data sorted by time) I can delete old regions.
> >
> > Another feature I think can be useful is region delete.
> > It is especially good for deleting large amounts of data that are sorted
> > together (e.g. delete old rows if the key has a date)
>
> You can already do it in a very expensive way, so I guess you're more
> talking about some sort of "bulk" delete where instead of issuing one
> Delete per row you would delete the whole folder altogether, right? And then
> do the required .META. fixup... Doesn't sound too bad, and could be
> part of online merging, please open a jira.
>
