From: "Max Grigoriev" <darkit@gmail.com>
To: hbase-user@hadoop.apache.org
Date: Tue, 29 Apr 2008 13:51:03 +0300
Subject: Re: Is HBase suitable for ...

Replies and questions inline.

> On Apr 28, 2008, at 2:57 PM, Max Grigoriev wrote:
>
> What kind of search on different table attributes do you want to do?
> There are no general-purpose secondary indexes in HBase, so you
> either have to do a full- or partial-table scan or put the search
> attribute in the primary key.

The system is the core of different social networks, so it should be able
to search on every attribute, because during core development you don't
know all the entities or all the search queries in advance. So I'm thinking
of using a Hibernate-style mapping (no relations such as many-to-one, just
single attributes) in which the user describes an entity and marks whether
an attribute is indexed; for indexed attributes the system would create a
secondary index. Since HBase doesn't support secondary indexes, I think I
can emulate them by maintaining, by hand, a secondary-index -> primary-key
mapping, as is done in Berkeley DB, for example.

> As far as failover, at the moment, HBase has good recovery for region
> servers, and no recovery for the master. That's something we're
> hoping to change in the future.

Is that future near or far?
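The hand-maintained secondary index described above (a separate mapping from attribute value back to primary key, as in Berkeley DB) can be sketched generically. This is only an illustration of the technique: the `IndexedStore` class and the in-memory dicts are stand-ins for two HBase tables, not HBase API calls.

```python
# Sketch of emulating a secondary index with a second table that maps
# attribute value -> set of primary keys. The dicts stand in for two
# HBase tables; every write must update both, and the application is
# responsible for keeping them consistent.

class IndexedStore:
    def __init__(self, indexed_attrs):
        self.primary = {}  # row key -> record (dict of attributes)
        # attribute name -> (attribute value -> set of row keys)
        self.indexes = {a: {} for a in indexed_attrs}

    def put(self, key, record):
        old = self.primary.get(key)
        for attr, index in self.indexes.items():
            # Remove the stale index entry for the old attribute value.
            if old is not None and attr in old:
                index.get(old[attr], set()).discard(key)
            # Add the entry for the new value.
            if attr in record:
                index.setdefault(record[attr], set()).add(key)
        self.primary[key] = record

    def find_by(self, attr, value):
        # Index lookup: value -> primary keys -> records.
        keys = self.indexes[attr].get(value, set())
        return [self.primary[k] for k in sorted(keys)]

    def scan_by(self, attr, value):
        # The fallback without an index: a full-table scan.
        return [r for r in self.primary.values() if r.get(attr) == value]
```

Here `put` rewrites the index entries synchronously; with two real HBase tables the two writes are not atomic, so a crash between them can leave a dangling index entry that readers must tolerate (e.g. by verifying the primary row still matches before returning it).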
Can I create a new master if the initial master fails? Can the master have
slaves?

> Can you tell me whether HBase will work for such a system?
> I think HBase can do what you need, but it'd be nice to have more
> details about what exactly you're going to do with it.

I don't know :) because the application developer will decide what the
entities are and what they do. What I have to do is create an environment
that makes it easy to build applications.

> If we have 2 or 3 data centers and we lose the connection between them,
> what behavior of HBase will we see?
> Is your intent to run a single HBase instance across several data
> centers?

Yes, because you don't know which data center may go down.

> At the moment, if a regionserver is cut off from the master,
> it will kill itself. This means that if you have your master at one
> location and regionservers at another, and you lose connectivity,
> your regionservers at the other locations will shut themselves down.
> There are solutions to this we've discussed in the past. However, I
> wonder if maybe the correct solution is not to partition across data
> centers. It's not something that we've discussed at great length yet,
> so there might be an easier way to do it than I'm thinking.

If one data center goes down and it holds unique data, then you can't
continue to work. That's bad. So it's better to have the data in both data
centers; then if one of them is dead, you can continue to work.

> And when we restore the connection in 1-2 hours, what should we expect
> from HBase?
> This is where things would get sticky - how do you resolve conflicts
> in how data is being served, or worse, how it was split into regions?
> It seems inherently complicated and unpleasant.

You can update all records of the restored node by update timestamp.
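The last-write-wins reconciliation suggested above ("update all records of the restored node by update timestamp") can be sketched as follows. The `reconcile` helper and the key -> (timestamp, value) record format are hypothetical, assumed for illustration; this is not anything HBase provides.

```python
# Sketch of last-write-wins reconciliation after a partition heals:
# for every key present on either side, keep the version with the
# newest update timestamp. Each side is a map of key -> (ts, value).

def reconcile(local, restored):
    """Merge two key -> (timestamp, value) maps; newest timestamp wins."""
    merged = dict(local)
    for key, (ts, value) in restored.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged
```

Note the caveats with this scheme: a concurrent update on the losing side is silently dropped, clocks in the two data centers must be reasonably synchronized for timestamps to be comparable, and deletes need tombstone records so that an older surviving copy doesn't resurrect a deleted row.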