Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Fri, 3 May 2013 06:32:20 +0000 (UTC)
From: "Lars George (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12645752.1367479325083.264265.1367562740758@arcas>
In-Reply-To: <JIRA.12645752.1367479325083@arcas>
References: <JIRA.12645752.1367479325083@arcas>
Subject: [jira] [Commented] (HBASE-8480) Embed HDFS into HBase
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648220#comment-13648220 ] 

Lars George commented on HBASE-8480:
------------------------------------

[~stack] Yes, that makes sense; we need to define the various steps. For development, I think it is good to keep the local and pseudo-distributed modes as before. We would just add another choice, i.e. spinning up HDFS inside the nodes as well. Going from one to two and then three servers is the tricky part. By default, if you unpack a tarball and spin up HBase, I would stay in local mode as usual. The same goes for pseudo-distributed mode, i.e. the user will need to configure it herself as needed.

But then, if wanted, we could have a little CLI config wizard that preps the new fully autonomous mode of HBase, where HBase controls everything but all the cluster components are still present. Once you start this, you would have one server that runs either a Master+NameNode process plus a RegionServer+DataNode process or, if we had the Master-less option mentioned above, one single process with everything inside. These are just semantics, I'd say, though I personally would love to see the latter eventually, to make HBase a single-process system to boot with.

Then, as you add nodes, you simply join a new RS+DN process on another node. The config could be amended at that point, but it could also stay the same, with dfs.replication set to "automatic", which scales the factor from 1 up to the default (or a maximum we can specify). In other words, with two nodes the replication is set to 2, with three nodes to 3, and with four or more nodes it stays at 3.
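
To make the "automatic" setting concrete, here is a minimal sketch in Java of the scaling rule described above. The class and method names are hypothetical and are not existing HBase or HDFS APIs; the only assumption is that the number of live DataNodes and the configured cap are known.

```java
/**
 * Hypothetical helper illustrating the proposed "automatic" replication rule.
 * Nothing here exists in HBase or HDFS; it only models the arithmetic.
 */
final class AutoReplication {

    private AutoReplication() {
        // static helper, not meant to be instantiated
    }

    /**
     * Returns the replication factor to use for a cluster of the given size.
     *
     * @param liveDataNodes  number of DataNodes currently in the cluster
     * @param maxReplication the cap, e.g. the usual HDFS default of 3
     */
    static int effectiveReplication(int liveDataNodes, int maxReplication) {
        if (maxReplication < 1) {
            throw new IllegalArgumentException("maxReplication must be >= 1");
        }
        // Never drop below 1, never exceed the cap, otherwise follow the node count.
        return Math.max(1, Math.min(liveDataNodes, maxReplication));
    }
}
```

With a cap of 3, this yields 1, 2, 3 and 3 for one to four nodes, which is the progression described above.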

That should also include a flag that says that a change of the replication factor, up or down, is applied to all the files already in HDFS. That way you can add and remove nodes as needed.

As for splitting out the HBM+NN to a separate machine, that really just means shutting down the RS+DN process or internal thread on that machine. Adding a SNN or HA NN is then a little beyond the automated scope. Well, the SNN we will need, so that should be handled, since clusters are meant to stay up forever, or at least for a long time. Reconfiguring to non-automatic mode is then done using the CLI wizard or by editing the configs, followed by a rolling restart.

bq. I like the idea of bundling the master and regionserver in one binary better; no more special master treatment... anyone can be master and/or a regionserver?

I do believe that is only useful in automatic mode, because on a larger cluster with dedicated roles you already have 2-3 master machines running NNs, ZKs and so on. Adding Masters there is trivial (since most of the time admins deploy everything automatically anyway), so bundling would not really have a tangible advantage, methinks. But yes, when things are small, this is certainly something that we should have. See above.

                
> Embed HDFS into HBase
> ---------------------
>
>                 Key: HBASE-8480
>                 URL: https://issues.apache.org/jira/browse/HBASE-8480
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Lars George
>
> HBase is often a bit more involved to get going. We already have the option to host ZooKeeper for very small clusters. We should have the same for HDFS. The idea is that it adjusts replication based on the number of nodes, i.e. from 1 to 3 (the default), so that you could start with a single node and grow the cluster from there. Once the cluster reaches a certain size, and the admin decides to split the components, we should have a way to export the proper configs/settings so that you can easily start up an external HDFS and/or ZooKeeper, while updating the HBase config as well to point to the new "locations".
> The goal is to start a fully operational HBase that can grow from a single machine to multi-machine clusters with just a single daemon on each machine.
