hbase-issues mailing list archives

From "Lars George (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8480) Embed HDFS into HBase
Date Fri, 03 May 2013 06:32:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648220#comment-13648220 ]

Lars George commented on HBASE-8480:
------------------------------------

[~stack] Yes, that makes sense; we need to define the various steps. For development, I think
it is good to keep the local and pseudo-distributed modes as before. We would just add another
choice, i.e. spinning up HDFS inside the nodes as well. Going from one to two and then three
servers is the tricky part. By default, if you unpack a tarball and spin up HBase, I would
stay in local mode as usual. The same goes for pseudo-distributed mode, i.e. the user will need
to configure this herself as needed.

But then, if wanted, we could have a little CLI config wizard that preps the new fully autonomous
mode of HBase, where everything is controlled by HBase but all the cluster components are present.
Once you start this, you would either have one server that runs a Master+NameNode process plus
a RegionServer+DataNode process or, if we had the above Master-less option, one single process
with everything inside. These are just semantics, I'd say, though I personally would love to see
the latter eventually, to make HBase a single-process system to boot with.

Then, as you add nodes, you simply join a new RS+DN process on another node. The config could
be amended, but it could also stay the same, with the dfs.replication factor set to "automatic",
which scales it from 1 to "default" (or a maximum we can specify). In other words, when you run
two nodes the replication is set to 2, with three nodes to 3, with four nodes it stays at 3,
and so on.
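
A minimal sketch of the "automatic" scaling rule described above, written in Python purely for
illustration. The function name auto_replication and its parameters are hypothetical; nothing
like this exists in HBase or HDFS today, and the cap of 3 simply mirrors the usual
dfs.replication default.

```python
def auto_replication(live_datanodes: int, max_replication: int = 3) -> int:
    """Return the replication factor for a cluster with this many DataNodes.

    Hypothetical helper: the factor follows the node count until it reaches
    the configured maximum, then stays there.
    """
    if live_datanodes < 1:
        raise ValueError("need at least one DataNode")
    return min(live_datanodes, max_replication)


if __name__ == "__main__":
    # Walk through the cases mentioned in the comment: 1, 2, 3 and 4 nodes.
    for nodes in (1, 2, 3, 4):
        print(nodes, "node(s) -> replication", auto_replication(nodes))
```

With the default cap, one to three nodes map to a replication of 1 to 3, and every larger
cluster stays at 3.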

That should also include a flag saying that, when the replication factor is ramped up or down,
the change should be applied to all existing files in HDFS. That way you can add and remove
nodes as needed.
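
As a rough illustration of what applying a new factor to existing files could look like, the
sketch below only builds the argument list for the existing HDFS shell command
"hdfs dfs -setrep", which changes the replication of every file under a directory when given
a directory path. The helper name setrep_command is hypothetical, and how an embedded mode
would actually trigger this is an open question; the sketch does not run anything against a
cluster.

```python
from typing import List


def setrep_command(replication: int, path: str = "/", wait: bool = False) -> List[str]:
    """Build the argv for `hdfs dfs -setrep` (hypothetical helper).

    replication: target replication factor for existing files.
    path:        HDFS path; a directory applies to all files beneath it.
    wait:        if True, add -w so the command blocks until re-replication is done.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        cmd.append("-w")
    cmd += [str(replication), path]
    return cmd


if __name__ == "__main__":
    # Example: the cluster grew from one node to two, so bump existing files to 2.
    print(" ".join(setrep_command(2)))
```

The resulting list could be handed to subprocess.run on a machine with the HDFS client
installed; waiting with -w can take a long time on large data sets.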

As for splitting out the HBM+NN to a separate machine, that really just means shutting down
the RS+DN process or internal thread on that machine. Adding an SNN or HA NN is then a little
beyond the automated scope. Well, the SNN we will need, so that should be handled, since clusters
are meant to be up forever (or at least a long time). Reconfiguring to non-automatic mode is then
done using the CLI wizard or by editing the configs, followed by a rolling restart.

bq. I like the idea of bundling the master and regionserver in one binary better; no more
special master treatment... any one can be master and/or a regionserver?

I do believe that is only useful in automatic mode, because on a larger cluster with dedicated
roles you already have 2-3 master machines running NNs, ZKs and so on. Adding Masters there is
then trivial (since it is all automatically deployed most of the time by admins). That,
methinks, would not really have a tangible advantage. But yes, when things are small, this
is certainly something that we should have. See above.

                
> Embed HDFS into HBase
> ---------------------
>
>                 Key: HBASE-8480
>                 URL: https://issues.apache.org/jira/browse/HBASE-8480
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Lars George
>
> HBase is often a bit more involved to get going. We already have the option to host ZooKeeper
> for very small clusters. We should have the same for HDFS. The idea is that it adjusts replication
> based on the number of nodes, i.e. from 1 to 3 (the default), so that you could start with
> a single node and grow the cluster from there. Once the cluster reaches a certain size, and
> the admin decides to split the components, we should have a way to export the proper configs/settings
> so that you can easily start up an external HDFS and/or ZooKeeper, while updating the HBase
> config as well to point to the new "locations".
> The goal is to start a fully operational HBase that can grow from single machine to multi
> machine clusters with just a single daemon on each machine.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
