hive-user mailing list archives

From "Raghunath, Ranjith" <>
Subject Remote vs Local metastore for hive
Date Thu, 08 Mar 2012 04:35:49 GMT
I am trying to understand this concept a little better and could use some help from the larger
community. I jotted down a couple of quick notes as I was reading through the material on
local vs remote:

1.       In local mode, each Hive client opens its own connection to the database. If
several clients are connected at once, this could overwhelm the instance, depending
on the max_connections setting. By default, this value is 151 (in MySQL) and can
be raised to a larger value depending on how much RAM the box has.

2.       In remote mode, each client goes through the metastore service.
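
To make the arithmetic in note 1 concrete, here is a rough sketch in Python. It only models the claim above (each client opening its own connection to the database, against the MySQL default limit of 151). The function names and the conns_per_client parameter are made up for illustration and are not part of Hive or MySQL.

```python
# Default MySQL connection limit quoted in note 1.
MYSQL_DEFAULT_MAX_CONNECTIONS = 151


def local_mode_connections(num_clients: int, conns_per_client: int = 1) -> int:
    """Database connections opened when every Hive client connects directly.

    conns_per_client is a hypothetical knob; note 1 assumes one per client.
    """
    return num_clients * conns_per_client


def exceeds_limit(num_clients: int,
                  conns_per_client: int = 1,
                  max_connections: int = MYSQL_DEFAULT_MAX_CONNECTIONS) -> bool:
    """True if local-mode clients would need more connections than the limit."""
    return local_mode_connections(num_clients, conns_per_client) > max_connections


if __name__ == "__main__":
    for clients in (50, 151, 152, 400):
        print(clients, "clients -> over the limit:", exceeds_limit(clients))
```

Under these assumptions, the limit is crossed as soon as the number of simultaneous clients exceeds the configured maximum, which is why the setting matters more as the number of clients grows.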

My questions are:

1.       Can each node on the cluster have a separate metastore service when using the remote
metastore configuration?

a.       If so, managing this seems like a nightmare in terms of keeping the logs in sync.

b.      This seems like a single point of failure, as all connections are routed through
a metastore service.

2.       What is the preferred approach here with respect to local vs remote?

3.       In order to avoid overwhelming the database, should the following parameters be tuned:

a.       hive.metastore.server.min.threads

b.      hive.metastore.server.max.threads
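
For reference, the settings I have in mind for a remote setup look roughly like the sketch below, written as a small Python script that renders hive-site.xml style property entries. The hostname and the thread values are placeholders chosen for illustration, not recommendations. hive.metastore.local and hive.metastore.uris are the client-side properties as I understand them from the material, and 9083 is assumed as the metastore Thrift port.

```python
import xml.etree.ElementTree as ET

# Client side: point Hive at a remote metastore service.
# The hostname is a placeholder; port 9083 is an assumption.
REMOTE_CLIENT_PROPS = {
    "hive.metastore.local": "false",
    "hive.metastore.uris": "thrift://metastore-host.example.com:9083",
}

# Metastore service side: worker thread bounds from question 3.
# The values are illustrative only, not tuning advice.
METASTORE_SERVER_PROPS = {
    "hive.metastore.server.min.threads": "50",
    "hive.metastore.server.max.threads": "200",
}


def render_hive_site(props: dict) -> str:
    """Render a dict of properties as a hive-site.xml style <configuration>."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hive_site(REMOTE_CLIENT_PROPS))
    print(render_hive_site(METASTORE_SERVER_PROPS))
```

Whether capping the thread count is the right way to protect the database is exactly what I am unsure about.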

