hadoop-user mailing list archives

From Rakesh Radhakrishnan <rake...@apache.org>
Subject Re: Journal nodes in HA
Date Fri, 12 Aug 2016 14:30:25 GMT
Hi Konstantinos,

Nice documentation! I wish you every success in expanding to Hadoop HA
mode.

I'd say the JournalNodes should be co-located on machines running other Hadoop
master daemons, for example the NameNodes, the YARN ResourceManager, etc.
These machines are attractive because they are already well provisioned,
see little unpredictable user activity, and their daemons are generally
light on disk usage compared to worker nodes (DataNode, NodeManager, etc.).
In general, dedicating a disk drive on each of these machines to the
JournalNode helps avoid disk spindle contention with other processes. Sorry,
I don't have any reports at hand. Perhaps other folks can pitch in with
performance benchmark results, if any exist. For the ZooKeeper servers, you
can refer to the http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html
and https://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview pages.
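As a rough sanity check of the three-JournalNode layout discussed in this thread, the quorum arithmetic can be sketched in Python. This is a minimal sketch, not part of any Hadoop API: the host names and the `PLACEMENT` map are hypothetical, and it assumes the documented Quorum Journal Manager behaviour, where an edit is committed once a majority of JournalNodes acknowledge it, so N JournalNodes tolerate (N - 1) / 2 failures.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hosts (assumption, for illustration only): two JournalNodes
# co-located with the two NameNodes, the third with a ZooKeeper server.
PLACEMENT: Dict[str, Set[str]] = {
    "master-1": {"NameNode", "JournalNode"},
    "master-2": {"NameNode", "JournalNode"},
    "zk-1": {"ZooKeeper", "JournalNode"},
}


def journal_hosts(placement: Dict[str, Set[str]]) -> List[str]:
    """Return the hosts that run a JournalNode, in sorted order."""
    return sorted(host for host, roles in placement.items() if "JournalNode" in roles)


def journal_quorum(num_journal_nodes: int) -> Tuple[int, int]:
    """Return (acks needed to commit an edit, JournalNode failures tolerated)."""
    if num_journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    majority = num_journal_nodes // 2 + 1
    tolerated = num_journal_nodes - majority  # equals (N - 1) // 2
    return majority, tolerated


if __name__ == "__main__":
    hosts = journal_hosts(PLACEMENT)
    majority, tolerated = journal_quorum(len(hosts))
    print(f"JournalNodes on {hosts}: majority={majority}, tolerated failures={tolerated}")
```

With this arithmetic a fourth JournalNode does not raise fault tolerance (journal_quorum(4) gives (3, 1)), which is why an odd count such as 3 or 5 is the usual choice.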

Thanks,
Rakesh

On Fri, Aug 12, 2016 at 5:56 PM, Konstantinos Tsakalozos <
kos.tsakalozos@canonical.com> wrote:

> + the hadoop list
>
> On Fri, Aug 12, 2016 at 3:25 PM, Konstantinos Tsakalozos <
> kos.tsakalozos@canonical.com> wrote:
>
>> Hi Rakesh,
>>
>> Thank you for your prompt reply.
>>
>> In the Juju big data team we bundle Hadoop with a set of "peripheral"
>> helper services so that any interested user can easily deploy the full
>> environment in an automated way.
>> The deployment bundle looks like this:
>> https://jujucharms.com/hadoop-processing/
>> On the right side of the bundle you see a client service that can be
>> replaced with any other service the user wishes (e.g. Hive, Pig, etc.). We
>> also decided to go with Ganglia and rsyslog for monitoring. Is there
>> anything else you would like to see there? In the next release we will be
>> adding Apache ZooKeeper to give us HA, which is why I am asking where it
>> would be best to place the JournalNodes.
>>
>> In our case it would be preferable to "waste" one more "namenode"
>> machine (machine = unit in Juju terminology) to host the third JournalNode
>> by itself. The deployment would be cleaner and easier to reach.
>> I also very much appreciate your advice on dedicated storage. Are there any
>> performance benchmarks showing what bandwidth we can sustain with shared vs.
>> dedicated storage for the JournalNodes?
>>
>> Thank you,
>> Konstantinos
>>
>> On Fri, Aug 12, 2016 at 2:26 PM, Rakesh Radhakrishnan <rakeshr@apache.org>
>> wrote:
>>
>>> Hi Konstantinos,
>>>
>>> The typical deployment uses three JournalNodes (JNs): two of the three
>>> JNs can be co-located on the machines running the two NameNodes (NNs),
>>> and the third can be deployed on a machine running a ZooKeeper (ZK)
>>> server (assuming the ZK ensemble has 3 nodes). I'd recommend a dedicated
>>> disk on each JN server for the edit log directory, as edit logs are
>>> written continuously.
>>>
>>> It would be helpful if you could give more details about your Hadoop
>>> cluster size and components, including the ZK service.
>>>
>>> Thanks,
>>> Rakesh
>>>
>>> On Fri, Aug 12, 2016 at 3:12 PM, Konstantinos Tsakalozos <
>>> kos.tsakalozos@canonical.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> In an HA setup, do you tend to co-host the JournalNode service with other
>>>> services rather than running it on separate dedicated machines? If so,
>>>> which services do you pack together?
>>>>
>>>> Thank you,
>>>> Konstantinos
>>>>
>>>
>>>
>>
>
