hadoop-mapreduce-user mailing list archives

From Marton Elek <me...@hortonworks.com>
Subject Re: HDFS on Docker
Date Mon, 20 Mar 2017 15:20:47 GMT
It really depends on your use case.

There are two problems: networking and configuration.

Usually I use docker-compose on my local machine. With docker-compose all of the containers
share the same network, so you can give the namenode container a fixed hostname and use that
hostname from Spark in core-site.xml as the HDFS root path.
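
For example, a minimal docker-compose sketch could look like this (the fixed hostname, the
published port, and the RPC port mentioned afterwards are only illustrative assumptions;
check the configuration shipped with the image):

    # docker-compose.yml -- minimal sketch, assuming default Hadoop 2.x ports
    version: "2"
    services:
      namenode:
        image: sequenceiq/hadoop-docker:2.7.1   # the image from the question below
        hostname: namenode                      # fixed hostname Spark can resolve on this network
        ports:
          - "50070:50070"                       # NameNode web UI (Hadoop 2.x default)

With that, fs.defaultFS on the Spark side would be something like hdfs://namenode:9000, where
9000 is the usual NameNode RPC port but ultimately depends on the image's own core-site.xml.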

For a real (multi-host) cluster I use Docker host networking. With host networking I can use
YARN's data locality feature easily, without any magic. In that case you can use the hostname
of the server where the namenode container was started.
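
As a sketch (whether this particular image starts all the daemons on its own, and its exact
startup command, are assumptions here):

    # Host networking sketch: the container shares the server's network stack,
    # so the daemons register under the real hostname and YARN locality works as usual.
    docker run -d --network host sequenceiq/hadoop-docker:2.7.1

Spark then points at hdfs://<that-server-hostname>:<rpc-port> directly.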

In both cases you don't need to map a Docker volume to the host, since Spark accesses HDFS
over RPC, but it can help to persist the working data of the nodes. Usually I set
dfs.namenode.name.dir and dfs.datanode.data.dir and map those directories as Docker volumes.
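
For example (the host and container paths below are only placeholders; they just have to match
what you configure in hdfs-site.xml):

    # hdfs-site.xml inside the containers (assumed example values):
    #   dfs.namenode.name.dir = /hadoop/dfs/name
    #   dfs.datanode.data.dir = /hadoop/dfs/data
    #
    # docker-compose.yml excerpt mapping those directories to the host:
    services:
      namenode:
        volumes:
          - ./data/namenode:/hadoop/dfs/name
      datanode:
        volumes:
          - ./data/datanode:/hadoop/dfs/data

This way the HDFS metadata and block data survive container restarts.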

Marton

ps:

The way I use dockerized Hadoop is available here: https://github.com/elek/bigdata-docker
But this is not the easiest way to start, as it contains multiple ways to start a cluster
(local vs. remote) and I sometimes use Consul for configuration management.

On Mar 14, 2017, at 6:17 PM, Adamantios Corais <adamantios.corais@gmail.com> wrote:


Hi,

I am trying to set up an HDFS cluster for development and testing using the following Docker
image: sequenceiq/hadoop-docker:2.7.1

My question is: which volume should I mount on the host machine in order to read and write
from an external app (e.g. Spark)? And what will be the HDFS path in that case?

--
// Adamantios Corais
