hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: What's the basic idea of pseudo-distributed Hadoop ?
Date Fri, 14 Sep 2012 07:24:41 GMT
Hi Jason,

I think you're confusing the standalone mode with a pseudo-distributed
mode. The former is a limited mode of MR where no daemons need to be
deployed and the tasks run in a single JVM (via threads).
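As a rough analogy only (this is plain Python, not Hadoop code), the sketch below runs "map" tasks as threads inside one process and then merges their results, with no daemons or RPC involved. The names map_task and run_local and the word-count job are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_task(split):
    # "Map" step: count the words in one input split.
    return Counter(split.split())


def run_local(splits):
    # Every task runs as a thread in this one process; nothing is
    # started as a separate daemon and nothing goes over RPC.
    with ThreadPoolExecutor(max_workers=2) as pool:
        partials = list(pool.map(map_task, splits))
    # "Reduce" step: merge the partial counts into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


counts = run_local(["a b a", "b c"])
```

In a pseudo-distributed setup the same work would instead be handed to separately running daemons on the one machine.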

A pseudo-distributed cluster is one where all daemons run on a single
node. Hence it is not "distributed" in the sense of multiple nodes (no
network gear is used), but the daemons talk to each other in the same
way (RPC, etc.) as they do between nodes in a fully-distributed one.

If an MR program works fine in pseudo-distributed mode, it "should"
work fine (no guarantee) in fully-distributed mode, provided all nodes
have the same arch/OS, the same JVM, and the same job-specific
configurations. This is because tasks execute on various nodes and may
be affected by a node's behavior or setup if it differs from the
others - and that's something you'd have to detect/know about if one
node exhibits more failures than the rest.
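One way to spot such a node is to compare each node's properties against the most common profile. The sketch below is a minimal illustration, assuming you have already collected arch/OS/JVM values per host by some means. The function name find_outliers, the host names, and the version strings are all hypothetical.

```python
from collections import Counter


def find_outliers(nodes):
    # nodes maps hostname -> dict of properties (arch, os, jvm, ...).
    # Treat the most common full profile as the expected setup.
    profiles = Counter(tuple(sorted(p.items())) for p in nodes.values())
    majority = dict(profiles.most_common(1)[0][0])
    outliers = {}
    for host, props in nodes.items():
        # Keep only the properties that differ from the majority profile.
        diff = {k: v for k, v in props.items() if majority.get(k) != v}
        if diff:
            outliers[host] = diff
    return outliers


# Hypothetical inventory; real values would come from your own tooling.
nodes = {
    "node1": {"arch": "x86_64", "os": "Linux", "jvm": "1.6.0_31"},
    "node2": {"arch": "x86_64", "os": "Linux", "jvm": "1.6.0_31"},
    "node3": {"arch": "x86_64", "os": "Linux", "jvm": "1.7.0_05"},
}
outliers = find_outliers(nodes)
```

Here node3 would be reported because its JVM differs from the other two, which makes it the first place to look if its tasks fail more often.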

On Fri, Sep 14, 2012 at 11:58 AM, Jason Yang <lin.yang.jason@gmail.com> wrote:
> Hey, Kai
> Thanks for your reply.
> I was wondering what the difference is between pseudo-distributed and
> fully-distributed Hadoop, apart from the maximum number of map/reduce tasks.
> And if an MR program works fine in a pseudo-distributed cluster, will it
> work just as well in a fully-distributed cluster?
> 2012/9/14 Kai Voigt <k@123.org>
>> The default setting is that a tasktracker can run up to two map and reduce
>> tasks in parallel (mapred.tasktracker.map.tasks.maximum and
>> mapred.tasktracker.reduce.tasks.maximum), so you will actually see some
>> concurrency on your one machine.
> --
> YANG, Lin

Harsh J
