hadoop-user mailing list archives

From Jason Yang <lin.yang.ja...@gmail.com>
Subject Re: What's the basic idea of pseudo-distributed Hadoop ?
Date Fri, 14 Sep 2012 07:34:00 GMT
All right, I got it.

Thanks to all of you.

2012/9/14 Bertrand Dechoux <dechouxb@gmail.com>

> The only difference between pseudo-distributed and fully distributed would
> be scale. You could say that code that runs fine on the former also runs
> fine on the latter. But that does not necessarily mean that performance will
> scale the same way (e.g., if you keep a list of elements in memory, at bigger
> scale you could hit an OutOfMemoryError).
>
> Of course, as has been implied in previous answers, you can't say the
> same about standalone mode. In that mode, you could use global mutable
> static state and think it's fine, without caring about how work is
> distributed between nodes. In that case, the same code launched in
> pseudo-distributed mode will fail to reproduce the same results.
>
> Regards
>
> Bertrand
>
>
> On Fri, Sep 14, 2012 at 9:24 AM, Harsh J <harsh@cloudera.com> wrote:
>
>> Hi Jason,
>>
>> I think you're confusing the standalone mode with a pseudo-distributed
>> mode. The former is a limited mode of MR where no daemons need to be
>> deployed and the tasks run in a single JVM (via threads).
>>
>> A pseudo-distributed cluster is a cluster where all daemons are
>> running on a single node. Hence, it is not "distributed" in the sense of
>> multiple nodes (no use of any network gear), but the daemons talk to
>> each other (RPC, etc.) the same way as in a fully-distributed one.
>>
>> If an MR program works fine in pseudo-distributed mode, it "should"
>> work fine (no guarantee) in fully-distributed mode iff all nodes
>> have the same arch/OS, same JVM, and same job-specific configurations.
>> This is because tasks execute on various nodes and may be affected by a
>> node's behavior or setup that is different from the others - and that's
>> something you'd have to detect/know about if a node exhibits failures
>> more than others.
>>
>> On Fri, Sep 14, 2012 at 11:58 AM, Jason Yang <lin.yang.jason@gmail.com>
>> wrote:
>> > Hey, Kai
>> >
>> > Thanks for your reply.
>> >
>> > I was wondering what the difference is between pseudo-distributed and
>> > fully-distributed Hadoop, other than the maximum number of map/reduce
>> > tasks.
>> >
>> > And if an MR program works fine in a pseudo-distributed cluster, will it
>> > work just as well in a fully-distributed cluster?
>> >
>> >
>> > 2012/9/14 Kai Voigt <k@123.org>
>> >>
>> >> The default setting is that a tasktracker can run up to two map and two
>> >> reduce tasks in parallel (mapred.tasktracker.map.tasks.maximum and
>> >> mapred.tasktracker.reduce.tasks.maximum), so you will actually see some
>> >> concurrency on your one machine.
>> >
>> >
>> >
>> >
>> > --
>> > YANG, Lin
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
> Bertrand Dechoux
>
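
To make the static-state point quoted above concrete, here is a minimal
sketch in plain Java. It is not Hadoop code: StaticStateDemo, RECORDS_SEEN,
runTask and runInOneJvm are hypothetical names made up for illustration.
It assumes, as Harsh describes, that standalone mode runs tasks as threads
in a single JVM, so every task sees the same static field.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaticStateDemo {
    // Global mutable static state: there is exactly one copy per JVM.
    static final AtomicInteger RECORDS_SEEN = new AtomicInteger();

    // Hypothetical stand-in for a task that processes n records.
    static void runTask(int n) {
        for (int i = 0; i < n; i++) {
            RECORDS_SEEN.incrementAndGet();
        }
    }

    // Runs two tasks as threads inside this one JVM (the standalone-like
    // case) and returns the value of the shared counter afterwards.
    static int runInOneJvm(int a, int b) {
        RECORDS_SEEN.set(0);
        Thread t1 = new Thread(() -> runTask(a));
        Thread t2 = new Thread(() -> runTask(b));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return RECORDS_SEEN.get();
    }

    public static void main(String[] args) {
        // Both threads update the same static field, so the total is a + b.
        System.out.println(runInOneJvm(3, 4)); // prints 7
    }
}
```

If the two tasks instead ran in separate task JVMs, which is my
understanding of how pseudo-distributed and fully-distributed modes launch
them, each JVM would hold its own copy of RECORDS_SEEN and neither would
ever observe the combined total. That is the kind of result that silently
changes when you move off standalone mode.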



-- 
YANG, Lin
