hadoop-mapreduce-user mailing list archives

From Atish Kathpal <atish.kath...@gmail.com>
Subject Re: Running Hadoop v2 clustered mode MR on an NFS mounted filesystem
Date Sun, 22 Dec 2013 09:17:03 GMT
Thanks Devin, Yong, and Chris for your replies and suggestions. I will test
the suggestions made by Yong and Devin and get back to you guys.

As for the bottleneck issue, I agree, but I am trying to run a few MR jobs
on a traditional NAS server. I can live with a few bottlenecks, so long as
I don't have to move the data to a dedicated HDFS cluster.


On Sat, Dec 21, 2013 at 8:06 AM, Chris Mawata <chris.mawata@gmail.com> wrote:

>  Yong raises an important issue: you have thrown out the I/O advantages
> of HDFS and also the advantages of data locality. It would be
> interesting to know why you are taking this approach.
> Chris
>
>
> On 12/20/2013 9:28 AM, java8964 wrote:
>
> I believe "-fs local" should be removed too. Even if you have a dedicated
> JobTracker after removing "-jt local", I believe that with "-fs local" all
> the mappers will run sequentially.
>
>  "-fs local" will force MapReduce to run in "local" mode, which is really
> a test mode.
>
>  What you can do is remove both "-fs local" and "-jt local", but give the
> FULL URI of the input and output paths, to tell Hadoop that they are on the
> local filesystem instead of HDFS.
>
>  "hadoop jar
> /hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
> wordcount file:///hduser/mount_point file:///results"
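A minimal sketch of the full-URI approach, assuming a POSIX shell on the submitting node. to_file_uri is a hypothetical helper, not part of Hadoop: it only prefixes an absolute path with the file:// scheme and does no percent-encoding, so it assumes paths without spaces or other characters that would need escaping.

```sh
#!/bin/sh
# to_file_uri: hypothetical helper (not a Hadoop tool). Turns an absolute local
# path into a fully qualified file:// URI so Hadoop resolves it against the
# local (here NFS-mounted) filesystem rather than the default filesystem.
# Assumes the path needs no percent-encoding (no spaces, '%', '#', etc.).
to_file_uri() {
  case "$1" in
    file://*) printf '%s\n' "$1" ;;          # already a file URI: pass through
    /*)       printf 'file://%s\n' "$1" ;;   # /a/b -> file:///a/b
    *)        echo "to_file_uri: expected an absolute path, got '$1'" >&2
              return 1 ;;
  esac
}

# Example (jar path taken from the thread), without "-fs local -jt local":
# hadoop jar /hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
#   wordcount "$(to_file_uri /hduser/mount_point)" "$(to_file_uri /results)"
```

Rejecting relative paths, rather than resolving them against the current directory, keeps the generated URI identical on every node, which matters when each task resolves the path on its own host.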
>
>  Keep in mind the following:
>
>  1) The NFS mount needs to be available on all your task nodes, and
> mounted in the same way.
> 2) Even if you can do that, your shared storage will be the bottleneck;
> NFS won't scale well.
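Point 1 can be checked on each node with a rough sketch like the one below, assuming Linux nodes (it reads /proc/mounts) and a mount point path without spaces (the kernel escapes those as \040). nfs_source_of is a hypothetical helper, not part of Hadoop: it prints the server:/export behind a mount point if that mount point is NFS, and fails otherwise.

```sh
#!/bin/sh
# nfs_source_of MOUNT_POINT [MOUNTS_FILE]
# Hypothetical helper. Prints the source (server:/export) of MOUNT_POINT if the
# mount table lists it with filesystem type nfs or nfs4, and returns 0;
# returns 1 if it is not mounted there or is not NFS. MOUNTS_FILE defaults to
# /proc/mounts and exists only so the check can be tried on a sample file.
nfs_source_of() {
  awk -v mp="$1" '
    $2 == mp && ($3 == "nfs" || $3 == "nfs4") { print $1; found = 1; exit }
    END { exit !found }
  ' "${2:-/proc/mounts}"
}
```

Running the same check on every task node (for example over ssh) and comparing the printed sources is one way to confirm that each node mounts the same export at the same path.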
>
>  Yong
>
>  ------------------------------
> Date: Fri, 20 Dec 2013 09:01:32 -0500
> Subject: Re: Running Hadoop v2 clustered mode MR on an NFS mounted
> filesystem
> From: dsuiter@rdx.com
> To: user@hadoop.apache.org
>
> I think most of your problem is coming from the options you are setting:
>
>  "hadoop jar
> /hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
> wordcount *-fs local -jt local* /hduser/mount_point/  /results"
>
>  You appear to be directing your master node to run jobs in the *LOCAL* job
> runner and to read from the *LOCAL* filesystem. Drop the *-jt* argument
> and it should run in distributed mode if your cluster is set up correctly.
> You don't need to do anything special to point Hadoop at an NFS location,
> other than setting up the NFS mount properly and, if you refer to it by
> hostname, making sure that name resolves to the right address. Hadoop
> doesn't care where the data lives, as long as it can read from and write
> to it. The fact that the NFS location happens to be mounted as a local
> filesystem object doesn't matter: you could point at the local /hduser/
> path with the -fs local option and the data would end up on the NFS mount,
> because that is where the mount actually lives; or you could point at the
> absolute network location of the folder you want. Either should work.
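Dropping -jt only yields distributed execution if the client configuration already targets YARN. In Hadoop 2 that is controlled by mapreduce.framework.name in mapred-site.xml: "yarn" submits to the cluster, while the default from mapred-default.xml, "local", uses the local job runner and would keep every map task on the submitting node. The sketch below reads that value, assuming POSIX sh/awk and one <name> or <value> tag per line as in the stock templates. get_site_prop is a hypothetical helper, not a Hadoop tool, and it is a plain text scan rather than an XML parser.

```sh
#!/bin/sh
# get_site_prop NAME FILE
# Hypothetical helper. Prints the <value> of property NAME from a Hadoop
# *-site.xml file; prints nothing if NAME is absent. Text scan only: assumes
# each <name>...</name> and <value>...</value> sits on a single line.
get_site_prop() {
  awk -v want="$1" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n); sub(/[ \t]*<\/name>.*/, "", n)
      matched = (n == want)          # remember whether this property is NAME
    }
    matched && /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v); sub(/[ \t]*<\/value>.*/, "", v)
      print v; exit                  # first value after the matching name
    }
  ' "$2"
}

# Example, assuming HADOOP_CONF_DIR points at the client configuration:
# get_site_prop mapreduce.framework.name "$HADOOP_CONF_DIR/mapred-site.xml"
```

Empty output means the property is not set in that file, so the default from mapred-default.xml applies.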
>
>  *Devin Suiter*
> Jr. Data Solutions Software Engineer
>   100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Fri, Dec 20, 2013 at 5:27 AM, Atish Kathpal <atish.kathpal@gmail.com> wrote:
>
> Hello
>
>  The picture below describes the deployment architecture I am trying to
> achieve.
> However, when I run the wordcount example with the configuration below,
> issuing the command from the master node, I see only the master node
> spawning map tasks and completing the submitted job. Here is the command I
> used:
>
>  *hadoop jar
> /hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
> wordcount -fs local -jt local /hduser/mount_point/  /results*
>
>  *Question: How can I use both Hadoop nodes to run MR jobs, while
> serving my data from the common NFS mount point, with my filesystem
> running at the back end? Has anyone tried such a setup before?*
> [image: deployment architecture diagram (not preserved in the archive)]
>
>  Thanks!
>
>
>
>
