hadoop-user mailing list archives

From Devin Suiter RDX <dsui...@rdx.com>
Subject Re: Running Hadoop v2 clustered mode MR on an NFS mounted filesystem
Date Fri, 20 Dec 2013 14:43:26 GMT
Yes, there will be the issue of bottlenecking too. There are lots of newer
distributed filesystem formats that work well with Hadoop, if you don't
want to do HDFS.

If you are using a traditional filesystem, you aren't getting any parallel
work done - there's only one file to work on, in one piece. With a
Hadoop-friendly distributed filesystem, one file is broken into many file
splits, and every mapper works on one of those pieces.
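As a rough illustration (plain Python, not Hadoop code - the 128 MB figure
is the Hadoop 2.x HDFS default block size), the number of map tasks scales
with the number of splits:

```python
# Sketch: a split-friendly filesystem turns one large file into many
# input splits, each handled by its own mapper; a file read as a single
# unit (as in local mode) gets only one mapper.
import math

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size in Hadoop 2.x

def num_splits(file_size_bytes, split_size=BLOCK_SIZE):
    """Number of input splits (and therefore mappers) for one file."""
    return max(1, math.ceil(file_size_bytes / split_size))

one_gb = 1024 ** 3
print(num_splits(one_gb))                      # 8 splits -> 8 parallel mappers
print(num_splits(one_gb, split_size=one_gb))   # one piece -> 1 mapper
```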

Also, you probably shouldn't be running too much work on your master node -
memory is critical for Hadoop, and if your namenode service runs out of
memory, bad things happen.

If you knew these things already, I apologize, but they are pretty key
things to know (and really understand the impact of them) when you are
working in Hadoop.

MongoDB is kind of like a smaller-scale cousin of Hadoop that can run
MapReduce jobs too - maybe that would be a good fit for your use case, if
you haven't looked at it?

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Fri, Dec 20, 2013 at 9:28 AM, java8964 <java8964@hotmail.com> wrote:

> I believe the "-fs local" should be removed too. The reason is that even
> if you have a dedicated JobTracker after removing "-jt local", with "-fs
> local" all the mappers will still run sequentially.
>
> "-fs local" forces MapReduce to run in "local" mode, which is really
> a test mode.
>
> What you can do is remove both "-fs local -jt local" and give the FULL
> URIs of the input and output paths, to tell Hadoop that they are on the
> local filesystem instead of HDFS.
>
> "hadoop jar
> /hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
> wordcount file:///hduser/mount_point file:///results"
>
> Keep the following in mind:
>
> 1) The NFS mount needs to be available on all your task nodes, and mounted
> in the same way.
> 2) Even if you can do that, your shared storage will be your bottleneck;
> NFS won't scale well.
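>
> For point 1, a sketch of an /etc/fstab entry each task node could share
> (the server name and export path here are placeholders; only the mount
> point matches this thread):
>
>   nfs-server:/export/data  /hduser/mount_point  nfs  defaults,_netdev  0  0
>
> The mount point path must be identical on every node so that the same
> file:// URI resolves to the same data everywhere.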
>
> Yong
>
> ------------------------------
> Date: Fri, 20 Dec 2013 09:01:32 -0500
> Subject: Re: Running Hadoop v2 clustered mode MR on an NFS mounted
> filesystem
> From: dsuiter@rdx.com
> To: user@hadoop.apache.org
>
> I think most of your problem is coming from the options you are setting:
>
> "hadoop jar
> /hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
> wordcount *-fs local -jt local* /hduser/mount_point/  /results"
>
> You appear to be directing your namenode to run jobs in the *LOCAL* job
> runner and to read from the *LOCAL* filesystem. Drop the
> *-jt* argument and it should run in distributed mode, if your cluster is
> set up right. You don't need to do anything special to point Hadoop at
> an NFS location, other than set up the NFS location properly and, if you
> are addressing it by name, make sure the name resolves to the right
> address. Hadoop doesn't care where it is, as long as it can read from and
> write to it. The fact that you are telling it to read/write from/to an NFS
> location that happens to be mounted as a local filesystem object doesn't
> matter - you could point it at the local /hduser/ path with the -fs
> local option, and the data would still end up on the NFS mount, because
> that's where the path actually lives; or you could point it at the
> absolute network location of the folder you want. It shouldn't make a
> difference.
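>
> (On Hadoop 2.x, distributed mode is normally selected by the cluster's
> mapred-site.xml rather than on the command line - a minimal fragment,
> assuming a standard YARN setup:
>
>   <property>
>     <name>mapreduce.framework.name</name>
>     <value>yarn</value>
>   </property>
>
> If this is set to "local", jobs run in-process on the submitting node
> regardless of what else is configured.)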
>
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Fri, Dec 20, 2013 at 5:27 AM, Atish Kathpal <atish.kathpal@gmail.com> wrote:
>
> Hello
>
> The picture below describes the deployment architecture I am trying to
> achieve.
> However, when I run the wordcount example code with the below
> configuration, by issuing the command from the master node, I notice only
> the master node spawning map tasks and completing the submitted job. Below
> is the command I used:
>
> *hadoop jar
> /hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
> wordcount -fs local -jt local /hduser/mount_point/  /results*
>
> *Question: How can I leverage both the hadoop nodes for running MR, while
> serving my data from the common NFS mount point running my filesystem at
> the backend? Has any one tried such a setup before?*
> [image: Inline image 1]
>
> Thanks!
>
>
>
