Subject: Re: MapReduce on Local FileSystem
From: 王洪军 <wanghj966@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 31 May 2013 18:14:27 +0800

Ingesting data into HDFS is slow because it needs a JVM process. But if you
don't use HDFS, you can't benefit from its features: without HDFS, big data
will not be split and distributed. I think the JVM start-up time is
affordable if the data is big, and Hadoop is not a good choice if the data
is small.

file:// refers to local data, without distribution; other TaskTrackers
can't access it until you copy it to every node where a TaskTracker
resides.

2013/5/31 Harsh J <harsh@cloudera.com>
> Then why not simply run with Write Replication Factor set to 1?
>
> On Fri, May 31, 2013 at 12:54 PM, Agarwal, Nikhil
> <Nikhil.Agarwal@netapp.com> wrote:
> > Hi,
> >
> > Thank you for your reply. One simple answer can be to reduce the time
> > taken for ingesting the data in HDFS.
> >
> > Regards,
> > Nikhil
> >
> > From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
> > Sent: Friday, May 31, 2013 12:50 PM
> > To: <user@hadoop.apache.org>
> > Cc: user@hadoop.apache.org
> > Subject: Re: MapReduce on Local FileSystem
> >
> > Basic question. Why would you want to do that? Also, I think the MapR
> > Hadoop distribution has an NFS-mountable HDFS.
> >
> > Sanjay
> >
> > Sent from my iPhone
> >
> > On May 30, 2013, at 11:37 PM, "Agarwal, Nikhil"
> > <Nikhil.Agarwal@netapp.com> wrote:
> >
> > Hi,
> >
> > Is it possible to run MapReduce on multiple nodes using the local file
> > system (file:///)?
> >
> > I am able to run it in a single-node setup, but in a multi-node setup
> > the "slave" nodes are not able to access the "jobtoken" file which is
> > present in the hadoop.tmp.dir on the "master" node.
> >
> > Please let me know if it is possible to do this.
> >
> > Thanks & Regards,
> >
> > Nikhil
> >
> > CONFIDENTIALITY NOTICE
> > ======================
> > This email message and any attachments are for the exclusive use of the
> > intended recipient(s) and may contain confidential and privileged
> > information. Any unauthorized review, use, disclosure or distribution is
> > prohibited. If you are not the intended recipient, please contact the
> > sender by reply email and destroy all copies of the original message
> > along with any attachments, from your computer system. If you are the
> > intended recipient, please be advised that the content of this message
> > is subject to access, review and disclosure by the sender's Email System
> > Administrator.
>
> --
> Harsh J
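[Archive note, not part of the thread] Nikhil's question is about pointing Hadoop at the local filesystem instead of HDFS. A minimal sketch of what that configuration would look like, assuming a Hadoop 1.x-era setup (the exact property names and file layout are an assumption, not something stated in the thread):

```xml
<!-- core-site.xml: hypothetical sketch for running jobs against the local FS -->
<configuration>
  <property>
    <!-- 1.x-era key is fs.default.name; later Hadoop versions use fs.defaultFS -->
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>
```

As the replies point out, with file:/// every node must see the same paths, which in practice means a shared mount (e.g. NFS) on all TaskTracker nodes; this is why Sanjay mentions MapR's NFS-mountable filesystem.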
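[Archive note, not part of the thread] Harsh's alternative (keep HDFS but set the write replication factor to 1, so ingest does not pay for multiple replicas) can be sketched as a cluster-wide default; a hedged example, assuming a standard hdfs-site.xml:

```xml
<!-- hdfs-site.xml: hypothetical sketch of Harsh's replication-factor-1 suggestion -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

The same factor can also be applied after the fact to existing files with the standard shell command `hadoop fs -setrep 1 <path>`.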