Subject: Re: copy chunk of hadoop output
From: jamal sasha <jamalshasha@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 1 Mar 2013 15:27:10 -0800

Though it copies, it still gives this error?

On Fri, Mar 1, 2013 at 3:21 PM, jamal sasha <jamalshasha@gmail.com> wrote:
> When I try this, I get an error:
> cat: Unable to write to output stream.
>
> Is this a permissions issue?
> How do I resolve this?
> Thanks
>
>
> On Wed, Feb 20, 2013 at 12:21 PM, Harsh J <harsh@cloudera.com> wrote:
>> No problem JM, I was confused as well.
>>
>> AFAIK, there's no shell utility that lets you specify an offset (a
>> number of bytes to skip before starting, similar to skip in dd), but
>> that can be done from the FS API.
>>
>> On Thu, Feb 21, 2013 at 1:14 AM, Jean-Marc Spaggiari
>> <jean-marc@spaggiari.org> wrote:
>> > Hi Harsh,
>> >
>> > My bad.
>> >
>> > I read the example quickly and I don't know why I thought you used
>> > tail and not head.
>> >
>> > head will work perfectly. But tail will not, since it will need to
>> > read the entire file. My comment was about tail, not head, and is
>> > therefore not applicable to the example you gave.
>> >
>> > hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file
>> >
>> > will have to download the entire file.
>> >
>> > Is there a way to "jump" to a certain position in a file and "cat"
>> > from there?
>> >
>> > JM
>> >
>> > 2013/2/20, Harsh J <harsh@cloudera.com>:
>> >> Hi JM,
>> >>
>> >> I am not sure how "dangerous" it is, since we're using a pipe here,
>> >> and as you yourself note, it will only last as long as the last
>> >> bytes have been received, and then terminate.
>> >>
>> >> The -cat process will terminate because the process we're piping to
>> >> will terminate first, after it reaches its goal of -c <N bytes>; so
>> >> the "-cat" program will certainly not fetch the whole file down,
>> >> though it may fetch a few extra bytes over the wire due to the use
>> >> of read buffers (the extra data won't be put into the target file,
>> >> and gets discarded).
>> >>
>> >> We can try it out and observe the "clienttrace" logged at the DN at
>> >> the end of the -cat's read. Here's an example:
>> >>
>> >> I wrote a ~1.6 MB file called "foo.jar"; see "bytes" below, it's
>> >> ~1.58 MB:
>> >>
>> >> 2013-02-20 23:55:19,777 INFO
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> >> /127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
>> >> HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>> >> duration: 192289000
>> >>
>> >> I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
>> >> store the first 5 bytes in a local file.
>> >>
>> >> Asserting that after the command we get 5 bytes:
>> >> ➜ ~ wc -c foo.xml
>> >>        5 foo.xml
>> >>
>> >> Asserting that the DN didn't IO-read the whole file, see the read
>> >> op below and its "bytes" parameter: it's only about 193 KB, not the
>> >> whole block of 1.58 MB we wrote earlier:
>> >>
>> >> 2013-02-21 00:01:32,437 INFO
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> >> /127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
>> >> HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>> >> duration: 19207000
>> >>
>> >> I don't see how this is any more dangerous than doing a
>> >> -copyToLocal/-get, which retrieves the whole file anyway?
>> >>
>> >> On Wed, Feb 20, 2013 at 9:25 PM, Jean-Marc Spaggiari
>> >> <jean-marc@spaggiari.org> wrote:
>> >>> But be careful.
>> >>>
>> >>> hadoop fs -cat will retrieve the entire file, and will only finish
>> >>> once it has retrieved the last bytes you are looking for.
>> >>>
>> >>> If your file is many GB big, it will take a lot of time for this
>> >>> command to complete and will put some pressure on your network.
>> >>>
>> >>> JM
>> >>>
>> >>> 2013/2/19, jamal sasha <jamalshasha@gmail.com>:
>> >>>> Awesome, thanks :)
>> >>>>
>> >>>>
>> >>>> On Tue, Feb 19, 2013 at 2:14 PM, Harsh J <harsh@cloudera.com> wrote:
>> >>>>
>> >>>>> You can instead use 'fs -cat' and the 'head' coreutil, as one
>> >>>>> example:
>> >>>>>
>> >>>>> hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file
>> >>>>>
>> >>>>> On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha <jamalshasha@gmail.com>
>> >>>>> wrote:
>> >>>>> > Hi,
>> >>>>> >   I was wondering, in the following command:
>> >>>>> >
>> >>>>> > bin/hadoop dfs -copyToLocal hdfspath localpath
>> >>>>> >
>> >>>>> > can we specify to copy not the full file but only, say, x MB
>> >>>>> > of it to the local drive?
>> >>>>> >
>> >>>>> > Is something like this possible?
>> >>>>> > Thanks
>> >>>>> > Jamal
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Harsh J
>> >>
>> >>
>> >> --
>> >> Harsh J
>>
>>
>> --
>> Harsh J
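The pipe mechanics Harsh describes can be simulated locally, with no cluster at all: `head -c` exits as soon as it has its bytes, and the writer's next write then fails with a broken pipe. That broken-pipe failure is also the likely cause of the harmless "cat: Unable to write to output stream." message at the top of the thread (an assumption about that setup, not something confirmed here):

```shell
# Local sketch of the early-terminating pipe (no Hadoop needed).
# 'yes' would write forever; head exits after taking 5 bytes, the pipe
# closes, and yes is killed by the resulting SIGPIPE instead of running
# to completion -- exactly how head stops 'hadoop fs -cat' early.
yes | head -c 5 > five-bytes

# The local file holds exactly the 5 requested bytes.
wc -c five-bytes
```

The same reasoning explains why the copy succeeds despite the error message: the output file is already complete by the time the upstream writer notices the closed pipe.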
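On JM's question about jumping to an offset: short of the FS API seek Harsh mentions, a rough shell-level approximation (file names here are made up for illustration) is to combine `tail -c +N` with `head -c M`. The first N-1 bytes still travel over the network, since tail discards them as they stream rather than seeking past them, but nothing is buffered and head closes the pipe after M bytes so -cat terminates early:

```shell
# Hypothetical: pull only bytes 101-150 of a DFS file into a local file.
# tail -c +101 starts emitting at byte 101, discarding (not buffering)
# everything before it; head -c 50 then closes the pipe after 50 bytes,
# which stops the upstream -cat early:
#
#   hadoop fs -cat some-dfs-file | tail -c +101 | head -c 50 > chunk
#
# The same pipeline demonstrated on a 200-byte local stream; wc reports
# the 50 bytes that survive the tail/head window.
printf '%0200d' 0 | tail -c +101 | head -c 50 | wc -c
```

A true seek, which avoids transferring the leading bytes entirely, still needs FSDataInputStream from the FS API, as Harsh notes.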