Subject: Re: copy chunk of hadoop output
From: jamal sasha <jamalshasha@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 1 Mar 2013 15:27:10 -0800

Though it copies, it still gives this error?

On Fri, Mar 1, 2013 at 3:21 PM, jamal sasha <jamalshasha@gmail.com> wrote:
> When I try this, I get an error:
> cat: Unable to write to output stream.
>
> Is this a permissions issue?
> How do I resolve this?
> Thanks
>
>
> On Wed, Feb 20, 2013 at 12:21 PM, Harsh J <harsh@cloudera.com> wrote:
>> No problem JM, I was confused as well.
>>
>> AFAIK, there's no shell utility that lets you specify an offset (a
>> number of bytes to skip before starting, similar to skip in dd), but
>> that can be done from the FS API.
>>
>> On Thu, Feb 21, 2013 at 1:14 AM, Jean-Marc Spaggiari
>> <jean-marc@spaggiari.org> wrote:
>> > Hi Harsh,
>> >
>> > My bad.
>> >
>> > I read the example quickly and I don't know why I thought you used
>> > tail and not head.
>> >
>> > head will work perfectly. But tail will not, since it will need to
>> > read the entire file. My comment was about tail, not head, and is
>> > therefore not applicable to the example you gave.
>> >
>> > hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file
>> >
>> > will have to download the entire file.
>> >
>> > Is there a way to "jump" to a certain position in a file and "cat"
>> > from there?
>> >
>> > JM
>> >
>> > 2013/2/20, Harsh J <harsh@cloudera.com>:
>> >> Hi JM,
>> >>
>> >> I am not sure how "dangerous" it is, since we're using a pipe here,
>> >> and as you yourself note, it will only last as long as the last
>> >> bytes have been received, and then terminate.
>> >>
>> >> The -cat process will terminate because the process we're piping to
>> >> will terminate first, after it reaches its goal of -c <N bytes>; so
>> >> the "-cat" program will certainly not fetch the whole file down,
>> >> though it may fetch a few extra bytes over the wire due to the use
>> >> of read buffers (the extra data won't be put into the target file,
>> >> and gets discarded).
>> >>
>> >> We can try it out and observe the "clienttrace" logged at the DN at
>> >> the end of the -cat's read. Here's an example:
>> >>
>> >> I wrote a ~1.6 MB file called "foo.jar"; see "bytes" below, it's
>> >> ~1.58 MB:
>> >>
>> >> 2013-02-20 23:55:19,777 INFO
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> >> /127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
>> >> HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>> >> duration: 192289000
>> >>
>> >> I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
>> >> store the first 5 bytes in a local file.
>> >>
>> >> Asserting that after the command we get 5 bytes:
>> >> ➜ ~ wc -c foo.xml
>> >>        5 foo.xml
>> >>
>> >> Asserting that the DN didn't IO-read the whole file, see the read
>> >> op below and its "bytes" parameter: it's only about 193 KB, not the
>> >> whole block of 1.58 MB we wrote earlier:
>> >>
>> >> 2013-02-21 00:01:32,437 INFO
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> >> /127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
>> >> HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>> >> duration: 19207000
>> >>
>> >> I don't see how this is any more dangerous than doing a
>> >> -copyToLocal/-get, which retrieves the whole file anyway?
>> >>
>> >> On Wed, Feb 20, 2013 at 9:25 PM, Jean-Marc Spaggiari
>> >> <jean-marc@spaggiari.org> wrote:
>> >>> But be careful.
>> >>>
>> >>> hadoop fs -cat will retrieve the entire file, and will only finish
>> >>> once it has retrieved the last bytes you are looking for.
>> >>>
>> >>> If your file is many GB big, it will take a lot of time for this
>> >>> command to complete and will put some pressure on your network.
>> >>>
>> >>> JM
>> >>>
>> >>> 2013/2/19, jamal sasha <jamalshasha@gmail.com>:
>> >>>> Awesome, thanks :)
>> >>>>
>> >>>>
>> >>>> On Tue, Feb 19, 2013 at 2:14 PM, Harsh J <harsh@cloudera.com> wrote:
>> >>>>
>> >>>>> You can instead use 'fs -cat' and the 'head' coreutil, as one
>> >>>>> example:
>> >>>>>
>> >>>>> hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file
>> >>>>>
>> >>>>> On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha <jamalshasha@gmail.com>
>> >>>>> wrote:
>> >>>>> > Hi,
>> >>>>> >   I was wondering, in the following command:
>> >>>>> >
>> >>>>> > bin/hadoop dfs -copyToLocal hdfspath localpath
>> >>>>> >
>> >>>>> > can we specify to copy not the full file but only, say, x MB
>> >>>>> > of it to the local drive?
>> >>>>> >
>> >>>>> > Is something like this possible?
>> >>>>> > Thanks
>> >>>>> > Jamal
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Harsh J
>> >>
>> >>
>> >> --
>> >> Harsh J
>>
>>
>> --
>> Harsh J
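The pipe mechanics Harsh describes can be simulated locally, with no cluster at all: `head -c` exits as soon as it has its bytes, and the writer's next write then fails with a broken pipe. That broken-pipe failure is also the likely cause of the harmless "cat: Unable to write to output stream." message at the top of the thread (an assumption about that setup, not something confirmed here):

```shell
# Local sketch of the early-terminating pipe (no Hadoop needed).
# 'yes' would write forever; head exits after taking 5 bytes, the pipe
# closes, and yes is killed by the resulting SIGPIPE instead of running
# to completion -- exactly how head stops 'hadoop fs -cat' early.
yes | head -c 5 > five-bytes

# The local file holds exactly the 5 requested bytes.
wc -c five-bytes
```

The same reasoning explains why the copy succeeds despite the error message: the output file is already complete by the time the upstream writer notices the closed pipe.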
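On JM's question about jumping to an offset: short of the FS API seek Harsh mentions, a rough shell-level approximation (file names here are made up for illustration) is to combine `tail -c +N` with `head -c M`. The first N-1 bytes still travel over the network, since tail discards them as they stream rather than seeking past them, but nothing is buffered and head closes the pipe after M bytes so -cat terminates early:

```shell
# Hypothetical: pull only bytes 101-150 of a DFS file into a local file.
# tail -c +101 starts emitting at byte 101, discarding (not buffering)
# everything before it; head -c 50 then closes the pipe after 50 bytes,
# which stops the upstream -cat early:
#
#   hadoop fs -cat some-dfs-file | tail -c +101 | head -c 50 > chunk
#
# The same pipeline demonstrated on a 200-byte local stream; wc reports
# the 50 bytes that survive the tail/head window.
printf '%0200d' 0 | tail -c +101 | head -c 50 | wc -c
```

A true seek, which avoids transferring the leading bytes entirely, still needs FSDataInputStream from the FS API, as Harsh notes.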