Subject: Re: copy chunk of hadoop output
From: Jean-Marc Spaggiari
To: user@hadoop.apache.org
Date: Wed, 20 Feb 2013 14:44:55 -0500

Hi Harsh,

My bad. I read the example quickly and I don't know why I thought you
used tail and not head. head will work perfectly, but tail will not,
since it will need to read the entire file. My comment was for tail,
not for head, and therefore not applicable to the example you gave.

hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file

will have to download the entire file.

Is there a way to "jump" to a certain position in a file and "cat"
from there?
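An untested sketch of what I mean, with made-up offsets and file names:
coreutils can at least skip a prefix, though the skipped bytes still
cross the network before tail discards them:

hadoop fs -cat big-dfs-file | tail -c +1001 | head -c 500 > 500-bytes-after-skipping-1000

head closes the pipe after 500 bytes, so -cat stops early just like in
your example; it just cannot avoid streaming the first 1000 bytes.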
JM

2013/2/20, Harsh J:
> Hi JM,
>
> I am not sure how "dangerous" it is, since we're using a pipe here,
> and as you yourself note, it will only last as long as the last bytes
> have been received, and then terminate.
>
> The -cat process will terminate because the process we're piping to
> will terminate first after it reaches its goal of -c <N>; so the
> "-cat" program will certainly not fetch the whole file down, though
> it may fetch a few bytes extra over the wire due to its read buffers
> (the extra data won't be put into the target file; it gets discarded).
>
> We can try it out and observe the "clienttrace" logged at the DN at
> the end of the -cat's read. Here's an example:
>
> I wrote a ~1.6 MB file called "foo.jar"; see "bytes" below, it's
> 1658314 bytes (~1.58 MB):
>
> 2013-02-20 23:55:19,777 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
> HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
> duration: 192289000
>
> I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
> store the first 5 bytes into a local file.
>
> Asserting that post-command we get 5 bytes:
>
> ➜ ~ wc -c foo.xml
> 5 foo.xml
>
> Asserting that the DN didn't IO-read the whole file: see the read op
> below and its "bytes" parameter, it's only about 193 KB, not the
> whole block of 1.58 MB we wrote earlier:
>
> 2013-02-21 00:01:32,437 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
> HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
> duration: 19207000
>
> I don't see how this is any more dangerous than doing a
> -copyToLocal/-get, which retrieves the whole file anyway.
>
> On Wed, Feb 20, 2013 at 9:25 PM, Jean-Marc Spaggiari wrote:
>> But be careful.
>>
>> hadoop fs -cat will retrieve the entire file, and will only finish
>> once it has retrieved the last bytes you are looking for.
>>
>> If your file is many GB big, it will take a lot of time for this
>> command to complete and will put some pressure on your network.
>>
>> JM
>>
>> 2013/2/19, jamal sasha:
>>> Awesome, thanks :)
>>>
>>> On Tue, Feb 19, 2013 at 2:14 PM, Harsh J wrote:
>>>
>>>> You can instead use 'fs -cat' and the 'head' coreutil, as one example:
>>>>
>>>> hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file
>>>>
>>>> On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha wrote:
>>>> > Hi,
>>>> > I was wondering, in the following command:
>>>> >
>>>> > bin/hadoop dfs -copyToLocal hdfspath localpath
>>>> >
>>>> > can we specify to copy not the full file but, say, x MB of it to
>>>> > the local drive?
>>>> >
>>>> > Is something like this possible?
>>>> > Thanks
>>>> > Jamal
>>>>
>>>> --
>>>> Harsh J
>>>
>
> --
> Harsh J
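PS, for anyone finding this in the archives: Harsh's whole check,
condensed into an untested sketch. The DataNode log path below is a
guess and will vary per install.

hadoop fs -put foo.jar foo.jar                  # write the ~1.6 MB file to HDFS
hadoop fs -cat foo.jar | head -c 5 > foo.xml    # copy only the first 5 bytes locally
wc -c foo.xml                                   # expect: 5 foo.xml
grep clienttrace /var/log/hadoop/*datanode*.log | tail -n 2
# the HDFS_READ line's "bytes" should be far below the HDFS_WRITE line's,
# confirming -cat stopped reading early instead of pulling the whole block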