Date: Wed, 29 Jan 2014 08:52:27 -0500
Subject: Re: performance of "hadoop fs -put"
From: Jay Vyas <jayunit100@gmail.com>
To: "common-user@hadoop.apache.org"
Reply-To: user@hadoop.apache.org

No, I'm using a glob pattern; it's all done in one "put" statement.

On Tue, Jan 28, 2014 at 9:22 PM, Harsh J <harsh@cloudera.com> wrote:

> Are you calling one command per file? That's bound to be slow, as it
> invokes a new JVM each time.
> On Jan 29, 2014 7:15 AM, "Jay Vyas" <jayunit100@gmail.com> wrote:
>
>> I'm finding that "hadoop fs -put" on a cluster is quite slow for me when I
>> have a large number of small files... much slower than native file ops.
>> Note that I'm using the RawLocalFileSystem as the underlying backing
>> filesystem that is being written to in this case, so HDFS isn't the issue.
>>
>> I see that the Put class creates a linked list sized to the number of
>> elements in the path.
>>
>> 1) Is there a more performant way to run "fs -put"?
>>
>> 2) Has anyone else noted that "fs -put" has extra overhead?
>>
>> I'm going to trace some more, but just wanted to bounce this off the
>> mailing list... maybe others have also run into this issue.
>>
>> ** Is "hadoop fs -put" inherently slower than a Unix "cp" action,
>> regardless of filesystem -- and if so, why? **
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>

--
Jay Vyas
http://jayunit100.blogspot.com
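
For anyone reading this thread later: Harsh's point about one command per file can be illustrated with a rough sketch. This is only an analogy, not a Hadoop benchmark. It uses Python interpreter startup as a stand-in for JVM startup, and plain local copies instead of "hadoop fs -put". All function names and file names below are hypothetical. It also does not explain the single-invocation glob case described above, where only one JVM is started.

```python
import shutil
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def make_small_files(directory, count):
    """Create `count` tiny files to stand in for a 'many small files' workload."""
    paths = []
    for i in range(count):
        path = directory / f"part-{i:05d}.txt"
        path.write_text(f"record {i}\n")
        paths.append(path)
    return paths


def copy_one_process_per_file(files, dest):
    """Start a fresh interpreter for every file (analogous to one CLI call per file)."""
    start = time.perf_counter()
    for f in files:
        subprocess.run(
            [sys.executable, "-c",
             "import shutil, sys; shutil.copy(sys.argv[1], sys.argv[2])",
             str(f), str(dest)],
            check=True,
        )
    return time.perf_counter() - start


def copy_in_one_process(files, dest):
    """Copy every file from the already-running process (analogous to one call with a glob)."""
    start = time.perf_counter()
    for f in files:
        shutil.copy(f, dest)
    return time.perf_counter() - start


def compare(count=20):
    """Run both strategies on the same files and report timings and copied counts."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src, dest_a, dest_b = root / "src", root / "per_file", root / "single"
        for d in (src, dest_a, dest_b):
            d.mkdir()
        files = make_small_files(src, count)
        per_file_seconds = copy_one_process_per_file(files, dest_a)
        single_seconds = copy_in_one_process(files, dest_b)
        return {
            "per_file_seconds": per_file_seconds,
            "single_process_seconds": single_seconds,
            "copied_per_file": len(list(dest_a.iterdir())),
            "copied_single": len(list(dest_b.iterdir())),
        }


if __name__ == "__main__":
    result = compare()
    print(f"one process per file: {result['per_file_seconds']:.3f}s")
    print(f"single process:       {result['single_process_seconds']:.3f}s")
```

On most machines the per-file variant should be noticeably slower, since each launch pays the full process startup cost before any bytes are copied. If a single "put" with a glob is still slow, as reported here, the overhead presumably lies elsewhere and would need tracing.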