From: John Lilley
To: user@hadoop.apache.org
Subject: RE: Is FileSystem thread-safe?
Date: Fri, 17 May 2013 17:16:58 +0000

Vinod,

Thanks, I was mostly asking in the context of attempting to unify the output of multiple tasks. I’ve seen that in most cases, users opt to output a folder full of file parts into HDFS and then read them directly or unify them later.

John
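
For readers who want to see the "folder full of file parts" pattern concretely, here is a minimal sketch. It assumes a local directory (via java.nio.file) as a stand-in for an HDFS output folder; the class name PartFileMerger, the part-00000 naming, and the merged.txt target are illustrative choices, not anything prescribed in this thread. Each task writes only its own part file, and a single reader concatenates them afterwards.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem stand-in for the "folder full of part files" pattern.
// PartFileMerger and the part-NNNNN naming are illustrative assumptions.
final class PartFileMerger {

    // Each task writes only its own part file, so no two writers share a file.
    static Path writePart(Path dir, int taskId, String content) {
        Path part = dir.resolve(String.format("part-%05d", taskId));
        try {
            return Files.write(part, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A single reader later concatenates the parts in file-name order.
    static void merge(Path dir, Path target) {
        try (Stream<Path> listing = Files.list(dir);
             OutputStream out = Files.newOutputStream(target)) {
            List<Path> parts = listing
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
            for (Path part : parts) {
                Files.copy(part, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes two parts out of order, merges them, and returns the merged text.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("job-output");
            writePart(dir, 1, "from task 1\n");
            writePart(dir, 0, "from task 0\n");
            Path merged = dir.resolve("merged.txt");
            merge(dir, merged);
            return new String(Files.readAllBytes(merged), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

On a real cluster the same "unify later" step is usually left to existing tooling such as `hadoop fs -getmerge` rather than hand-written code.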


From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
Sent: Friday, May 17, 2013 11:14 AM
To: user@hadoop.apache.org
Subject: Re: Is FileSystem thread-safe?

 

 

As of today, there is no atomic append, so no, what you say isn't possible. FWIU, it is one appender at a time - achieved through a lease per file, and multiple concurrent leases aren't allowed for any given file.

 

Thanks,

+Vinod Kumar Vavilapalli
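
The "one appender at a time" rule described above can be pictured with a small toy model. This is only a sketch under stated assumptions: it is not HDFS code, and LeaseTable, tryAcquire, and release are hypothetical names. It shows a table that grants at most one lease per path, so a second client asking for the same file is refused until the first one releases.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy model of "one appender at a time": at most one lease holder per path.
// Not HDFS code; LeaseTable and its methods are hypothetical names.
final class LeaseTable {
    private final ConcurrentMap<String, String> holderByPath = new ConcurrentHashMap<>();

    // Grants the lease only if nobody currently holds one for this path.
    boolean tryAcquire(String path, String client) {
        return holderByPath.putIfAbsent(path, client) == null;
    }

    // Only the current holder can release; remove(key, value) is atomic.
    boolean release(String path, String client) {
        return holderByPath.remove(path, client);
    }

    public static void main(String[] args) {
        LeaseTable leases = new LeaseTable();
        System.out.println(leases.tryAcquire("/out/result", "client-A")); // true
        System.out.println(leases.tryAcquire("/out/result", "client-B")); // false
        System.out.println(leases.release("/out/result", "client-A"));    // true
        System.out.println(leases.tryAcquire("/out/result", "client-B")); // true
    }
}
```

In this model a refused client has to retry or write elsewhere, which is one reason writing separate part files is the more common choice.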

 

On May 17, 2013, at 6:40 AM, John Lilley wrote:



Thanks! Does this also imply that multiple clients may open the same HDFS file for append simultaneously, and expect append requests to be interleaved?

john


From: Arpit Agarwal [mailto:aagarwal@hortonworks.com]
Sent: Monday, April 01, 2013 4:18 PM
To: user@hadoop.apache.org
Subject: Re: Is FileSystem thread-safe?

 

Hi John,

DistributedFileSystem is intended to be thread-safe, true to its name. 

Metadata operations are handled by the NameNode server, which synchronizes concurrent client requests via locks (you can look at the FSNamesystem class).

Some discussion on the thread-safety aspects of HDFS:
http://storageconference.org/2010/Papers/MSST/Shvachko.pdf

-Arpit
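
To picture what "synchronizes concurrent client requests via locks" means on the server side, here is a toy sketch. It is not NameNode code: ToyNamespace and its methods are hypothetical, and the read-write lock is my choice for illustration rather than a description of the real implementation. Mutations take the write lock, so when several threads race to create the same path, exactly one succeeds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of a metadata server that serializes namespace changes with a lock.
// Not NameNode code; ToyNamespace and its methods are hypothetical names.
final class ToyNamespace {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Long> lengthByPath = new HashMap<>();

    // Mutations hold the write lock, so two concurrent creates cannot both win.
    boolean create(String path) {
        lock.writeLock().lock();
        try {
            if (lengthByPath.containsKey(path)) {
                return false;
            }
            lengthByPath.put(path, 0L);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Lookups hold the read lock and may run concurrently with each other.
    boolean exists(String path) {
        lock.readLock().lock();
        try {
            return lengthByPath.containsKey(path);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Starts threadCount threads that all try to create the same path;
    // returns how many of them succeeded.
    static int raceToCreate(int threadCount) {
        ToyNamespace ns = new ToyNamespace();
        AtomicInteger winners = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                if (ns.create("/user/demo/file")) {
                    winners.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println(raceToCreate(8)); // prints 1
    }
}
```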


On Sun, Mar 31, 2013 at 11:52 AM, Ted Yu <yuzhihong@gmail.com> wrote:

If you look at DistributedFileSystem source code, you would see that it calls the DFSClient field member for most of the actions.

Requests to Namenode are then made through ClientProtocol.

 

An hdfs committer would be able to give you an affirmative answer.

 

On Sun, Mar 31, 2013 at 11:27 AM, John Lilley <john.lilley@redpoint.net> wrote:

From: Ted Yu [mailto:yuzhihong@gmail.com]
Subject: Re: Is FileSystem thread-safe?

>>FileSystem is an abstract class, what concrete class are you using (DistributedFileSystem, etc.)?

Good point. I am calling FileSystem.get(URI uri, Configuration conf) with a URI like “hdfs://server:port/…” on a remote server, so I assume it is creating a DistributedFileSystem. However, I am not finding any documentation discussing its thread-safety (or lack thereof); perhaps you can point me to it?

Thanks, john
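
One reason the thread-safety question matters for FileSystem.get(URI, Configuration): by default Hadoop caches FileSystem instances, so several threads asking for the same cluster typically share one object. The sketch below is a toy model of that idea only. ToyFsCache and ToyFs are hypothetical names, port 8020 is a placeholder, and as far as I know the real cache key also includes the user, which is omitted here.

```java
import java.net.URI;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy model of a client-side cache keyed by URI scheme and authority.
// Not Hadoop code; ToyFsCache and ToyFs are hypothetical names.
final class ToyFsCache {

    // Stand-in for a file system client object.
    static final class ToyFs {
        final String key;

        ToyFs(String key) {
            this.key = key;
        }
    }

    private final ConcurrentMap<String, ToyFs> cache = new ConcurrentHashMap<>();

    // The path component is ignored, so URIs on the same server share an instance.
    ToyFs get(URI uri) {
        String key = uri.getScheme().toLowerCase(Locale.ROOT) + "://" + uri.getAuthority();
        return cache.computeIfAbsent(key, ToyFs::new);
    }

    public static void main(String[] args) {
        ToyFsCache cache = new ToyFsCache();
        ToyFs first = cache.get(URI.create("hdfs://server:8020/data/a"));
        ToyFs second = cache.get(URI.create("hdfs://server:8020/data/b"));
        System.out.println(first == second); // true: same scheme and authority
    }
}
```

With a shared instance like this, two threads calling get for the same cluster operate on the same object, so it is the concrete class's thread-safety that counts.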

 

 
