From: Andrew Wang
Date: Wed, 7 Aug 2013 11:30:53 -0700
Subject: Re: Feature request to provide DFSInputStream subclassing mechanism
To: hdfs-dev@hadoop.apache.org
Cc: Matevz Tadel, Alja

I don't think exposing DFSClient and DistributedFileSystem members is
necessary to achieve what you're trying to do. We've got wrapper
FileSystems like FilterFileSystem and ViewFileSystem which you might be
able to use for inspiration, and the HCFS wiki lists some third-party
FileSystems that might also be helpful.

On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour wrote:

> Hello Jeff
>
> Is it something that could go under the HCFS project?
> http://wiki.apache.org/hadoop/HCFS
> (I might be wrong?)
>
> Joe
>
>
> On 8/7/13 10:59 AM, "Jeff Dost" wrote:
>
> >Hello,
> >
> >We work in a software development team at the UCSD CMS Tier2 Center. We
> >would like to propose a mechanism that allows one to subclass
> >DFSInputStream in a clean way from an external package. First I'd like
> >to explain our motivation, and then I will proceed with the details.
> >
> >We have a 3-petabyte Hadoop cluster that we maintain for the LHC
> >experiment at CERN.
> >There are other T2 centers worldwide that host mirrors of the same
> >data. We are working on an extension to Hadoop so that, when a file is
> >read and no replicas of a block are available, we use an external
> >interface to retrieve that block of the file from another data center.
> >The external interface is necessary because not all T2 centers involved
> >in CMS run a Hadoop cluster as their storage backend.
> >
> >In order to implement this functionality, we need to subclass
> >DFSInputStream and override the read method, so we can catch
> >IOExceptions that occur on client reads at the block level.
> >
> >The basic steps required:
> >1. Invent a new URI scheme for the customized "FileSystem" in
> >core-site.xml:
> >
> ><property>
> >  <name>fs.foofs.impl</name>
> >  <value>my.package.FooFileSystem</value>
> >  <description>My Extended FileSystem for foofs: uris.</description>
> ></property>
> >
> >2. Write new classes, included in the external package, that subclass
> >the following:
> >FooFileSystem subclasses DistributedFileSystem
> >FooFSClient subclasses DFSClient
> >FooFSInputStream subclasses DFSInputStream
> >
> >Now any client commands that explicitly use the foofs:// scheme in
> >paths to access the Hadoop cluster can open files with a customized
> >InputStream that extends the functionality of the default Hadoop client
> >DFSInputStream. In order to make this happen for our use case, we had
> >to change some access modifiers in the DistributedFileSystem,
> >DFSClient, and DFSInputStream classes provided by Hadoop. In addition,
> >we had to comment out the check in the namenode code that only allows
> >URI schemes of the form "hdfs://".
> >
> >Attached is a patch file we apply to Hadoop.
> >Note that we derived this patch by modifying the Cloudera release
> >hadoop-2.0.0-cdh4.1.1, which can be found at:
> >http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
> >
> >We would greatly appreciate any advice on whether or not this approach
> >sounds reasonable, and whether you would consider accepting these
> >modifications into the official Hadoop code base.
> >
> >Thank you,
> >Jeff, Alja & Matevz
> >UCSD Physics
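
[Editor's note: the fallback-on-read idea in the quoted proposal can be sketched with plain java.io as an analogy. FallbackInputStream and the in-memory "remote" source below are hypothetical stand-ins; Hadoop's actual DFSInputStream and block-level retry machinery differ, so this only illustrates the wrapper/override pattern, not the real API.]

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Analogy for the proposed FooFSInputStream: wrap an underlying stream
// and, when a read fails (e.g. no live replicas of a block), satisfy the
// read from an external fallback source instead of surfacing the error.
class FallbackInputStream extends FilterInputStream {
    private final InputStream fallback; // stands in for the external T2 fetch interface

    FallbackInputStream(InputStream primary, InputStream fallback) {
        super(primary);
        this.fallback = fallback;
    }

    @Override
    public int read() throws IOException {
        try {
            return in.read(); // normal path: read from the primary replica
        } catch (IOException e) {
            // Primary unreachable: retry the read against the fallback.
            return fallback.read();
        }
    }
}

public class Demo {
    // A primary stream that always fails, simulating a block with no replicas.
    static InputStream broken() {
        return new InputStream() {
            @Override
            public int read() throws IOException {
                throw new IOException("no live replicas");
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] remote = "remote copy".getBytes();
        InputStream s = new FallbackInputStream(broken(), new ByteArrayInputStream(remote));
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = s.read()) != -1) sb.append((char) b);
        System.out.println(sb); // prints "remote copy"
    }
}
```

A real implementation would also override read(byte[], int, int) (the bulk read path most clients hit) and would need enough visibility into the failed read to know which block and byte range to fetch externally, which is exactly why the proposal asks for relaxed access modifiers.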