From: Harsh J
Date: Fri, 7 Dec 2012 03:45:19 +0530
Subject: Re: DFS and the RecordReader
To: user@hadoop.apache.org

Hi,

Not sure what you're talking about. RecordReaders, or for that matter
any DFS InputStream, do not pull data to local disk before reading it.
Non-data-local reads are streamed over the network, just as data-local
reads are streamed from the local disk. There is no such logic as the
one you seek.

On Fri, Dec 7, 2012 at 3:07 AM, Jay Vyas wrote:
> Hi guys:
>
> Where and how does a Hadoop record reader decide whether or not it needs to
> copy a file to local disk?
>
> Clearly, since the InputSplit (which has metadata about file inputs) is the
> input to the RecordReader, the RecordReader would have to implement some
> kind of smart decision making ... I'm looking for something like
>
> // Pseudocode
> if (!file.existsLocally())
>     copyFileToDisk(file.getPath());
>
> return new InputStream(file);
>
> I've looked here:
>
> http://grepcode.com/file/repo1.maven.org/maven2/org.jvnet.hudson.hadoop/hadoop-core/0.19.1-hudson-2/org/apache/hadoop/hdfs/DFSClient.java#DFSClient.create%28java.lang.String%2Corg.apache.hadoop.fs.permission.FsPermission%2Cboolean%2Cshort%2Clong%2Corg.apache.hadoop.util.Progressable%2Cint%29
>
> but don't see anything.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com

--
Harsh J
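
To illustrate the point in the reply, here is a minimal sketch of a reader that consumes its split straight from a stream, with no "copy to local disk" step. It uses only the JDK so it can run anywhere. The class `StreamingSplitReader` and the method `readSplit` are hypothetical names made up for this example, and the `ByteArrayInputStream` is an assumption standing in for the stream that `FileSystem.open(Path)` would return in Hadoop. On a real DFS stream the initial skip would be a `seek`. The split-boundary handling is only loosely modeled on how a line-oriented reader might behave; it is not Hadoop's actual implementation.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a Hadoop class.
class StreamingSplitReader {
    private final InputStream in;
    private long pos = 0; // bytes consumed from the stream so far

    StreamingSplitReader(InputStream source) {
        // The reader never asks where the bytes come from (local disk or network).
        this.in = new BufferedInputStream(source);
    }

    // Reads one '\n'-terminated line; returns null at end of stream.
    private String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) >= 0) {
            pos++;
            sawAny = true;
            if (b == '\n') {
                break;
            }
            buf.write(b);
        }
        return sawAny ? new String(buf.toByteArray(), StandardCharsets.UTF_8) : null;
    }

    // Returns the lines belonging to the byte range [start, start + length).
    static List<String> readSplit(InputStream source, long start, long length) {
        StreamingSplitReader r = new StreamingSplitReader(source);
        List<String> records = new ArrayList<>();
        long end = start + length;
        try {
            // Advance to the split start (a seekable DFS stream would seek instead).
            while (r.pos < start) {
                if (r.in.read() < 0) {
                    return records;
                }
                r.pos++;
            }
            // A split that starts mid-file discards its first (possibly partial)
            // line; the previous split reads one line past its own end to cover it.
            if (start != 0) {
                r.readLine();
            }
            String line;
            while (r.pos <= end && (line = r.readLine()) != null) {
                records.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "a\nbb\nccc\n".getBytes(StandardCharsets.UTF_8);
        // Two "splits" over the same 9 bytes; each record is read exactly once.
        System.out.println(readSplit(new ByteArrayInputStream(data), 0, 3));
        System.out.println(readSplit(new ByteArrayInputStream(data), 3, 6));
    }
}
```

The same `readSplit` call works unchanged whether the stream is backed by a local file or a remote source, which is the sense in which there is no locality decision for the reader to make.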