Date: Mon, 9 Jan 2012 10:30:57 -0800
Subject: Adding a soft-linked archive file to the distributed cache doesn't work as advertised
From: "W.P. McNeill"
To: Hadoop Mailing List
Cc: Shakthi Poornima

I am trying to add a zip file to the distributed cache and have it unzipped on the task nodes, with a softlink to the unzipped directory placed in the working directory of my mapper process. I think I'm doing everything the way the documentation tells me to, but it's not working.

On the client, in the run() function where I create the job, I first call:

    fs.copyFromLocalFile("gate-app.zip", "/tmp/gate-app.zip");

As expected, this copies the archive file gate-app.zip to the HDFS directory /tmp. Then I call:

    DistributedCache.addCacheArchive("/tmp/gate-app.zip#gate-app", configuration);

I expect this to add "/tmp/gate-app.zip" to the distributed cache and put a softlink to it called gate-app in the working directory of each task. However, when I call job.waitForCompletion(), I see the following error:

    Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/gate-app.zip#gate-app.

It appears that the distributed cache mechanism is interpreting the entire URI as the literal name of the file, instead of treating the fragment as the name of the softlink.

As far as I can tell, I'm doing this correctly according to the API documentation: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html

The full project in which I'm doing this is up on GitHub: https://github.com/wpm/Hadoop-GATE

Can someone tell me what I'm doing wrong?
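
To make the path-versus-fragment distinction above concrete, here is a minimal sketch using only java.net.URI from the JDK. It involves no Hadoop classes and is not a diagnosis of what DistributedCache does internally. The class name CacheUriSketch and its helper methods are hypothetical, introduced only for illustration. The sketch shows that the same string can end up as a URI with a separate fragment, or as a URI whose path literally contains the '#', depending on how the URI object is built.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Plain-JDK illustration only; CacheUriSketch is a hypothetical name. */
final class CacheUriSketch {

    /** Path component of a parsed URI string, with any "#fragment" split off. */
    static String pathOf(String uriString) {
        return URI.create(uriString).getPath();
    }

    /** Fragment of a parsed URI string (the intended softlink name), or null. */
    static String linkNameOf(String uriString) {
        return URI.create(uriString).getFragment();
    }

    /**
     * Builds a URI whose path is exactly the given string. The four-argument
     * constructor takes (scheme, host, path, fragment) and quotes characters
     * that are illegal in a path, so a '#' here stays part of the path.
     */
    static URI literalPath(String path) {
        try {
            return new URI(null, null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String s = "/tmp/gate-app.zip#gate-app";

        // Parsed as a URI string: the '#' separates path and fragment.
        System.out.println("path:     " + pathOf(s));
        System.out.println("fragment: " + linkNameOf(s));

        // Supplied as a path component: the '#' belongs to the file name.
        URI literal = literalPath(s);
        System.out.println("path:     " + literal.getPath());
        System.out.println("fragment: " + literal.getFragment());
    }
}
```

In the first case the path is /tmp/gate-app.zip and the fragment is gate-app; in the second the fragment is null and the path is the whole string, which is the same name that appears in the FileNotFoundException message. Whether something similar happens on the way into addCacheArchive() in the code above is an assumption that would need to be checked against the actual project.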