From: Harsh J
Date: Sun, 11 Aug 2013 20:52:19 +0530
Subject: Re: How to compress MapFile programmatically
To: user@hadoop.apache.org

A MapFile.Reader will automatically detect and decompress the data without
being told anything special. You generally don't have to worry about
decompressing files yourself in Apache Hadoop - the framework handles it
transparently as long as you use the proper APIs.

On Sun, Aug 11, 2013 at 8:49 PM, Abhijit Sarkar wrote:
> Thanks Harsh. However, if I compress the MapFile using the MapFile.Writer
> constructor option and then put it in the DistributedCache, how do I
> decompress it in the map/reduce task? There doesn't appear to be any API
> method for that.
>
> Regards,
> Abhijit
>
>> From: harsh@cloudera.com
>> Date: Sun, 11 Aug 2013 12:56:43 +0530
>> Subject: Re: How to compress MapFile programmatically
>> To: user@hadoop.apache.org
>>
>> A MapFile isn't a single file. It is a directory _containing_ two files.
>> You cannot "open" a directory for reading.
>>
>> The MapFile API is documented at
>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/MapFile.html
>> and that's what you should be using for reading/writing them.
>>
>> Compression is a simple option you need to provide when invoking the
>> writer:
>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/MapFile.Writer.html#MapFile.Writer(org.apache.hadoop.conf.Configuration,%20org.apache.hadoop.fs.FileSystem,%20java.lang.String,%20org.apache.hadoop.io.WritableComparator,%20java.lang.Class,%20org.apache.hadoop.io.SequenceFile.CompressionType,%20org.apache.hadoop.io.compress.CompressionCodec,%20org.apache.hadoop.util.Progressable)
>>
>> On Sun, Aug 11, 2013 at 1:46 AM, Abhijit Sarkar wrote:
>> > Hi,
>> > I'm a Hadoop newbie. This is my first question to this mailing list,
>> > hoping for a good start :)
>> >
>> > A MapFile is a directory, so when I try to open an InputStream to it,
>> > it fails with FileNotFoundException. How do I compress a MapFile
>> > programmatically?
>> >
>> > Code snippet:
>> > final FileSystem fs = FileSystem.get(conf);
>> > final InputStream inputStream = fs.open(new Path(uncompressedStr));
>> >
>> > Exception:
>> > java.io.FileNotFoundException: /some/directory (No such file or directory)
>> >     at java.io.FileInputStream.open(Native Method)
>> >     at java.io.FileInputStream.<init>(FileInputStream.java:120)
>> >     at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
>> >     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
>> >     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
>> >     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
>> >     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
>> >     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>> >     at name.abhijitsarkar.learning.hadoop.io.IOUtils.compress(IOUtils.java:104)
>> >
>> > Regards,
>> > Abhijit
>>
>> --
>> Harsh J

--
Harsh J
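[Editor's note] The write-compressed / read-transparently pattern the thread describes can be sketched as below. This is a minimal example against the Hadoop 1.x MapFile API linked above; the path, key/value types, and codec choice are illustrative, not taken from the original messages.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;

public class MapFileCompressionDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);
        // A MapFile is a directory (holding "data" and "index" files),
        // not a single file you can fs.open().
        String dir = "/tmp/demo.map";

        // Compression is requested once, at write time, via the
        // Writer constructor - there is no separate "compress" step.
        MapFile.Writer writer = new MapFile.Writer(conf, fs, dir,
                IntWritable.class, Text.class,
                SequenceFile.CompressionType.BLOCK, new DefaultCodec(), null);
        try {
            for (int i = 0; i < 100; i++) {
                writer.append(new IntWritable(i), new Text("value-" + i));
            }
        } finally {
            writer.close();
        }

        // The Reader detects the codec from the file header and
        // decompresses transparently; no decompression code is needed.
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        try {
            Text value = new Text();
            reader.get(new IntWritable(42), value);
            System.out.println(value);
        } finally {
            reader.close();
        }
    }
}
```

Note the reader side needs no mention of the codec at all, which is the point Harsh makes: once a compressed MapFile is in the DistributedCache, opening it with MapFile.Reader is sufficient.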