From: Abhijit Sarkar <sarkar.abhijit@outlook.com>
To: user@hadoop.apache.org
Subject: RE: FileNotFoundException trying to uncompress local cache archive
Date: Sun, 11 Aug 2013 23:28:47 -0400

Can someone please advise?

> From: abhijit.sarcar@gmail.com
> To: user@hadoop.apache.org
> Subject: FileNotFoundException trying to uncompress local cache archive
> Date: Sun, 11 Aug 2013 11:43:02 -0400
>
> Hi,
> As a learning exercise, I'm receiving a simple text file URI as an argument, compressing it with GzipCodec and placing it in the DistributedCache. In the Reducer, I retrieve the archive, uncompress it, and process the text file. Well, at least that's the idea. My uncompression code is unable to find the local cache archive and throws a FileNotFoundException.
> I'm not using any GenericOptionsParser features like -copyFromLocal; I'm trying to keep it all in the code.
>
> Driver:
> public int run(String[] args) throws Exception {
>     Configuration conf = getConf();
>
>     final URI compressedFileURI = compressFile(new Path(args[2]).toUri(), "gzip", conf); // implementation below
>
>     DistributedCache.addCacheArchive(compressedFileURI, conf);
>
> Reducer:
>     final Path[] cacheFiles = DistributedCache.getLocalCacheArchives(conf);
>
>     // some sanity-check code
>     cacheFileURI = uncompressFile(cacheFiles[0].toUri(), conf); // implementation below
>
> Utility:
> public static URI compressFile(final URI uncompressedURI,
>         final String codecName, final Configuration conf)
>         throws IOException {
>     final FileSystem fs = FileSystem.get(conf);
>     final CompressionCodec codec = new GzipCodec();
>     final Path uncompressedPath = new Path(uncompressedURI);
>
>     String archiveName = addExtension(uncompressedPath.getName(),
>             codec.getDefaultExtension(), true);
>
>     final Path archivePath = new Path(uncompressedPath.getParent(),
>             archiveName);
>
>     final OutputStream outputStream = new FileOutputStream(archivePath
>             .toUri().getPath());
>     final InputStream inputStream = new FileInputStream(
>             uncompressedURI.getPath());
>     final CompressionOutputStream out = codec
>             .createOutputStream(outputStream);
>     org.apache.hadoop.io.IOUtils.copyBytes(inputStream, out, conf, false);
>     // clean up
>
> public static URI uncompressFile(final URI archiveURI,
>         final Configuration conf) throws IOException {
>     final Path archivePath = new Path(archiveURI);
>
>     final FileSystem fs = FileSystem.get(conf);
>
>     final CompressionCodec codec = new CompressionCodecFactory(conf)
>             .getCodec(archivePath);
>     final Path uncompressedPath = new Path(
>             CompressionCodecFactory.removeSuffix(archiveURI.getPath(),
>                     codec.getDefaultExtension()));
>
>     final OutputStream outputStream = fs.create(uncompressedPath);
>
>     // FileNotFoundException is thrown here
>     final InputStream inputStream = new FileInputStream(
>             archiveURI.getPath());
>
> Regards,
> Abhijit
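For reference, the end-to-end flow described in the quoted message (compress with a Hadoop codec, register the result with DistributedCache, then read the localized copy in a task) can be sketched roughly as below. This is only an illustrative sketch, not the original code: the class and method names are invented, and it assumes the input path lives on the job's default FileSystem and that the cached entry is localized as the gzipped file itself.

// A self-contained sketch against the classic org.apache.hadoop.filecache
// API. CacheArchiveSketch, compressAndCache and readFirstCachedArchive are
// illustrative names, not from the original code.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CacheArchiveSketch {

    /** Driver side: gzip the input on the default FileSystem and register
     *  the result with the DistributedCache. Returns the registered URI. */
    public static URI compressAndCache(Path input, Configuration conf)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        // ReflectionUtils wires the Configuration into the codec instance.
        CompressionCodec codec =
                ReflectionUtils.newInstance(GzipCodec.class, conf);

        Path archive = input.suffix(codec.getDefaultExtension()); // foo.txt -> foo.txt.gz
        // Both streams go through the FileSystem API, so the same code runs
        // whether the default FileSystem is local or HDFS.
        IOUtils.copyBytes(fs.open(input),
                codec.createOutputStream(fs.create(archive)), conf, true);

        URI archiveUri = fs.makeQualified(archive).toUri();
        DistributedCache.addCacheArchive(archiveUri, conf);
        return archiveUri;
    }

    /** Task side: the framework localizes cache entries onto the node's
     *  local disk; getLocalCacheArchives returns those local paths. */
    public static void readFirstCachedArchive(Configuration conf)
            throws IOException {
        Path[] archives = DistributedCache.getLocalCacheArchives(conf);
        if (archives == null || archives.length == 0) {
            return; // nothing was cached for this job
        }
        // Assumes the localized entry is still the gzipped file itself.
        FileSystem localFs = FileSystem.getLocal(conf);
        CompressionCodec codec =
                new CompressionCodecFactory(conf).getCodec(archives[0]);
        // Decompress while reading; print to stdout just to keep the sketch short.
        IOUtils.copyBytes(codec.createInputStream(localFs.open(archives[0])),
                System.out, conf, false);
    }
}

The sketch deliberately keeps all I/O on the FileSystem abstraction (fs.open/fs.create on the driver side, FileSystem.getLocal for the localized copy in the task), rather than mixing java.io file streams with Hadoop Paths.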