From: "Kartashov, Andy" <Andy.Kartashov@mpac.ca>
To: user@hadoop.apache.org
Date: Tue, 23 Oct 2012 19:57:02 +0000
Subject: RE: Sqoop 1.4.1-cdh4.0.1 is not running in Hadoop 2.0.0-cdh4.1.1 SOLVED!

Found a solution after searching for the hsqldb-1.8.0.10.jar copies and replacing them with the latest hsqldb.jar.

 

Found them inside:

/usr/lib/hadoop/client-0.20/

/usr/lib/hadoop-0.20-mapreduce/lib/
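For anyone hitting the same conflict: a small sketch (hypothetical helper, not from this thread) of scanning those Hadoop lib directories for hsqldb jars, so stale 1.8 copies like the two found above stand out before you swap them:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FindHsqldbJars {
    // List every hsqldb jar under a directory tree, so stale 1.8 copies
    // stand out before they get swapped for the newer hsqldb.jar.
    public static List<Path> find(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                .filter(p -> p.getFileName().toString().matches("hsqldb.*\\.jar"))
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Directories named in the post; adjust for your install.
        for (String dir : new String[] {"/usr/lib/hadoop", "/usr/lib/hadoop-0.20-mapreduce"}) {
            Path root = Paths.get(dir);
            if (Files.isDirectory(root)) {
                find(root).forEach(System.out::println);
            }
        }
    }
}
```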

 

:):):)

 

 

From: Kartashov, Andy [mailto:Andy.Kartashov@mpac.ca]
Sent: Tuesday, October 23, 2012 3:17 PM
To: user@hadoop.apache.org
Subject: Sqoop 1.4.1-cdh4.0.1 is not running in Hadoop 2.0.0-cdh4.1.1

 

Guys, I have tried for hours to resolve this error.

 

I am trying to import a table into Hadoop using Sqoop.

 

ERROR is:

Error: org.hsqldb.DatabaseURL.parseURL(Ljava/lang/String;ZZ)Lorg/hsqldb/persist/HsqlProperties

 

 

I realise that there is an issue with the versions of the hsqldb.jar files.

 

At first, Sqoop was spitting out the above error until I realised that my /usr/lib/sqoop/lib folder had both versions, hsqldb-1.8.0.10.jar and plain hsqldb.jar (2.0, I suppose), and sqoop-conf was picking up the first (wrong) jar.

 

When I moved the hsqldb-1.8.0.10.jar away, Sqoop stopped complaining, but then Hadoop began spitting out the same error. No matter what I tried, I could not get Hadoop to pick up the right jar.
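One way to see which copy actually wins (a hypothetical diagnostic, not something from this thread): ask the JVM where it loaded a class from. With duplicate jars on the classpath, this pinpoints the jar behind a NoSuchMethodError like the parseURL one above:

```java
import java.security.CodeSource;

public class WhichJar {
    // Report where the JVM actually loaded a class from; with duplicate
    // jars on the classpath, this shows which copy won.
    public static String locate(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Bootstrap classes (java.lang.String etc.) report no code source.
        return src == null ? "(bootstrap class path)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // On the cluster, the interesting call would be the conflicting class:
        //   locate(Class.forName("org.hsqldb.DatabaseURL"))
        System.out.println(locate(WhichJar.class));
    }
}
```

Running that inside the failing job (e.g. from a mapper's setup) would show whether the 1.8 or 2.x jar is being picked up.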

 

I tried setting:

export HADOOP_CLASSPATH="/usr/lib/sqoop/lib/hsqldb.jar" and then

export HADOOP_USER_CLASSPATH_FIRST=true

without luck.

 

Please help.

 

Thanks,

AK

 

 

From: Jonathan Bishop [mailto:jbishop.rwc@gmail.com]
Sent: Tuesday, October 23, 2012 2:41 PM
To: user@hadoop.apache.org
Subject: Re: zlib does not uncompress gzip during MR run

 

Just to follow up on my own question...

 

I believe the problem is caused by the input split during MR. So my real question is how to handle input splits when the input is gzipped.

 

Is it even possible to have splits of a gzipped file?
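For what it's worth, a plain-JDK sketch (no Hadoop APIs, just an illustration) of why a raw .gz file cannot be split: gzip is one compressed stream with a single header at the front, so a reader cannot start decoding at an arbitrary split offset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipNotSplittable {
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Try to decode the stream starting at an arbitrary byte offset,
    // as a split reader would have to.
    public static boolean readableFrom(byte[] gz, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(gz, offset, gz.length - offset))) {
            while (in.read() != -1) { /* drain */ }
            return true;
        } catch (IOException e) {
            return false; // no gzip header at this offset
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip("line1\nline2\nline3\n".getBytes("UTF-8"));
        System.out.println(readableFrom(gz, 0)); // from the start: decodes fine
        System.out.println(readableFrom(gz, 1)); // from any later offset: fails
    }
}
```

This is why a whole .gz file has to go to a single mapper, whereas block-oriented formats (or an indexed LZO file) can be split.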

 

Thanks,

 

Jon

On Tue, Oct 23, 2012 at 11:10 AM, Jonathan Bishop <jbishop.rwc@gmail.com> wrote:

Hi,

My input files are gzipped, and I am using the built-in Java codecs successfully to uncompress them in a normal Java run...

        fileIn = fs.open(fsplit.getPath());
        codec = compressionCodecs.getCodec(fsplit.getPath());
        in = new LineReader(codec != null ? codec.createInputStream(fileIn) : fileIn, config);

But when I use the same piece of code in an MR job I am getting...

12/10/23 11:02:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/23 11:02:25 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/23 11:02:25 INFO compress.CodecPool: Got brand-new compressor
12/10/23 11:02:26 INFO mapreduce.HFileOutputFormat: Incremental table output configured.
12/10/23 11:02:26 INFO input.FileInputFormat: Total input paths to process : 3
12/10/23 11:02:27 INFO mapred.JobClient: Running job: job_201210221549_0014
12/10/23 11:02:28 INFO mapred.JobClient:  map 0% reduce 0%
12/10/23 11:02:49 INFO mapred.JobClient: Task Id : attempt_201210221549_0014_m_000003_0, Status : FAILED
java.io.IOException: incorrect header check
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
    at java.io.InputStream.read(InputStream.java:101)

So I am thinking that there is some incompatibility between zlib and my gzip files. Is there a way to force Hadoop to use the Java built-in compression codecs?
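The "incorrect header check" message itself can be reproduced outside Hadoop (a minimal JDK sketch, an illustration rather than the actual Hadoop codec path): a decompressor in zlib mode, like the ZlibDecompressor in the stack trace, rejects gzip bytes because gzip and zlib wrap the same deflate data in different headers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

public class HeaderCheck {
    // Feed bytes to a zlib-mode Inflater and report "ok" on success
    // or zlib's own error message on failure.
    public static String inflateAsZlib(byte[] data) {
        Inflater inf = new Inflater(); // default mode expects a zlib header
        inf.setInput(data);
        byte[] out = new byte[64];
        try {
            inf.inflate(out);
            return "ok";
        } catch (DataFormatException e) {
            return String.valueOf(e.getMessage());
        } finally {
            inf.end();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("hello".getBytes("UTF-8"));
        }
        // gzip bytes start 0x1f 0x8b, which is not a valid zlib header,
        // so this prints zlib's error: "incorrect header check" -- the
        // same message as in the job log above.
        System.out.println(inflateAsZlib(bos.toByteArray()));
    }
}
```

So the error suggests the mapper's stream is being decoded with the wrong framing (e.g. a split handed mid-file bytes, or data that is not actually gzip) rather than a corrupt file.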

Also, I would like to try LZO, which I hope will allow splitting of the input files (I recall reading this somewhere). Can someone point me to the best way to do this?

Thanks,

Jon

 

NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail.