From: Arun C Murthy <acm@hortonworks.com>
To: user@hadoop.apache.org
Subject: Re: Sqoop 1.4.1-cdh4.0.1 is not running in Hadoop 2.0.0-cdh4.1.1
Date: Wed, 24 Oct 2012 10:14:45 -0400

Please ask CDH questions on CDH lists.

On Oct 23, 2012, at 3:17 PM, Kartashov, Andy wrote:

Guys, I've tried for hours to resolve this error.

I am trying to import a table into Hadoop using Sqoop.

The error is:

Error: org.hsqldb.DatabaseURL.parseURL(Ljava/lang/String;ZZ)Lorg/hsqldb/persist/HsqlProperties

I realise there is an issue with the versions of the hsqldb jar files.

At first, Sqoop kept spitting out the error above, until I realised that my /usr/lib/sqoop/lib folder had both versions, hsqldb-1.8.0.10.jar and plain hsqldb.jar (2.0, I suppose), and sqoop-conf was picking up the first (wrong) jar.

When I moved hsqldb-1.8.0.10.jar away, Sqoop stopped complaining, but then Hadoop began spitting out the same error. No matter what I tried, I could not get Hadoop to pick up the right jar.

I tried setting:

    export HADOOP_CLASSPATH="/usr/lib/sqoop/lib/hsqldb.jar"
    export HADOOP_USER_CLASSPATH_FIRST=true

without luck.
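
A note on reading that error: "parseURL(Ljava/lang/String;ZZ)Lorg/hsqldb/persist/HsqlProperties" is a JVM method descriptor for parseURL(String, boolean, boolean) returning HsqlProperties, so the hsqldb class that actually got loaded lacks the method Sqoop was compiled against, which is exactly what a stale 1.8 jar on the classpath produces. One way to confirm which jar wins is to ask the class itself. A minimal diagnostic sketch (the class name org.hsqldb.DatabaseURL is taken from the error above; run it with the same classpath the failing code sees):

    import org.hsqldb.DatabaseURL;

    // Print the jar that org.hsqldb.DatabaseURL was actually loaded from,
    // to see whether hsqldb 1.8 or 2.x wins on this classpath.
    public class WhichHsqldb {
        public static void main(String[] args) {
            // getCodeSource() can be null for JDK bootstrap classes, but not
            // for a class loaded from an ordinary jar on the classpath.
            System.out.println(
                DatabaseURL.class.getProtectionDomain().getCodeSource().getLocation());
        }
    }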

 

Please help.

 

Thnks
AK

 

 

From: Jonathan Bishop = [mailto:jbishop.rwc@gmail.com] 
Sent: Tuesday, October 23, 2012 = 2:41 PM
To:  
Re: zlib does not = uncompress gzip during MR run

 

Just to follow up on my own = question...

 


I believe the problem is caused by the input split during MR. So my real question is how to handle input splits when the input is gzipped.

Is it even possible to have splits of a gzipped file?

Thanks,

Jon

On Tue, Oct 23, 2012 at 11:10 AM, Jonathan Bishop wrote:

Hi,

My input files are gzipped, and I am using the built-in Java codecs successfully to uncompress them in a normal Java run...

    // fsplit is the input FileSplit; compressionCodecs is presumably a
    // CompressionCodecFactory (its getCodec(Path) maps file extension to codec).
    fileIn = fs.open(fsplit.getPath());
    codec = compressionCodecs.getCodec(fsplit.getPath());
    in = new LineReader(codec != null ? codec.createInputStream(fileIn) : fileIn, config);

But when I use = the same piece of code in a MR job I am getting...



12/10/23 11:02:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/23 11:02:25 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/23 11:02:25 INFO compress.CodecPool: Got brand-new compressor
12/10/23 11:02:26 INFO mapreduce.HFileOutputFormat: Incremental table output configured.
12/10/23 11:02:26 INFO input.FileInputFormat: Total input paths to process : 3
12/10/23 11:02:27 INFO mapred.JobClient: Running job: job_201210221549_0014
12/10/23 11:02:28 INFO mapred.JobClient:  map 0% reduce 0%
12/10/23 11:02:49 INFO mapred.JobClient: Task Id : attempt_201210221549_0014_m_000003_0, Status : FAILED
java.io.IOException: incorrect header check
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
    at java.io.InputStream.read(InputStream.java:101)

So I am thinking that there is some incompatibility of = zlib and my gzip. Is there a way to force hadoop to use the java = built-in compression codecs?
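
On the question as asked: Hadoop's GzipCodec falls back to a pure-Java decompressor when the native zlib library is not in use, and the property below is the era's core-default.xml switch for declaring native libraries unavailable. Treat this as a hedged sketch to verify against your Hadoop build, not a confirmed fix; if the real cause is a mid-file split, the pure-Java path will fail on the same bytes too.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    // Hedged sketch: request the built-in java.util.zip-based gzip path by
    // declaring native libraries unavailable. The property name is an
    // assumption to verify against your Hadoop version.
    public class ForceJavaGzip {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.setBoolean("io.native.lib.available", false);
            GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
            System.out.println("gzip codec configured: " + codec);
        }
    }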

Also, I would like to try lzo which = I hope will allow splitting of the input files (I recall reading this = somewhere). Can someone point me to the best way to do = this?
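
On LZO: an .lzo file is only splittable after a block index has been built for it, and both the indexer and the split-aware input format live in the third-party hadoop-lzo project, not stock Hadoop. The class names below come from that project and should be verified against the version you install:

    import org.apache.hadoop.mapreduce.Job;

    // Hedged sketch using the third-party hadoop-lzo package. Before this
    // helps, each .lzo input needs an index built next to it, e.g. with
    // com.hadoop.compression.lzo.LzoIndexer; unindexed .lzo stays one split.
    public class LzoJobSetup {
        public static void configure(Job job) {
            job.setInputFormatClass(com.hadoop.mapreduce.LzoTextInputFormat.class);
        }
    }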

Thanks,

Jon

 


NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail.


= --Apple-Mail-15--881268920--