From: Robert Hennig
To: hive-user@hadoop.apache.org
Date: Thu, 02 Sep 2010 12:58:24 +0200
Subject: Re: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

Hello,

Thanks, Shrijeet, for your answer. I found an exception in a task log that results from a casting error:

Caused by: java.lang.ClassCastException: org.apache.hadoop.mapred.FileSplit cannot be cast to com.adconion.hadoop.hive.DataLogSplit
    at com.adconion.hadoop.hive.DataLogInputFormat.getRecordReader(DataLogInputFormat.java:112)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:61)
    ... 11 more

The error happened because I expected my custom getSplits() method to be used, which returns an array of DataLogSplit objects, and I expected my custom getRecordReader() method to receive one of those splits, which could then be cast back to a DataLogSplit.
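For reference, here is roughly what the relevant part of my InputFormat looks like. This is a simplified sketch, not my real code: DataLogSplit is reduced to a bare FileSplit subclass and the reader to plain line reading, but the structure and the failing cast are the same:

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.LineRecordReader;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;

    public class DataLogInputFormat extends FileInputFormat<LongWritable, Text> {

        // Stand-in for the real DataLogSplit: just a FileSplit subclass here;
        // the real one carries extra fields about the log segment.
        public static class DataLogSplit extends FileSplit {
            public DataLogSplit() {  // no-arg constructor for Writable deserialization
                super((Path) null, 0L, 0L, (String[]) null);
            }
            public DataLogSplit(Path file, long start, long length, String[] hosts) {
                super(file, start, length, hosts);
            }
        }

        @Override
        public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
            // Simplified: wrap the default file splits in DataLogSplits. The
            // real method computes its own boundaries from the log structure.
            InputSplit[] raw = super.getSplits(job, numSplits);
            InputSplit[] wrapped = new InputSplit[raw.length];
            for (int i = 0; i < raw.length; i++) {
                FileSplit fs = (FileSplit) raw[i];
                wrapped[i] = new DataLogSplit(fs.getPath(), fs.getStart(),
                        fs.getLength(), fs.getLocations());
            }
            return wrapped;
        }

        @Override
        public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
                JobConf job, Reporter reporter) throws IOException {
            // This is the cast that fails (DataLogInputFormat.java:112): the
            // split Hive passes in is a plain FileSplit, not a DataLogSplit.
            DataLogSplit logSplit = (DataLogSplit) split;
            return new LineRecordReader(job, logSplit);
        }
    }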

So it looks like my getSplits() method is not being used. Or does Hadoop transform the splits somehow?
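By the way, the stack trace above mentions CombineHiveRecordReader, so my guess is that Hive's CombineHiveInputFormat is combining the input into plain FileSplits itself instead of calling my getSplits(). If that is the cause, would switching back to the plain input format help? An untested guess on my part:

    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;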

Thanks,

Robert

On 01.09.10 18:34, Shrijeet Paliwal wrote:
Ended Job = job_201008311250_0006 with errors
Check your Hadoop task logs; you will find more detailed information there.

-Shrijeet

On Wed, Sep 1, 2010 at 6:13 AM, Robert Hennig <rhennig@adconion.com> wrote:
Hello,

I'm relatively new to Hive & Hadoop, and I have written a custom InputFormat to be able to read our logfiles. I think I got everything right, but when I try to execute a query on an Amazon EMR cluster it fails with error messages that don't tell me what exactly is wrong.

So this is the query I execute:

add jar s3://amg.hadoop/hiveLib/hive-json-serde-0.1.jar;
add jar s3://amg.hadoop/hiveLib/hadoop-jar-with-dependencies.jar;

DROP TABLE event_log;

CREATE EXTERNAL TABLE IF NOT EXISTS event_log (
    EVENT_SUBTYPE STRING,
    EVENT_TYPE STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
STORED AS
INPUTFORMAT 'com.adconion.hadoop.hive.DataLogInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://amg-events/2010/07/01/01';

SELECT event_type FROM event_log WHERE event_type = 'pp' LIMIT 10;

This results in the following output:

hadoop@domU-12-31-39-0F-45-B3:~$ hive -f test.ql
Hive history file=/mnt/var/lib/hive/tmp/history/hive_job_log_hadoop_201009011303_427866099.txt
Testing s3://amg.hadoop/hiveLib/hive-json-serde-0.1.jar
converting to local s3://amg.hadoop/hiveLib/hive-json-serde-0.1.jar
Added /mnt/var/lib/hive/downloaded_resources/s3_amg.hadoop_hiveLib_hive-json-serde-0.1.jar to class path
Testing s3://amg.hadoop/hiveLib/hadoop-jar-with-dependencies.jar
converting to local s3://amg.hadoop/hiveLib/hadoop-jar-with-dependencies.jar
Added /mnt/var/lib/hive/downloaded_resources/s3_amg.hadoop_hiveLib_hadoop-jar-with-dependencies.jar to class path
OK
Time taken: 2.426 seconds
Found class for org.apache.hadoop.hive.contrib.serde2.JsonSerde
OK
Time taken: 0.332 seconds
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201008311250_0006, Tracking URL = http://domU-12-31-39-0F-45-B3.compute-1.internal:9100/jobdetails.jsp?jobid=job_201008311250_0006
Kill Command = /home/hadoop/.versions/0.20/bin/../bin/hadoop job  -Dmapred.job.tracker=domU-12-31-39-0F-45-B3.compute-1.internal:9001 -kill job_201008311250_0006
2010-09-01 13:04:04,376 Stage-1 map = 0%,  reduce = 0%
2010-09-01 13:04:34,681 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201008311250_0006 with errors

Failed tasks with most(4) failures :
Task URL: http://domU-12-31-39-0F-45-B3.compute-1.internal:9100/taskdetails.jsp?jobid=job_201008311250_0006&tipid=task_201008311250_0006_m_000013

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

The only errors I can find under /mnt/var/log/apps/hive.log are multiple entries like this one:

2010-09-01 13:03:36,586 DEBUG org.apache.hadoop.conf.Configuration (Configuration.java:<init>(216)) - java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:216)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:203)
        at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:316)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:232)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

And these errors:

2010-09-01 13:03:40,228 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-09-01 13:03:40,228 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.

Does anyone have an idea what went wrong?

Thanks!

Robert

