hive-user mailing list archives

From Peter Marron <Peter.Mar...@trilliumsoftware.com>
Subject Create Index Map/Reduce failure
Date Sat, 27 Oct 2012 19:21:26 GMT
Hi,

I have a fairly low-end machine running Ubuntu 12.04.
I'm running Hadoop in pseudo-distributed mode and storing
the data in HDFS. I have a file which is 137 GB, with 36.6 million rows
and 466 columns.

I am trying to create an index on this table in Hive with these commands.
(It seems the index has to be built in two separate steps.)

LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;

CREATE INDEX bigIndex
ON TABLE score(Ath_Seq_Num)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

ALTER INDEX bigIndex ON score REBUILD;

The resulting Map/Reduce job is failing with an OutOfMemoryError.

I attach the end of the only log which seems to contain any
useful information about the error.
Some googling turned up a suggestion that the problem could be
mapred.child.java.opts, so I added the following to my mapred-site.xml
(raising the maximum child heap from 200 MB to 1000 MB):

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1000m</value>
</property>

But this didn't seem to help.
I also saw a suggestion that I should decrease io.sort.mb,
so I reduced it to 1 MB. That didn't seem to help either.
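For reference, I have also been setting these per session from the Hive CLI before triggering the rebuild, instead of editing the XML each time. This is just a sketch of what I'm trying; the reducer count of 32 is an arbitrary guess, not a recommendation:

```sql
-- Session-level overrides attempted before the rebuild.
-- The heap size and reducer count here are guesses.
SET mapred.child.java.opts=-Xmx1000m;  -- larger child task heap
SET mapred.reduce.tasks=32;            -- spread the group-by keys over more reducers
ALTER INDEX bigIndex ON score REBUILD;
```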

Maybe this is the wrong list for this question
and I should post to common-user@hadoop.apache.org instead?

Any help appreciated.

Peter Marron

2012-10-25 15:55:27,429 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete:
511 files left.
2012-10-25 15:55:27,432 WARN org.apache.hadoop.fs.FileSystem: "localhost" is a deprecated
filesystem name. Use "hdfs://localhost/" instead.
2012-10-25 15:55:27,449 INFO org.apache.hadoop.mapred.Merger: Merging 511 sorted segments
2012-10-25 15:55:27,455 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass,
with 511 segments left of total size: 173620406 bytes
2012-10-25 15:55:27,885 INFO org.apache.hadoop.mapred.ReduceTask: Merged 511 segments, 173620406
bytes to disk to satisfy reduce memory limit
2012-10-25 15:55:27,885 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 173619390
bytes from disk
2012-10-25 15:55:27,886 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes
from memory into reduce
2012-10-25 15:55:27,886 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2012-10-25 15:55:27,888 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass,
with 1 segments left of total size: 173619386 bytes
2012-10-25 15:55:27,895 INFO ExecReducer: maximum memory = 932118528
2012-10-25 15:55:27,895 INFO ExecReducer: conf classpath = [file:/data/tmp/mapred/local/taskTracker/pmarron/jobcache/job_201210251304_0001/jars/classes,
file:/data/tmp/mapred/local/taskTracker/pmarron/jobcache/job_201210251304_0001/jars/, file:/data/tmp/mapred/local/taskTracker/pmarron/jobcache/job_201210251304_0001/attempt_201210251304_0001_r_000093_3/]
2012-10-25 15:55:27,896 INFO ExecReducer: thread classpath = [file:/data/hadoop-1.0.3/conf/,
file:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/tools.jar, file:/data/hadoop-1.0.3/, file:/data/hadoop-1.0.3/hadoop-core-1.0.3.jar,
file:/data/hadoop-1.0.3/lib/asm-3.2.jar, file:/data/hadoop-1.0.3/lib/aspectjrt-1.6.5.jar,
file:/data/hadoop-1.0.3/lib/aspectjtools-1.6.5.jar, file:/data/hadoop-1.0.3/lib/commons-beanutils-1.7.0.jar,
file:/data/hadoop-1.0.3/lib/commons-beanutils-core-1.8.0.jar, file:/data/hadoop-1.0.3/lib/commons-cli-1.2.jar,
file:/data/hadoop-1.0.3/lib/commons-codec-1.4.jar, file:/data/hadoop-1.0.3/lib/commons-collections-3.2.1.jar,
file:/data/hadoop-1.0.3/lib/commons-configuration-1.6.jar, file:/data/hadoop-1.0.3/lib/commons-daemon-1.0.1.jar,
file:/data/hadoop-1.0.3/lib/commons-digester-1.8.jar, file:/data/hadoop-1.0.3/lib/commons-el-1.0.jar,
file:/data/hadoop-1.0.3/lib/commons-httpclient-3.0.1.jar, file:/data/hadoop-1.0.3/lib/commons-io-2.1.jar,
file:/data/hadoop-1.0.3/lib/commons-lang-2.4.jar, file:/data/hadoop-1.0.3/lib/commons-logging-1.1.1.jar,
file:/data/hadoop-1.0.3/lib/commons-logging-api-1.0.4.jar, file:/data/hadoop-1.0.3/lib/commons-math-2.1.jar,
file:/data/hadoop-1.0.3/lib/commons-net-1.4.1.jar, file:/data/hadoop-1.0.3/lib/core-3.1.1.jar,
file:/data/hadoop-1.0.3/lib/hadoop-capacity-scheduler-1.0.3.jar, file:/data/hadoop-1.0.3/lib/hadoop-fairscheduler-1.0.3.jar,
file:/data/hadoop-1.0.3/lib/hadoop-thriftfs-1.0.3.jar, file:/data/hadoop-1.0.3/lib/hsqldb-1.8.0.10.jar,
file:/data/hadoop-1.0.3/lib/jackson-core-asl-1.8.8.jar, file:/data/hadoop-1.0.3/lib/jackson-mapper-asl-1.8.8.jar,
file:/data/hadoop-1.0.3/lib/jasper-compiler-5.5.12.jar, file:/data/hadoop-1.0.3/lib/jasper-runtime-5.5.12.jar,
file:/data/hadoop-1.0.3/lib/jdeb-0.8.jar, file:/data/hadoop-1.0.3/lib/jersey-core-1.8.jar,
file:/data/hadoop-1.0.3/lib/jersey-json-1.8.jar, file:/data/hadoop-1.0.3/lib/jersey-server-1.8.jar,
file:/data/hadoop-1.0.3/lib/jets3t-0.6.1.jar, file:/data/hadoop-1.0.3/lib/jetty-6.1.26.jar,
file:/data/hadoop-1.0.3/lib/jetty-util-6.1.26.jar, file:/data/hadoop-1.0.3/lib/jsch-0.1.42.jar,
file:/data/hadoop-1.0.3/lib/junit-4.5.jar, file:/data/hadoop-1.0.3/lib/kfs-0.2.2.jar, file:/data/hadoop-1.0.3/lib/log4j-1.2.15.jar,
file:/data/hadoop-1.0.3/lib/mockito-all-1.8.5.jar, file:/data/hadoop-1.0.3/lib/oro-2.0.8.jar,
file:/data/hadoop-1.0.3/lib/servlet-api-2.5-20081211.jar, file:/data/hadoop-1.0.3/lib/slf4j-api-1.4.3.jar,
file:/data/hadoop-1.0.3/lib/slf4j-log4j12-1.4.3.jar, file:/data/hadoop-1.0.3/lib/xmlenc-0.52.jar,
file:/data/hadoop-1.0.3/lib/jsp-2.1/jsp-2.1.jar, file:/data/hadoop-1.0.3/lib/jsp-2.1/jsp-api-2.1.jar,
file:/data/tmp/mapred/local/taskTracker/pmarron/jobcache/job_201210251304_0001/jars/classes,
file:/data/tmp/mapred/local/taskTracker/pmarron/jobcache/job_201210251304_0001/jars/, file:/data/tmp/mapred/local/taskTracker/pmarron/distcache/3928617505704526765_23348021_405451127/localhost/data/tmp/mapred/staging/pmarron/.staging/job_201210251304_0001/libjars/hive-builtins-0.8.1.jar/,
file:/data/tmp/mapred/local/taskTracker/pmarron/jobcache/job_201210251304_0001/attempt_201210251304_0001_r_000093_3/work/]
2012-10-25 15:55:27,916 WARN org.apache.hadoop.hive.conf.HiveConf: hive-site.xml not found
on CLASSPATH
2012-10-25 15:55:28,510 INFO ExecReducer:
<GBY>Id =6
  <Children>
    <SEL>Id =5
      <Children>
        <FS>Id =4
          <Parent>Id = 5 null<\Parent>
        <\FS>
      <\Children>
      <Parent>Id = 6 null<\Parent>
    <\SEL>
  <\Children>
<\GBY>
2012-10-25 15:55:28,510 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initializing
Self 6 GBY
2012-10-25 15:55:28,522 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Operator 6 GBY
initialized
2012-10-25 15:55:28,522 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initializing
children of 6 GBY
2012-10-25 15:55:28,522 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing child
5 SEL
2012-10-25 15:55:28,522 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
5 SEL
2012-10-25 15:55:28,522 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:string,_col1:string,_col2:array<bigint>>
2012-10-25 15:55:28,522 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Operator 5 SEL
initialized
2012-10-25 15:55:28,522 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children
of 5 SEL
2012-10-25 15:55:28,522 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing
child 4 FS
2012-10-25 15:55:28,522 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing
Self 4 FS
2012-10-25 15:55:28,575 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 4 FS
initialized
2012-10-25 15:55:28,576 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization
Done 4 FS
2012-10-25 15:55:28,576 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization
Done 5 SEL
2012-10-25 15:55:28,576 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initialization
Done 6 GBY
2012-10-25 15:55:28,579 WARN org.apache.hadoop.fs.FileSystem: "localhost" is a deprecated
filesystem name. Use "hdfs://localhost/" instead.
2012-10-25 15:55:28,599 INFO ExecReducer: ExecReducer: processing 1 rows: used memory = 133032712
2012-10-25 15:55:28,916 INFO ExecReducer: ExecReducer: processing 10 rows: used memory = 184539664
2012-10-25 15:55:40,041 INFO ExecReducer: ExecReducer: processing 100 rows: used memory =
396337056
2012-10-25 15:56:09,735 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-10-25 15:56:09,802 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for
UID to User mapping with a cache timeout of 14400 seconds.
2012-10-25 15:56:09,802 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName pmarron
for UID 1000 from the native implementation
2012-10-25 15:56:09,804 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError:
Java heap space
    at java.util.HashMap.resize(HashMap.java:559)
    at java.util.HashMap.addEntry(HashMap.java:851)
    at java.util.HashMap.put(HashMap.java:484)
    at java.util.HashSet.add(HashSet.java:217)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet$GenericUDAFMkSetEvaluator.putIntoSet(GenericUDAFCollectSet.java:163)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet$GenericUDAFMkSetEvaluator.merge(GenericUDAFCollectSet.java:146)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:142)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:600)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:824)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:724)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

