From: John Omernik <john@omernik.com>
Date: Fri, 16 May 2014 14:31:52 -0500
Subject: Re: ORC file in Hive 0.13 throws Java heap space error
To: user@hive.apache.org

When I created the table, I had to reduce orc.compress.size quite a bit to make my table with many columns work. This was on Hive 0.12 (I thought it was supposed to be fixed in Hive 0.13, but 3k+ columns is huge). The default orc.compress.size is much larger (262,144 bytes, i.e. 256 KB). Try stepping it down further if that level doesn't work. Good luck.

STORED AS orc tblproperties ("orc.compress.size"="8192");
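If you want to do it in one step, the same property can go directly into the CTAS. A minimal sketch, untested, reusing the table names from your message below:

create table orc_table
stored as orc
tblproperties ("orc.compress.size"="8192")
as select * from text_table;

For rough intuition on why the column count bites: the ORC writer keeps at least one compression buffer of orc.compress.size per stream, and a string column typically needs a few streams, so 3283 columns x ~3 streams x 262,144 bytes is already on the order of 2.5 GB of buffers before much data is written (exact stream counts vary by column type, so treat this as an estimate).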
On Thu, May 15, 2014 at 8:11 PM, Premal Shah <premal.j.shah@gmail.com> wrote:

> I have a table in Hive stored as a text file with 3283 columns. All
> columns are of string data type.
>
> I'm trying to convert that table into an ORC table using this command:
>
> create table orc_table stored as orc as select * from text_table;
>
> This is the setting in mapred-site.xml:
>
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx4G -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -verbose:gc -Xloggc:/mnt/hadoop/@taskid@.gc</value>
>   <final>true</final>
> </property>
>
> The tasks die with this error:
>
> 2014-05-16 00:53:42,424 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
> 	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
> 	at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
> 	at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
> 	at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
> 	at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
> 	at org.apache.hadoop.hive.ql.io.orc.RunLengthByteWriter.flush(RunLengthByteWriter.java:58)
> 	at org.apache.hadoop.hive.ql.io.orc.BitFieldWriter.flush(BitFieldWriter.java:44)
> 	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:553)
> 	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
> 	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$ListTreeWriter.writeStripe(WriterImpl.java:1455)
> 	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
> 	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
> 	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:221)
> 	at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
> 	at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
> 	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2028)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:86)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> 	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>
> This is the GC output for a task that ran out of memory:
>
> 0.690: [GC 17024K->768K(83008K), 0.0019170 secs]
> 0.842: [GC 8488K(83008K), 0.0066800 secs]
> 1.031: [GC 17792K->1481K(83008K), 0.0015400 secs]
> 1.352: [GC 17142K(83008K), 0.0041840 secs]
> 1.371: [GC 18505K->2249K(83008K), 0.0097240 secs]
> 34.779: [GC 28384K(4177280K), 0.0014050 secs]
>
> Anything I can tweak to make it work?
>
> --
> Regards,
> Premal Shah.