Subject: Re: ORC file in Hive 0.13 throws Java heap space error
From: Premal Shah <premal.j.shah@gmail.com>
To: user@hive.apache.org
Date: Mon, 19 May 2014 13:50:24 -0700

Thanks for the responses, guys. I tried a few different compression sizes, and none of them worked. I guess our use case is not a good candidate for ORC or Parquet (which I tried too, and which also failed), so we will use some other file format.

Thanks again.

On Fri, May 16, 2014 at 2:26 PM, Prasanth Jayachandran <pjayachandran@hortonworks.com> wrote:

With Hive 0.13 the ORC memory issue is mitigated by the optimization in https://issues.apache.org/jira/browse/HIVE-6455, which is enabled by default. But 3283 columns is still huge, so I would still recommend reducing the compression buffer size from its 256 KB default to a lower value, as suggested by John.
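[As a rough back-of-envelope for why that many columns hurts: the ORC writer keeps a compression buffer per output stream, and a string column produces several streams. Assuming three to four buffered streams per column (an estimate, not measured), 3283 columns at the 256 KB default come to roughly 3283 × 4 × 256 KB ≈ 3.2 GB of buffers alone, close to the task's 4 GB heap. Besides the per-table property John shows below, the default can be lowered for the whole session; a sketch, assuming hive.exec.orc.default.buffer.size is the Hive 0.13 name for that knob (value in bytes):

    -- Session-level override of the default ORC compression buffer size,
    -- assuming this Hive 0.13 property name; a per-table orc.compress.size
    -- still takes precedence where set.
    SET hive.exec.orc.default.buffer.size=16384;  -- 16 KB instead of 256 KB
]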
Thanks
Prasanth Jayachandran

On May 16, 2014, at 12:31 PM, John Omernik <john@omernik.com> wrote:

When I created the table, I had to reduce orc.compress.size quite a bit to make my table with many columns work. That was on Hive 0.12 (I thought this was supposed to be fixed in Hive 0.13, but 3k+ columns is huge). The default orc.compress.size is quite a bit larger (around 256 KB, i.e. 262,144 bytes). Try making it smaller and smaller if one level doesn't work. Good luck.

    STORED AS orc tblproperties ("orc.compress.size"="8192");
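[Applied to the conversion from the original message below, the full statement would look something like this (a sketch reusing the thread's table names; 8192 bytes is just John's example value):

    -- CTAS writing ORC with a smaller per-stream compression buffer.
    CREATE TABLE orc_table
    STORED AS orc
    TBLPROPERTIES ("orc.compress.size"="8192")
    AS SELECT * FROM text_table;
]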
On Thu, May 15, 2014 at 8:11 PM, Premal Shah <premal.j.shah@gmail.com> wrote:

I have a table in Hive stored as a text file with 3283 columns. All columns are of string data type.

I'm trying to convert that table into an ORC table using this command:

    create table orc_table stored as orc as select * from text_table;

This is the setting in mapred-site.xml:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx4G -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -verbose:gc -Xloggc:/mnt/hadoop/@taskid@.gc</value>
      <final>true</final>
    </property>

The tasks die with this error:

    2014-05-16 00:53:42,424 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
        at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
        at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
        at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthByteWriter.flush(RunLengthByteWriter.java:58)
        at org.apache.hadoop.hive.ql.io.orc.BitFieldWriter.flush(BitFieldWriter.java:44)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:553)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$ListTreeWriter.writeStripe(WriterImpl.java:1455)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:221)
        at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
        at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2028)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:86)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)

This is the GC output for a task that ran out of memory:

    0.690: [GC 17024K->768K(83008K), 0.0019170 secs]
    0.842: [GC 8488K(83008K), 0.0066800 secs]
    1.031: [GC 17792K->1481K(83008K), 0.0015400 secs]
    1.352: [GC 17142K(83008K), 0.0041840 secs]
    1.371: [GC 18505K->2249K(83008K), 0.0097240 secs]
    34.779: [GC 28384K(4177280K), 0.0014050 secs]

Anything I can tweak to make it work?

--
Regards,
Premal Shah.
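[Note that the last GC line shows the heap already expanded to its full ~4 GB (4177280K) when the writer gave up, consistent with the buffers dominating. For anyone retrying with a smaller orc.compress.size, a quick way to confirm the property was actually applied to the new table (plain HiveQL; hive --orcfiledump on a produced file should also print the compression size used):

    -- Lists the table's properties; orc.compress.size should appear with
    -- the value given at CREATE time.
    SHOW TBLPROPERTIES orc_table;
]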