Subject: Re: ORC file in Hive 0.13 throws Java heap space error
From: Premal Shah <premal.j.shah@gmail.com>
To: user@hive.apache.org
Date: Mon, 19 May 2014 13:50:24 -0700

Thanks for the responses, guys. I tried a few different compression sizes, and none of them worked. I guess our use case is not a good candidate for ORC or Parquet (which I tried too, and which also failed), so we will use some other file format.

Thanks again.

On Fri, May 16, 2014 at 2:26 PM, Prasanth Jayachandran <pjayachandran@hortonworks.com> wrote:

With Hive 0.13 the ORC memory issue is mitigated by the optimization in https://issues.apache.org/jira/browse/HIVE-6455, which is enabled by default. But 3283 columns is still huge, so I would still recommend reducing the compression buffer size from its 256 KB default to a lower value, as suggested by John.
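[As a rough back-of-envelope for why that many columns hurts: the ORC writer keeps a compression buffer per output stream, and a string column produces several streams. Assuming three to four buffered streams per column (an estimate, not measured), 3283 columns at the 256 KB default come to roughly 3283 × 4 × 256 KB ≈ 3.2 GB of buffers alone, close to the task's 4 GB heap. Besides the per-table property John shows below, the default can be lowered for the whole session; a sketch, assuming hive.exec.orc.default.buffer.size is the Hive 0.13 name for that knob (value in bytes):

    -- Session-level override of the default ORC compression buffer size,
    -- assuming this Hive 0.13 property name; a per-table orc.compress.size
    -- still takes precedence where set.
    SET hive.exec.orc.default.buffer.size=16384;  -- 16 KB instead of 256 KB
]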
Thanks
Prasanth Jayachandran

On May 16, 2014, at 12:31 PM, John Omernik <john@omernik.com> wrote:

When I created the table, I had to reduce orc.compress.size quite a bit to make my table with many columns work. That was on Hive 0.12 (I thought this was supposed to be fixed in Hive 0.13, but 3k+ columns is huge). The default orc.compress.size is quite a bit larger (around 256 KB, i.e. 262,144 bytes). Try making it smaller and smaller if one level doesn't work. Good luck.

    STORED AS orc tblproperties ("orc.compress.size"="8192");
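[Applied to the conversion from the original message below, the full statement would look something like this (a sketch reusing the thread's table names; 8192 bytes is just John's example value):

    -- CTAS writing ORC with a smaller per-stream compression buffer.
    CREATE TABLE orc_table
    STORED AS orc
    TBLPROPERTIES ("orc.compress.size"="8192")
    AS SELECT * FROM text_table;
]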
On Thu, May 15, 2014 at 8:11 PM, Premal Shah <premal.j.shah@gmail.com> wrote:

I have a table in Hive stored as a text file with 3283 columns. All columns are of string data type.

I'm trying to convert that table into an ORC table using this command:

    create table orc_table stored as orc as select * from text_table;

This is the setting in mapred-site.xml:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx4G -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -verbose:gc -Xloggc:/mnt/hadoop/@taskid@.gc</value>
      <final>true</final>
    </property>

The tasks die with this error:

    2014-05-16 00:53:42,424 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
        at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
        at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
        at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthByteWriter.flush(RunLengthByteWriter.java:58)
        at org.apache.hadoop.hive.ql.io.orc.BitFieldWriter.flush(BitFieldWriter.java:44)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:553)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$ListTreeWriter.writeStripe(WriterImpl.java:1455)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:221)
        at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
        at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2028)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:86)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)

This is the GC output for a task that ran out of memory:

    0.690: [GC 17024K->768K(83008K), 0.0019170 secs]
    0.842: [GC 8488K(83008K), 0.0066800 secs]
    1.031: [GC 17792K->1481K(83008K), 0.0015400 secs]
    1.352: [GC 17142K(83008K), 0.0041840 secs]
    1.371: [GC 18505K->2249K(83008K), 0.0097240 secs]
    34.779: [GC 28384K(4177280K), 0.0014050 secs]

Anything I can tweak to make it work?

--
Regards,
Premal Shah.
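[Note that the last GC line shows the heap already expanded to its full ~4 GB (4177280K) when the writer gave up, consistent with the buffers dominating. For anyone retrying with a smaller orc.compress.size, a quick way to confirm the property was actually applied to the new table (plain HiveQL; hive --orcfiledump on a produced file should also print the compression size used):

    -- Lists the table's properties; orc.compress.size should appear with
    -- the value given at CREATE time.
    SHOW TBLPROPERTIES orc_table;
]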