Subject: Re: ORC file split calculation problems
From: Patrick Duin <patduin@gmail.com>
To: user@hive.apache.org
Date: Tue, 1 Mar 2016 14:41:57 +0000

Hi Prasanth,

Thanks for this. I tried out the configuration and I wanted to share some numbers with you.

My test setup is a Cascading job that reads in 240 files (ranging from 1.5GB to 2.5GB).
In the job log I get the duration from these lines:
INFO log.PerfLogger: </PERFLOG method=OrcGetSplits start=1456747523670 end=1456747640171 duration=116501 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>

Running this without any of the configuration takes: 116501 ms
Setting both flags as per your email: 27233 ms
A nice improvement.
But doing the same test on data where the files are smaller than 256MB (the ORC block size), OrcGetSplits takes: 2741 ms.
With or without setting the configuration, the results are the same.

This is still a fairly big gap. Knowing we can tune the performance with your suggested configuration is great, as we might not always have the option to repartition our data. Still, avoiding files that span multiple blocks seems to have much more of an impact, even though it is counter-intuitive.
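For reference, this is roughly how the split calculation can be timed in isolation (a minimal sketch, assuming Hive 0.14 and the mapred API on the classpath; the input path is a placeholder):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

public class TimeOrcGetSplits {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    // placeholder: point this at the ORC dataset on HDFS
    FileInputFormat.setInputPaths(conf, new Path("/path/to/orc/dataset"));

    long start = System.currentTimeMillis();
    InputSplit[] splits = new OrcInputFormat().getSplits(conf, 1);
    long elapsed = System.currentTimeMillis() - start;

    System.out.println(splits.length + " splits calculated in " + elapsed + " ms");
  }
}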
Would be good to know if other users have similar experiences.
Again thanks for your help.

Kind regards,
Patrick.



2016-02-29 6:38 GMT+00:00 Prasanth Jayachandran <pjayachandran@hortonworks.com>:
Hi Patrick

Please find answers inline

On Feb 26, 2016, at 9:36 AM, Patrick Duin <patduin@gmail.com> wrote:

Hi Prasanth.

Thanks for the quick reply!

The logs don't show much more of the stacktrace I'm afraid:
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:809)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


The stack trace isn't really the issue though. The NullPointerException is a symptom of not being able to return any stripes: if you look at that line in the code, it fails because the 'stripes' field is null, which should never happen. This, we think, is caused by failing namenode network traffic. We would get lots of IO warnings in the logs saying blocks cannot be found, e.g.:
16/02/01 13:20:34 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.io.IOException: java.lang.InterruptedException
        at org.apache.hadoop.ipc.Client.call(Client.java:1448)
        at org.apache.hadoop.ipc.Client.call(Client.java:1400)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy32.getServerDefaults(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getServerDefaults(ClientNamenodeProtocolTranslatorPB.java:268)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy33.getServerDefaults(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getServerDefaults(DFSClient.java:1007)
        at org.apache.hadoop.hdfs.DFSClient.shouldEncryptData(DFSClient.java:2062)
        at org.apache.hadoop.hdfs.DFSClient.newDataEncryptionKey(DFSClient.java:2068)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:208)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:159)
        at org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:90)
        at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3123)
        at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
        at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
        at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:848)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:407)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:311)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:885)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:771)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
        at java.util.concurrent.FutureTask.get(FutureTask.java:187)
        at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1047)
        at org.apache.hadoop.ipc.Client.call(Client.java:1442)
        ... 33 more

Our job doesn't always fail; sometimes splits do get calculated. We suspect that when the namenode is too busy our job hits some time-outs and the whole thing fails.

Our intuition has been the same as you suggest: bigger files are better. But we see a degradation in performance as soon as our files get bigger than the ORC block size. Keeping the file size within the ORC block size sounds silly, but looking at the code (OrcInputFormat) we think it cuts out a bunch of code that is causing us problems. The code we are trying to hit is: https://github.com/apache/hive/blob/release-0.14.0/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L656.

This line is hit only when the file does not span multiple blocks and is less than the max split size (by default the same as the block size). If you want to avoid reading the footers for split elimination, or if you are not using SARGs, then I would recommend the following configurations:

// disables file footer cache. When this cache is disabled file footers are not read
set hive.orc.cache.stripe.details.size=-1;

// disables predicate pushdown (when not using SARG no need for this)
set hive.optimize.index.filter=false;
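For a client that goes through a MapReduce JobConf rather than the Hive CLI (as our Cascading job does), I assume the same properties can be set directly on the job configuration before split calculation; a minimal sketch:

import org.apache.hadoop.mapred.JobConf;

public final class OrcSplitTuning {
  // applies the two settings above before split calculation
  public static JobConf disableFooterReads(JobConf conf) {
    // disables the file footer cache, so footers are not read during split generation
    conf.set("hive.orc.cache.stripe.details.size", "-1");
    // disables predicate pushdown (we pass no SARG anyway)
    conf.setBoolean("hive.optimize.index.filter", false);
    return conf;
  }
}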


Avoiding the scheduling.

In our case we are not using any SARG but we do use column projection.

Any idea why we don't have this issue if we query the data via Hive?

Let me know if you need more information. Thanks for the insights, much appreciated.

Kind regards,
Patrick


2016-02-25 22:20 GMT+01:00 Prasanth Jayachandran <pjayachandran@hortonworks.com>:

> On Feb 25, 2016, at 3:15 PM, Prasanth Jayachandran <pjayachandran@hortonworks.com> wrote:
>
> Hi Patrick
>
> Can you paste the entire stack trace? Looks like the NPE happened during split generation, but the stack trace is incomplete to know what caused it.
>
> In Hive 0.14.0, the stripe size was changed to 64MB. The default block size for ORC files is 256MB, so 4 stripes can fit in a block. ORC does padding to avoid stripes straddling HDFS blocks. During split calculation, the ORC footer, which contains stripe-level column statistics, is read to perform split pruning based on the predicate condition specified via a SARG (Search Argument).
>
> For example: assume column 'state' is sorted and the predicate condition is 'state' = "CA"
> Stripe 1: min = AZ, max = FL
> Stripe 2: min = GA, max = MN
> Stripe 3: min = MS, max = SC
> Stripe 4: min = SD, max = WY
>
> In this case, only stripe 1 satisfies the above predicate condition, so only 1 split, containing stripe 1, will be created.
> So if there are a huge number of small files, then footers from all files have to be read to do split pruning. If there are a few large files, then only a few footers have to be read. Also, the minimum splittable position is a stripe boundary. So having fewer large files has the advantage of reading less data during split pruning.
>
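To make the stripe elimination above concrete, here is a minimal sketch of the min/max check on the example stripes (the names and types are illustrative, not the actual ORC implementation):

public class StripePruningExample {
  // a stripe whose column statistics are [min, max] may contain the literal
  // only when min <= literal <= max (lexicographic comparison for strings)
  static boolean mayContain(String min, String max, String literal) {
    return min.compareTo(literal) <= 0 && literal.compareTo(max) <= 0;
  }

  public static void main(String[] args) {
    String[][] stripes = { {"AZ", "FL"}, {"GA", "MN"}, {"MS", "SC"}, {"SD", "WY"} };
    for (int i = 0; i < stripes.length; i++) {
      // with 'state' = "CA" only stripe 1 is kept; the other three are pruned
      System.out.println("stripe " + (i + 1) + " kept: "
          + mayContain(stripes[i][0], stripes[i][1], "CA"));
    }
  }
}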
> If you can send me the full stack trace, I can tell what is causing the exception here. I will also let you know of any workaround/next Hive version with the fix.
>
> In more recent Hive versions (1.2.0 onwards), OrcInputFormat has strategies to decide automatically when to read footers and when not to. You can configure the strategy that you want based on the workload. In the case of many small files, footers will not be read; with large files, footers will be read for split pruning.

The default strategy chooses automatically (deciding when to read footers and when not to). It is configurable as well.
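For reference, on Hive 1.2.0+ the strategy should be selectable through the hive.exec.orc.split.strategy property (BI skips footer reads, ETL always reads footers, HYBRID decides automatically); the property name and values are from memory, so worth double-checking against your release. A sketch of forcing it from a client configuration:

import org.apache.hadoop.mapred.JobConf;

public final class OrcSplitStrategyExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // assumed property name and values for Hive 1.2.0+; verify against your version
    conf.set("hive.exec.orc.split.strategy", "BI");
    System.out.println("split strategy: " + conf.get("hive.exec.orc.split.strategy"));
  }
}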

>
> Thanks
> Prasanth
>
>> On Feb 25, 2016, at 7:08 AM, Patrick Duin <patduin@gmail.com> wrote:
>>
>> Hi,
>>
>> We've recently moved one of our datasets to ORC and we use Cascading and Hive to read this data. We've had problems reading the data via Cascading because of the generation of splits.
>> We read in a large number of files (thousands) and they are about 1GB each. We found that the split calculation took minutes on our cluster and often didn't succeed at all (when our namenode was busy).
>> When digging through the code of 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.class' we figured out that if we make the files smaller than the ORC block size (256MB) the code avoids lots of namenode calls. We applied this solution and made our files smaller, and that solved the problem. Split calculation in our job went from 10+ minutes to a couple of seconds and always succeeds.
>> We feel it is counterintuitive, as bigger files are usually better in HDFS. We've also seen that doing a Hive query on the data does not present this problem. Internally, Hive seems to take a completely different execution path: it is not using the OrcInputFormat but uses 'org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.class'.
>>
>> Can someone explain the reason for this difference or shed some light on the behaviour we are seeing? Any help will be greatly appreciated. We are using hive-0.14.0.
>>
>> Kind regards,
>> Patrick
>>
>> Here is the stack trace that we would see when our Cascading job failed to calculate the splits:
>> Caused by: java.lang.RuntimeException: serious problem
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:478)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:949)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:974)
>>        at com.hotels.corc.mapred.CorcInputFormat.getSplits(CorcInputFormat.java:201)
>>        at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:200)
>>        at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:142)
>>        at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
>>        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
>>        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
>>        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>>        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:415)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>>        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:585)
>>        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:580)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:415)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:580)
>>        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:571)
>>        at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:106)
>>        at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:265)
>>        at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:184)
>>        at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:146)
>>        at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:48)
>>        ... 4 more
>> Caused by: java.lang.NullPointerException
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:809)
>



