hive-dev mailing list archives

From David Capwell <dcapw...@gmail.com>
Subject Re: ORC NPE while writing stats
Date Wed, 02 Sep 2015 20:06:52 GMT
Also, the data we write are primitives, structs (as lists), and arrays (as lists); we
don't use any of the boxed writables (like Text).
On Sep 2, 2015 12:57 PM, "David Capwell" <dcapwell@gmail.com> wrote:

> We have multiple threads writing, but each thread works on its own file, so
> each ORC writer is only touched by one thread (writers never cross threads).
> On Sep 2, 2015 11:18 AM, "Owen O'Malley" <omalley@apache.org> wrote:
>
>> I don't see how it would get there. That implies that minimum was null,
>> but the count was non-zero.
>>
>> The ColumnStatisticsImpl$StringStatisticsImpl.serialize looks like:
>>
>> @Override
>> OrcProto.ColumnStatistics.Builder serialize() {
>>   OrcProto.ColumnStatistics.Builder result = super.serialize();
>>   OrcProto.StringStatistics.Builder str =
>>     OrcProto.StringStatistics.newBuilder();
>>   if (getNumberOfValues() != 0) {
>>     str.setMinimum(getMinimum());
>>     str.setMaximum(getMaximum());
>>     str.setSum(sum);
>>   }
>>   result.setStringStatistics(str);
>>   return result;
>> }
>>
>> and thus it shouldn't call down to setMinimum unless there is at least one
>> non-null value in the column.
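A minimal Java sketch of the invariant described above, using a simplified stand-in for the statistics object. `StringStatsSketch`, `update`, and the nested `Builder` are hypothetical names, not Hive's `ColumnStatisticsImpl` API. Protobuf-generated Java setters for string fields throw `NullPointerException` when passed null, and the stand-in builder mimics that with `Objects.requireNonNull`. The sketch shows that the `count != 0` guard only protects `setMinimum` while the count and the minimum stay in sync.

```java
import java.util.Objects;

/** Hypothetical stand-in for string column statistics; not Hive's ColumnStatisticsImpl. */
public class StringStatsSketch {
    long count;       // number of non-null values seen
    String minimum;   // smallest value seen, or null if none yet

    /** Mimics a protobuf-generated builder setter, which rejects null arguments. */
    static final class Builder {
        String minimum;

        Builder setMinimum(String value) {
            this.minimum = Objects.requireNonNull(value);
            return this;
        }
    }

    void update(String value) {
        if (value == null) {
            return; // nulls are not counted, which is what the guard relies on
        }
        count++;
        if (minimum == null || value.compareTo(minimum) < 0) {
            minimum = value;
        }
    }

    /** Same shape as the guarded serialize(): only set the minimum when count != 0. */
    Builder serialize() {
        Builder b = new Builder();
        if (count != 0) {
            b.setMinimum(minimum); // NPE only if count != 0 while minimum == null
        }
        return b;
    }

    public static void main(String[] args) {
        StringStatsSketch ok = new StringStatsSketch();
        ok.update("b");
        ok.update("a");
        System.out.println(ok.serialize().minimum); // prints "a"

        StringStatsSketch empty = new StringStatsSketch();
        System.out.println(empty.serialize().minimum); // prints "null": guard skips setMinimum

        StringStatsSketch broken = new StringStatsSketch();
        broken.count = 1; // simulate the inconsistent state: count non-zero, minimum null
        try {
            broken.serialize();
        } catch (NullPointerException e) {
            System.out.println("NPE from setMinimum, matching the top frame of the trace");
        }
    }
}
```

Reaching the top frame of the reported trace therefore requires a statistics object whose count was incremented without a minimum being recorded.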
>>
>> Do you have multiple threads working? There isn't anything that should be
>> introducing non-determinism, so for the same input it would fail at the same
>> point.
>>
>> .. Owen
>>
>>
>>
>>
>> On Tue, Sep 1, 2015 at 10:51 PM, David Capwell <dcapwell@gmail.com>
>> wrote:
>>
>>> We are writing ORC files in our application for Hive to consume.
>>> Given enough time, we have noticed that writing causes an NPE when
>>> working with a string column's stats. We're not sure what's causing it on
>>> our side yet, since replaying the same data works fine; it seems more
>>> like this just happens over time (different data sources will hit this
>>> around the same time in the same JVM).
>>>
>>> Here is the code in question:
>>>
>>> final Writer writer = OrcFile.createWriter(path,
>>>     OrcFile.writerOptions(conf).inspector(oi));
>>> try {
>>>   for (Data row : rows) {
>>>     List<Object> struct = Orc.struct(row, inspector);
>>>     writer.addRow(struct);
>>>   }
>>> } finally {
>>>   writer.close();
>>> }
>>>
>>>
>>> Here is the exception:
>>>
>>> java.lang.NullPointerException: null
>>>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$Builder.setMinimum(OrcProto.java:1803) ~[hive-exec-0.14.0.jar:0.14.0]
>>>         at org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl.serialize(ColumnStatisticsImpl.java:411) ~[hive-exec-0.14.0.jar:0.14.0]
>>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.createRowIndexEntry(WriterImpl.java:1255) ~[hive-exec-0.14.0.jar:0.14.0]
>>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
>>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
>>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createRowIndexEntry(WriterImpl.java:1978) ~[hive-exec-0.14.0.jar:0.14.0]
>>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1985) ~[hive-exec-0.14.0.jar:0.14.0]
>>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:322) ~[hive-exec-0.14.0.jar:0.14.0]
>>>         at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168) ~[hive-exec-0.14.0.jar:0.14.0]
>>>         at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157) ~[hive-exec-0.14.0.jar:0.14.0]
>>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2276) ~[hive-exec-0.14.0.jar:
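Read bottom-up, the frames above go from addRow through MemoryManager.addedRow and notifyWriters into WriterImpl.checkMemory and flushStripe. The stripe flush, and with it the statistics serialization, therefore runs on whichever thread called addRow. The Java sketch below models that chain with hypothetical stand-in classes (`SharedManager`, `SketchWriter`, and the `checkInterval` parameter are all invented for illustration). It assumes one manager is shared by every writer in the JVM and notifies all of them, which this thread does not confirm for Hive 0.14. Under that assumption, a row added on one thread can flush a writer that another thread owns.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of addRow -> addedRow -> notifyWriters -> checkMemory -> flushStripe. */
public class CallbackSketch {
    /** Stand-in memory manager; ASSUMED to be shared by every writer in the JVM. */
    static final class SharedManager {
        private final List<SketchWriter> writers = new ArrayList<>();
        private final int checkInterval; // hypothetical: rows between memory checks
        private int rowsSinceCheck = 0;

        SharedManager(int checkInterval) {
            this.checkInterval = checkInterval;
        }

        synchronized void register(SketchWriter w) {
            writers.add(w);
        }

        /** Called from addRow; every checkInterval rows it notifies ALL registered writers. */
        synchronized void addedRow() {
            if (++rowsSinceCheck >= checkInterval) {
                rowsSinceCheck = 0;
                for (SketchWriter w : writers) {
                    w.checkMemory();
                }
            }
        }
    }

    /** Stand-in writer that records which thread flushed each of its stripes. */
    static final class SketchWriter {
        final String name;
        final SharedManager manager;
        final List<String> flushThreads = new ArrayList<>();

        SketchWriter(String name, SharedManager manager) {
            this.name = name;
            this.manager = manager;
            manager.register(this);
        }

        void addRow() {
            manager.addedRow();
        }

        void checkMemory() {
            flushStripe();
        }

        void flushStripe() {
            flushThreads.add(Thread.currentThread().getName());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedManager manager = new SharedManager(2);
        SketchWriter a = new SketchWriter("a", manager);
        SketchWriter b = new SketchWriter("b", manager);

        Thread t1 = new Thread(() -> a.addRow(), "thread-1");
        Thread t2 = new Thread(() -> b.addRow(), "thread-2");
        t1.start();
        t1.join(); // first row: below the interval, no memory check yet
        t2.start();
        t2.join(); // second row triggers notifyWriters on thread-2

        System.out.println(a.flushThreads); // prints [thread-2]
        System.out.println(b.flushThreads); // prints [thread-2]
    }
}
```

In this model, writer "a" is only ever handed rows by thread-1, yet its stripe is flushed on thread-2. That is why per-thread ownership of each writer would not, by itself, rule out cross-thread access if the manager really is shared.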
>>>
>>>
>>> Versions:
>>>
>>> Hadoop: apache 2.2.0
>>> Hive Apache: 0.14.0
>>> Java 1.7
>>>
>>>
>>> Thanks for your time reading this email.
>>>
>>
>>
