drill-issues mailing list archives

From "Hao Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3118) "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column
Date Sat, 16 May 2015 16:44:59 GMT
Hao Zhu created DRILL-3118:
------------------------------

             Summary: "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column
                 Key: DRILL-3118
                 URL: https://issues.apache.org/jira/browse/DRILL-3118
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.0.0
            Reporter: Hao Zhu
            Assignee: Chris Westin


Tested on 1.0 with commit id:
{code}
select commit_id from sys.version;
+-------------------------------------------+
|                 commit_id                 |
+-------------------------------------------+
| d8b19759657698581cc0d01d7038797952888123  |
+-------------------------------------------+
1 row selected (0.097 seconds)
{code}

When the source data has a column named like "dir0", "dir1", and so on, the query may fail with "java.lang.IndexOutOfBoundsException".

For example:
{code}
> select `dir999` from dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))

Fragment 0:0

[Error Id: d289b3d7-1172-4ed7-b679-7af80d9aca7c on h1.poc.com:31010]

  (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet record reader.
Message:
Hadoop path: /user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet
Total records read: 0
Mock records read: 0
Records to read: 32768
Row group index: 0
Records in row group: 1
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
  optional int32 id;
  optional binary dir999;
}
, metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] INT32  [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] BINARY  [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]}
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441
    org.apache.drill.exec.physical.impl.ScanBatch.next():175
    org.apache.drill.exec.physical.impl.BaseRootExec.next():83
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
    org.apache.drill.exec.physical.impl.BaseRootExec.next():73
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1469
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745
  Caused By (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 0))
    io.netty.buffer.DrillBuf.checkIndexD():189
    io.netty.buffer.DrillBuf.chk():211
    io.netty.buffer.DrillBuf.getInt():491
    org.apache.drill.exec.vector.UInt4Vector$Accessor.get():321
    org.apache.drill.exec.vector.VarBinaryVector$Mutator.setSafe():481
    org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.fillEmpties():408
    org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.setValueCount():513
    org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields():78
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():425
    org.apache.drill.exec.physical.impl.ScanBatch.next():175
    org.apache.drill.exec.physical.impl.BaseRootExec.next():83
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
    org.apache.drill.exec.physical.impl.BaseRootExec.next():73
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1469
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
{code}

My thoughts:
We need to fix this in one or more of the following ways:
1. Either return a readable error message saying that "dirN" is a reserved column name and that drill.exec.storage.file.partition.column.label should be changed to something else;
2. Or/and, if the source data has dirN columns, they should override our reserved "dirN" columns.
3. We need to document "drill.exec.storage.file.partition.column.label" in http://drill.apache.org/docs/querying-directories/
4. drill.exec.storage.file.partition.column.label is a system-level configuration, so using it as a workaround affects the whole system. Can we make it a session-level option?
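
To illustrate what option 1 might look like, here is a rough Python sketch of the kind of check that could produce a readable message instead of the IndexOutOfBoundsException. This is not Drill code: the function names `find_reserved_columns` and `collision_message` are hypothetical, the default label "dir" and the case-insensitive match are assumptions, and the message wording is only a suggestion.

```python
import re


def find_reserved_columns(column_names, label="dir"):
    """Return the columns whose names look like <label>N (e.g. dir0, dir999).

    Assumption: reserved partition columns are the label followed by one or
    more digits, compared case-insensitively.
    """
    pattern = re.compile(re.escape(label) + r"\d+", re.IGNORECASE)
    return [name for name in column_names if pattern.fullmatch(name)]


def collision_message(column_names, label="dir"):
    """Build a readable error message, or return None if nothing collides."""
    clashes = find_reserved_columns(column_names, label)
    if not clashes:
        return None
    return (
        "Column(s) " + ", ".join(clashes)
        + " collide with the reserved partition column name '" + label + "N'. "
        + "Please change drill.exec.storage.file.partition.column.label "
        + "to something else."
    )


if __name__ == "__main__":
    # Column names taken from the Parquet schema in the report above.
    print(collision_message(["id", "dir999"]))
```

With the schema from the report (id, dir999), the check flags dir999. With a different label, for example "part", the same schema produces no message, which is the effect the workaround in item 4 relies on.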



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
