drill-issues mailing list archives

From "Arina Ielchiieva (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-6204) Pass tables columns without partition columns to empty Hive reader
Date Fri, 02 Mar 2018 14:05:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-6204:
------------------------------------
    Description: 
When {{store.hive.optimize_scan_with_native_readers}} is enabled, {{HiveDrillNativeScanBatchCreator}}
is used to read data from Hive tables directly from the file system. When the table is empty
or no row groups are matched, an empty {{HiveDefaultReader}} is created to output the schema.

When this happens, Drill currently fails with the following error:
{noformat}
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NullPointerException
Setup failed for null 
{noformat}
This happens because instead of passing only the table columns to the empty reader (as we do
when creating a non-empty reader), we pass all columns, which may include partition columns
as well. The reader then fails to find the partition column in the table schema. As the comment
on lines 81 - 82 of {{HiveDrillNativeScanBatchCreator}} notes, we deliberately separate the
partition columns from the table columns so that the partition columns can be passed separately:
{noformat}
      // Separate out the partition and non-partition columns. Non-partition columns are passed directly to the
      // ParquetRecordReader. Partition columns are passed to ScanBatch.
{noformat}
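For illustration, here is a minimal sketch of what that separation might look like (the class,
method and variable names are hypothetical, not the actual {{HiveDrillNativeScanBatchCreator}}
code):
{code:java}
import java.util.List;
import java.util.Set;

public class ColumnSeparationSketch {
  /**
   * Hypothetical sketch: splits the projected column names into table columns,
   * which are handed to the record reader, and partition columns, which
   * ScanBatch populates itself.
   */
  static void separate(List<String> projected, Set<String> partitionNames,
                       List<String> tableColumns, List<String> partitionColumns) {
    for (String column : projected) {
      if (partitionNames.contains(column)) {
        partitionColumns.add(column);   // resolved later by ScanBatch
      } else {
        tableColumns.add(column);       // passed to the ParquetRecordReader
      }
    }
  }
}
{code}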
To fix the problem, we need to pass the table columns ({{newColumns}}) to the empty reader instead of all columns:
{code:java}
    if (readers.size() == 0) {
      // No readers were created (empty table or no matching row groups), so add
      // an empty reader that only outputs the schema. Pass the table columns
      // (newColumns) rather than all columns, which may include partition columns.
      readers.add(new HiveDefaultReader(table, null, null, newColumns, context, conf,
        ImpersonationUtil.createProxyUgi(config.getUserName(), context.getQueryUserName())));
    }
{code}
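For context, here is a self-contained sketch of why the unfixed call fails (column names and
types are hypothetical, not Drill code): the partition column has no entry in the table schema,
so the lookup returns null, and dereferencing it during reader setup surfaces as the "Setup
failed for null" error above.
{code:java}
import java.util.List;
import java.util.Map;

public class EmptyReaderNpeSketch {
  public static void main(String[] args) {
    // The table schema holds only the table's own columns; partition
    // columns live in the metastore, not in the file schema.
    Map<String, String> tableSchema = Map.of("id", "int", "name", "string");

    // Column list handed to the empty reader before the fix: it still
    // contains a partition column (here the hypothetical "dir0").
    List<String> allColumns = List.of("id", "name", "dir0");

    for (String column : allColumns) {
      String type = tableSchema.get(column);                      // null for "dir0"
      System.out.println(column + " -> " + type.toUpperCase());   // NPE on "dir0"
    }
  }
}
{code}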

  was:
When {{store.hive.optimize_scan_with_native_readers}} is enabled, {{HiveDrillNativeScanBatchCreator}}
is used to read data from Hive tables directly from the file system. When the table is empty
or no row groups are matched, an empty {{HiveDefaultReader}} is created to output the schema.

When this happens, Drill currently fails with the following error:
{noformat}
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NullPointerException
Setup failed for null 
{noformat}
This happens because instead of passing only the table columns to the empty reader (as we do
when creating a non-empty reader), we pass all columns, which may include partition columns
as well. As the comment on lines 81 - 82 of {{HiveDrillNativeScanBatchCreator}} notes, we
deliberately separate the partition columns from the table columns so that the partition
columns can be passed separately:
{noformat}
      // Separate out the partition and non-partition columns. Non-partition columns are passed directly to the
      // ParquetRecordReader. Partition columns are passed to ScanBatch.
{noformat}
To fix the problem, we need to pass the table columns instead of all columns:
{code:java}
    if (readers.size() == 0) {
      readers.add(new HiveDefaultReader(table, null, null, newColumns, context, conf,
        ImpersonationUtil.createProxyUgi(config.getUserName(), context.getQueryUserName())));
    }
{code}


> Pass tables columns without partition columns to empty Hive reader
> ------------------------------------------------------------------
>
>                 Key: DRILL-6204
>                 URL: https://issues.apache.org/jira/browse/DRILL-6204
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive
>    Affects Versions: 1.12.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.13.0
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
