hive-dev mailing list archives

From "Navis (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3198) Table properties of non-native table are not transferred to RecordReader
Date Tue, 17 Jul 2012 00:03:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3198:
------------------------

    Description: 
I'm working on a custom StorageHandler implementation. I use configureTableJobProperties to
pass properties onto a serde & InputFormat, but it looks to me like the properties aren't
present inside the InputFormat.

I found the following code which looks like it's supposed to propagate JobProperties:
{code}
public class HiveInputFormat<K extends WritableComparable, V extends Writable>
...
  public RecordReader getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {

    HiveInputSplit hsplit = (HiveInputSplit) split;
...
    boolean nonNative = false;
    PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
    if ((part != null) && (part.getTableDesc() != null)) {
      Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
      nonNative = part.getTableDesc().isNonNative();
    }
{code}

In the debugger, I see that part==null so copyTableJobPropertiesToConf doesn't get called.
I see that for this table:
{code}
create external table test3 () STORED BY 'foo' location '/data/bar';
{code}
The InputSplit path is the *file* (i.e. "/data/bar/part-00000") but pathToPartitionInfo has
an entry for the *dir* (i.e. "/data/bar").

I attached a patch which fixes the problem for me; it makes things explicit by passing along
the directory name inside the HiveInputSplit; this means we don't have to figure out which
files are a part of which partition.
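
The mismatch can be reproduced without Hive. What follows is only a minimal sketch: a plain HashMap stands in for pathToPartitionInfo, a String stands in for PartitionDesc, and the names PathMismatchSketch, lookupExact and SplitWithDir are hypothetical and do not come from Hive or from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lookup in HiveInputFormat.getRecordReader.
// A String plays the role of PartitionDesc; this is not Hive code.
class PathMismatchSketch {

    // Hypothetical split that also remembers the directory it was generated from,
    // in the spirit of the attached patch (the real patch may differ).
    static final class SplitWithDir {
        final String filePath;
        final String partitionDir;

        SplitWithDir(String filePath, String partitionDir) {
            this.filePath = filePath;
            this.partitionDir = partitionDir;
        }
    }

    // Exact-match lookup, as in the excerpt: the given path is used directly as the key.
    static String lookupExact(Map<String, String> pathToPartitionInfo, String path) {
        return pathToPartitionInfo.get(path);
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new HashMap<>();
        pathToPartitionInfo.put("/data/bar", "test3"); // keyed by the table directory

        SplitWithDir split = new SplitWithDir("/data/bar/part-00000", "/data/bar");

        // The file path is not a key in the map, so the lookup misses.
        System.out.println(lookupExact(pathToPartitionInfo, split.filePath));     // null
        // The directory carried along with the split is a key, so the lookup hits.
        System.out.println(lookupExact(pathToPartitionInfo, split.partitionDir)); // test3
    }
}
```

Carrying the directory in the split avoids any guessing about which file belongs to which partition, at the cost of changing what the split has to serialize.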


       Assignee: Navis
        Summary: Table properties of non-native table are not transferred to RecordReader (was: StorageHandler properties not passed to InputFormat (?))

For non-native tables, Hive relies on HiveInputFormat to create input splits and record readers.
But most input formats in Hadoop replace directories (the location of the table/partition)
with the concrete file names inside them, so a simple map lookup in pathToPartitionInfo
fails to find the appropriate partition desc.

This can be fixed simply by searching for the partition recursively, which CombineHiveInputFormat
already does, as commented below. It seemed too hard to write a proper test case for
this, so I'll just upload the code patch.
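
The recursive search could look roughly like the sketch below. It is an illustration under assumptions, not the uploaded patch and not the code in CombineHiveInputFormat: paths are plain strings without a URI scheme, a String stands in for PartitionDesc, and the names ParentPathLookupSketch and findByParent are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a parent-directory search over pathToPartitionInfo.
// A String plays the role of PartitionDesc; this is not Hive code.
class ParentPathLookupSketch {

    // Try the path itself, then each parent directory in turn, until a key matches.
    // The search stops before the root directory "/" and returns null if nothing matches.
    static String findByParent(Map<String, String> pathToPartitionInfo, String path) {
        String current = path;
        while (current != null && !current.isEmpty()) {
            String part = pathToPartitionInfo.get(current);
            if (part != null) {
                return part;
            }
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                break; // no parent left to try
            }
            current = current.substring(0, slash);
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new HashMap<>();
        pathToPartitionInfo.put("/data/bar", "test3"); // keyed by the table directory

        // The file misses, its parent directory hits.
        System.out.println(findByParent(pathToPartitionInfo, "/data/bar/part-00000")); // test3
        // A path outside every registered directory finds nothing.
        System.out.println(findByParent(pathToPartitionInfo, "/other/part-00000"));    // null
    }
}
```

Unlike carrying the directory inside the split, this keeps the split unchanged and pays for it with a few extra map lookups per record reader.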
                
> Table properties of non-native table are not transferred to RecordReader
> ------------------------------------------------------------------------
>
>                 Key: HIVE-3198
>                 URL: https://issues.apache.org/jira/browse/HIVE-3198
>             Project: Hive
>          Issue Type: Bug
>         Environment: trunk r1352973
>            Reporter: Brian Bloniarz
>            Assignee: Navis
>         Attachments: TestStorageHandler.java, inputformat.patch
>
>
> I'm working on a custom StorageHandler implementation. I use configureTableJobProperties
> to pass properties onto a serde & InputFormat, but it looks to me like the properties
> aren't present inside the InputFormat.
> I found the following code which looks like it's supposed to propagate JobProperties:
> {code}
> public class HiveInputFormat<K extends WritableComparable, V extends Writable>
> ...
>   public RecordReader getRecordReader(InputSplit split, JobConf job,
>       Reporter reporter) throws IOException {
>     HiveInputSplit hsplit = (HiveInputSplit) split;
> ...
>     boolean nonNative = false;
>     PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
>     if ((part != null) && (part.getTableDesc() != null)) {
>       Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
>       nonNative = part.getTableDesc().isNonNative();
>     }
> {code}
> In the debugger, I see that part==null so copyTableJobPropertiesToConf doesn't get called.
> I see that for this table:
> {code}
> create external table test3 () STORED BY 'foo' location '/data/bar';
> {code}
> The InputSplit path is the *file* (i.e. "/data/bar/part-00000") but pathToPartitionInfo
> has an entry for the *dir* (i.e. "/data/bar").
> I attached a patch which fixes the problem for me; it makes things explicit by passing
> along the directory name inside the HiveInputSplit; this means we don't have to figure out
> which files are a part of which partition.


        
