hive-dev mailing list archives

From "Brian Bloniarz (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)
Date Mon, 16 Jul 2012 21:42:35 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Bloniarz updated HIVE-3198:
---------------------------------

    Attachment: TestStorageHandler.java

Here's a StorageHandler implementation which should help reproduce the bug. When I run it
like this:
{code}
$ mkdir /tmp/test; touch /tmp/test/part-00000
hive> add jar test.jar;
hive> create external table test (a string) STORED BY 'TestStorageHandler' location '/tmp/test';
hive> select * from test;
{code}
I see "TESTPROP: hello world", which means that the properties are being set up correctly.
But if you do:
{code}
hive> select a from test;
{code}
I see "TESTPROP: null", meaning that properties from configureInputJobProperties() don't get
passed to the getRecordReader() call.
                
> StorageHandler properties not passed to InputFormat (?)
> -------------------------------------------------------
>
>                 Key: HIVE-3198
>                 URL: https://issues.apache.org/jira/browse/HIVE-3198
>             Project: Hive
>          Issue Type: Bug
>         Environment: trunk r1352973
>            Reporter: Brian Bloniarz
>         Attachments: TestStorageHandler.java, inputformat.patch
>
>
> I'm working on a custom StorageHandler implementation. I use configureTableJobProperties to pass properties on to a SerDe and InputFormat, but it looks to me like the properties aren't present inside the InputFormat.
> I found the following code which looks like it's supposed to propagate JobProperties:
> {code}
> public class HiveInputFormat<K extends WritableComparable, V extends Writable>
> ...
>   public RecordReader getRecordReader(InputSplit split, JobConf job,
>       Reporter reporter) throws IOException {
>     HiveInputSplit hsplit = (HiveInputSplit) split;
> ...
>     boolean nonNative = false;
>     PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
>     if ((part != null) && (part.getTableDesc() != null)) {
>       Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
>       nonNative = part.getTableDesc().isNonNative();
>     }
> {code}
> In the debugger, I see that part==null, so copyTableJobPropertiesToConf doesn't get called. I see that for this table:
> {code}
> create external table test3 () STORED BY 'foo' location '/data/bar';
> {code}
> The InputSplit path is the *file* (i.e. "/data/bar/part-00000"), but pathToPartitionInfo has an entry only for the *dir* (i.e. "/data/bar"), so the lookup returns null.
> I attached a patch which fixes the problem for me; it makes things explicit by passing along the directory name inside the HiveInputSplit; this means we don't have to figure out which files are part of which partition.
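
To make the mismatch concrete, here is a minimal stand-alone sketch. It is not Hive code and not the attached patch: the class and method names are hypothetical, and plain strings stand in for Path and PartitionDesc. It shows why an exact-match lookup keyed by directory misses a file path, and one possible workaround that walks up the parent directories. The attached patch takes a different route and carries the directory name in the split.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone model of the lookup problem; none of these names are Hive classes.
final class PartitionLookupSketch {

    // Exact-match lookup, mirroring pathToPartitionInfo.get(splitPath).
    static String exactLookup(Map<String, String> pathToPartition, String splitPath) {
        return pathToPartition.get(splitPath);
    }

    // Hypothetical fallback: try the path itself, then each parent directory in turn.
    static String lookupWithParents(Map<String, String> pathToPartition, String splitPath) {
        String current = splitPath;
        while (current != null && !current.isEmpty()) {
            String hit = pathToPartition.get(current);
            if (hit != null) {
                return hit;
            }
            int slash = current.lastIndexOf('/');
            // Stop at the root; otherwise strip the last path component.
            current = (slash <= 0) ? null : current.substring(0, slash);
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartition = new HashMap<>();
        pathToPartition.put("/data/bar", "partition info for test3");

        String splitPath = "/data/bar/part-00000";
        System.out.println(exactLookup(pathToPartition, splitPath));       // null
        System.out.println(lookupWithParents(pathToPartition, splitPath)); // partition info for test3
    }
}
```

Walking up the parents guesses the partition from the file path, which is the step the patch avoids by recording the directory explicitly in the split.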


        
