hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] Commented: (HIVE-524) ExecDriver adds 0 byte file to input paths
Date Fri, 29 May 2009 19:30:45 GMT


Namit Jain commented on HIVE-524:

The problem is that, without the dummy file, downstream map-reduce jobs can fail.

For example, consider the query:

select .... from
(query 1  union all query 2);

It will result in 3 map-reduce jobs: query 1, query 2, and the outer query, which depends on
the output of query 1 and query 2.

If query 2 has empty partitions and we disallow empty inputs, the outer query will fail
because the output for query 2 has not been created.

That's why we create a dummy file.
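
To make the dependency concrete, here is a toy sketch in Python. It is not Hive code:
run_stage, run_union and the skip_when_no_input flag are hypothetical names, and each
"job" is modeled as a function that reads directories and writes one output directory.

```python
import os


def run_stage(name, input_dirs, output_dir, skip_when_no_input):
    """Toy stand-in for one map-reduce job (hypothetical, not the Hive API)."""
    missing = [d for d in input_dirs if not os.path.isdir(d)]
    if missing:
        raise FileNotFoundError(f"{name}: input path does not exist: {missing[0]}")
    lines = []
    for d in input_dirs:
        for fname in sorted(os.listdir(d)):
            with open(os.path.join(d, fname)) as f:
                lines.extend(f.read().splitlines())
    if not lines and skip_when_no_input:
        # Empty input is disallowed: the job is skipped and writes no output.
        return
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "part-00000"), "w") as f:
        f.write("".join(line + "\n" for line in lines))


def run_union(root, skip_when_no_input):
    """Run query 1, query 2 and the outer query under an existing directory root."""
    t1 = os.path.join(root, "t1")
    t2 = os.path.join(root, "t2")  # plays the role of the empty partition
    os.makedirs(t1)
    os.makedirs(t2)
    with open(os.path.join(t1, "data"), "w") as f:
        f.write("a\nb\n")
    out1 = os.path.join(root, "out1")
    out2 = os.path.join(root, "out2")
    out = os.path.join(root, "out")
    run_stage("query 1", [t1], out1, skip_when_no_input)
    run_stage("query 2", [t2], out2, skip_when_no_input)
    # The outer query reads the outputs of both sub-queries.
    run_stage("outer query", [out1, out2], out, False)
    with open(os.path.join(out, "part-00000")) as f:
        return f.read().splitlines()
```

In this sketch, run_union(root, False) completes because query 2 still writes an (empty)
output, while run_union(root, True) raises FileNotFoundError in the outer query.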

The correct fix would be to create the file based on the table descriptor instead of some
hard-coded value. Then the custom input format can be attached to the table descriptor and
it will work. I am already in the process of implementing that as part of map-join, and will
merge it in.
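
A rough sketch of that idea in Python, for illustration only. TableDesc,
add_input_paths, FRAMED_MAGIC and the "framed" format below are all made up for this
example and are not the Hive classes or any real file format; the only point is that the
placeholder is written by whatever the table descriptor supplies, not as a hard-coded
0 byte file.

```python
import os
from dataclasses import dataclass
from typing import Callable, List

# Made-up header for an illustrative "framed" file format; not a real format.
FRAMED_MAGIC = b"FMT1"


def write_empty_text(path: str) -> None:
    """An empty plain-text file is simply zero bytes (the hard-coded behavior)."""
    open(path, "wb").close()


def write_empty_framed(path: str) -> None:
    """An empty file in the framed format still needs its header."""
    with open(path, "wb") as f:
        f.write(FRAMED_MAGIC)


def read_framed(path: str) -> bytes:
    """Toy strict reader: rejects any file that lacks the header."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(FRAMED_MAGIC):
        raise ValueError(f"unknown file format: {path}")
    return data[len(FRAMED_MAGIC):]


@dataclass
class TableDesc:
    """Hypothetical table descriptor: knows how to write an empty file."""
    name: str
    write_empty_file: Callable[[str], None]


def has_data(path: str) -> bool:
    """True if path is a directory holding at least one non-empty file."""
    if not os.path.isdir(path):
        return False
    for fname in os.listdir(path):
        full = os.path.join(path, fname)
        if os.path.isfile(full) and os.path.getsize(full) > 0:
            return True
    return False


def add_input_paths(partition_dirs: List[str], table: TableDesc,
                    scratch_dir: str) -> List[str]:
    """Return job inputs, substituting a placeholder for empty partitions."""
    inputs = []
    for i, part in enumerate(partition_dirs):
        if has_data(part):
            inputs.append(part)
            continue
        # Missing or empty partition: let the table descriptor decide what
        # an empty file looks like, so the table's reader can still open it.
        dummy_dir = os.path.join(scratch_dir, f"empty_{i}")
        os.makedirs(dummy_dir, exist_ok=True)
        table.write_empty_file(os.path.join(dummy_dir, "emptyFile"))
        inputs.append(dummy_dir)
    return inputs
```

With TableDesc("t", write_empty_framed) the placeholder can be read back by read_framed
as zero records; with write_empty_text the same reader raises ValueError, which mirrors
the failure reported in this issue.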

> ExecDriver adds 0 byte file to input paths
> ------------------------------------------
>                 Key: HIVE-524
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.4.0
>            Reporter: Johan Oskarsson
>             Fix For: 0.4.0
> In the addInputPaths method in ExecDriver:
> If the input path of a partition cannot be found or contains no files with data in them,
> a 0 byte file is created and added to the job instead. This causes our custom InputFormat
> to throw an exception since it is asked to process an unknown file format (not an lzo file).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
