hive-user mailing list archives

From TJ Tech
Subject Pointing Hive external table partition to multiple locations?
Date Mon, 09 Nov 2015 04:21:54 GMT

I need to process a few hundred thousand files (1-2 GB each) scattered
across thousands of different directories.

I'd like to partition/group them based on my own custom logic so that I can
benefit from partition pruning. Each partition would contain a few hundred
files from hundreds of different directories.

Is this supported? According to the Hive Language Manual (DDL), a partition
can point to only one location. If I add one partition for each file I plan
to process, I'd end up with a few hundred or even thousands of partitions.
I suspect this might result in hundreds to thousands of MR tasks in Hadoop.
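
To make the concern concrete, here is a rough sketch of the workaround I have
in mind: one partition per source directory, generated with the standard
ALTER TABLE ... ADD PARTITION ... LOCATION statement. The table name, the
partition columns `grp` and `src`, and the helper function are hypothetical,
and the sketch assumes every file in a directory belongs to the same group.

```python
from collections import defaultdict


def add_partition_statements(table, file_to_group):
    """Build one ADD PARTITION statement per (group, directory) pair.

    `grp` and `src` are hypothetical partition columns; `src` only exists
    because each partition can carry a single LOCATION.
    """
    dirs_by_group = defaultdict(set)
    for path, group in file_to_group.items():
        # A partition location is a directory, so keep only the parent path.
        dirs_by_group[group].add(path.rsplit("/", 1)[0])

    statements = []
    for group in sorted(dirs_by_group):
        for i, directory in enumerate(sorted(dirs_by_group[group])):
            statements.append(
                f"ALTER TABLE {table} ADD IF NOT EXISTS "
                f"PARTITION (grp='{group}', src='{i}') "
                f"LOCATION '{directory}';"
            )
    return statements


if __name__ == "__main__":
    files = {
        "/data/a/f1": "g1",
        "/data/a/f2": "g1",
        "/data/b/f3": "g1",
        "/data/c/f4": "g2",
    }
    for stmt in add_partition_statements("logs", files):
        print(stmt)
```

With hundreds of directories per group, the number of generated statements
grows to the partition counts described above.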

I noticed there is a feature added to support pointing an external table to
multiple locations listed in a symlink file: (for TextInputFormat only)
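
As far as I understand it (I believe this is what SymlinkTextInputFormat
does, but treat that as my assumption), the table location holds plain text
files whose lines are the paths of the real data files. A minimal sketch of
building such a listing for one group of files follows; the function name and
the example paths are hypothetical.

```python
def build_symlink_manifest(paths):
    """Return manifest text with one data-file path per line.

    Blank entries and duplicates are dropped; input order is preserved.
    """
    seen = set()
    lines = []
    for path in paths:
        path = path.strip()
        if path and path not in seen:
            seen.add(path)
            lines.append(path)
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical paths spread over several directories.
    group = [
        "hdfs:///data/a/f1",
        "hdfs:///data/b/f2",
        "hdfs:///data/a/f1",
    ]
    print(build_symlink_manifest(group), end="")
```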

Is a similar feature in the works for partitions? If so, would it support
other formats (Avro, Parquet, etc.)?
