hadoop-hive-dev mailing list archives

From "Carl Steinbach (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1357) CombineHiveInputSplit should initialize the inputFileFormat once for a single split
Date Fri, 23 Jul 2010 10:02:51 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1357:
---------------------------------

    Fix Version/s: 0.6.0
      Component/s: Query Processor
                   Serializers/Deserializers

> CombineHiveInputSplit should initialize the inputFileFormat once for a single split
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1357
>                 URL: https://issues.apache.org/jira/browse/HIVE-1357
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1357.patch
>
>
> If a split consists of multiple files, the file format should be the same for all of them,
> whether RCFile or SequenceFile. Currently CombineHiveInputSplit looks up the inputFileFormat
> for each new file in the split, and each lookup is O(n), where n is the number of files in
> the split. Over all n files this is an O(n^2) operation, which degrades performance badly
> when combining a large number of small files.
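
To illustrate the cost difference described in the issue, here is a minimal sketch. It is not taken from HIVE-1357.patch and does not use Hive's classes: SplitFormatSketch, lookupFormat, and the /warehouse/t/part-N paths are hypothetical stand-ins, and the linear scan is only an assumed model of an O(n) lookup. It counts the comparisons made when the format is looked up for every file versus once per split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Hive code base.
class SplitFormatSketch {
    // Counts path comparisons so the two strategies can be compared.
    static long comparisons = 0;

    // Assumed model of a format lookup that scans the split's paths: O(n) per call.
    static String lookupFormat(List<String> paths, String path) {
        for (String p : paths) {
            comparisons++;
            if (p.equals(path)) {
                return "RCFile"; // assumption: every file in a split shares one format
            }
        }
        throw new IllegalArgumentException("unknown path: " + path);
    }

    // Per-file strategy: one O(n) lookup for each of the n files, O(n^2) overall.
    static long perFileLookups(List<String> paths) {
        comparisons = 0;
        for (String path : paths) {
            lookupFormat(paths, path);
        }
        return comparisons;
    }

    // Once-per-split strategy: look up the format for the first file, then reuse it.
    static long oncePerSplitLookup(List<String> paths) {
        comparisons = 0;
        String format = lookupFormat(paths, paths.get(0));
        int filesHandled = 0;
        for (String path : paths) {
            // The cached format is reused here; no further lookup is made.
            if (format != null) {
                filesHandled++;
            }
        }
        if (filesHandled != paths.size()) {
            throw new IllegalStateException("not every file was handled");
        }
        return comparisons;
    }

    // Builds n made-up file paths for the demonstration.
    static List<String> makePaths(int n) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            paths.add("/warehouse/t/part-" + i);
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> paths = makePaths(1000);
        System.out.println("per-file lookups:  " + perFileLookups(paths) + " comparisons");
        System.out.println("once per split:    " + oncePerSplitLookup(paths) + " comparisons");
    }
}
```

Under these assumptions the per-file strategy grows quadratically with the number of files in the split, while the once-per-split strategy does a single lookup regardless of split size.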

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

