hive-dev mailing list archives

From "Roshan Naik (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-5143) Streaming - Compaction of partitions
Date Fri, 23 Aug 2013 01:09:51 GMT
Roshan Naik created HIVE-5143:
---------------------------------

             Summary: Streaming - Compaction of partitions
                 Key: HIVE-5143
                 URL: https://issues.apache.org/jira/browse/HIVE-5143
             Project: Hive
          Issue Type: Sub-task
            Reporter: Roshan Naik
            Assignee: Roshan Naik


Task is to support compaction of partitions.

Rationale: Streaming partitions are composed of a large number of small files (each commit
is one file). Since compaction can be an expensive operation (e.g. converting to a single
ORC file), we do not compact the streaming partition at the time of rolling it into
a standard partition. This allows rolling to be quick and atomic.

Compaction will be performed at a later time. The streaming partition is converted as is
(typically with many small files) into a standard partition. This new standard partition
will be queued up for compaction by a separate job.
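
The roll-then-compact flow described above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the proposed implementation: the names
roll_partition, compact_next and compaction_queue are hypothetical and not part of Hive, an
in-process queue stands in for whatever mechanism queues partitions for the separate job,
and plain byte concatenation stands in for conversion to a single ORC file.

```python
import os
import queue
import tempfile
from pathlib import Path

# Hypothetical stand-in for the queue of partitions awaiting compaction.
compaction_queue = queue.Queue()


def roll_partition(streaming_dir, standard_dir):
    """Roll a streaming partition into a standard one without rewriting data."""
    # A single directory rename keeps the roll quick; the small files move as is.
    os.rename(streaming_dir, standard_dir)
    # Compaction is deferred: only record that this partition needs it.
    compaction_queue.put(standard_dir)


def compact_next():
    """Run by a separate job: merge the small files of one queued partition."""
    partition = compaction_queue.get()
    small_files = sorted(p for p in partition.iterdir() if p.is_file())
    merged = partition / "compacted.tmp"
    # Assumption: concatenation stands in for writing a single ORC file.
    with merged.open("wb") as out:
        for small_file in small_files:
            out.write(small_file.read_bytes())
    for small_file in small_files:
        small_file.unlink()
    final = partition / "compacted"
    merged.rename(final)
    return final


def demo():
    """Create three one-file commits, roll the partition, then compact it."""
    with tempfile.TemporaryDirectory() as root:
        streaming = Path(root) / "streaming_p1"
        streaming.mkdir()
        for i in range(3):
            (streaming / f"commit_{i:04d}").write_bytes(f"row{i}\n".encode())
        standard = Path(root) / "p1"
        roll_partition(streaming, standard)
        files_after_roll = len(list(standard.iterdir()))
        merged = compact_next()
        files_after_compaction = len(list(standard.iterdir()))
        return files_after_roll, files_after_compaction, merged.read_bytes()
```

In this sketch the roll does no data rewriting at all, which is what keeps it cheap; the
cost of merging is paid only when the separate job calls compact_next. The sketch does not
address readers that access the partition while it is being compacted.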

This decouples the compaction feature from streaming support, and makes compaction generally
available for any partition, not only streaming ones.

