hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1411) [Zebra] Can Zebra use HAR to reduce file/block count for namenode
Date Tue, 29 Jun 2010 01:21:50 GMT

     [ https://issues.apache.org/jira/browse/PIG-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1411:
--------------------------------

    Fix Version/s:     (was: 0.8.0)
      Description: 
Because of its column group structure, Zebra can create extra files for the namenode to
track. That means the namenode uses more memory for Zebra-related files.

The goal is to reduce the number of files/blocks.

Among the various options, the idea is to use HAR (Hadoop Archive). A Hadoop Archive reduces
the block and file count by copying data from small files (1M, 2M ...) into an HDFS block of
larger size, thus reducing the total number of blocks and files.


 

  was:

Because of its column group structure, Zebra can create extra files for the namenode to
track. That means the namenode uses more memory for Zebra-related files.

The goal is to reduce the number of files/blocks.

Among the various options, the idea is to use HAR (Hadoop Archive). A Hadoop Archive reduces
the block and file count by copying data from small files (1M, 2M ...) into an HDFS block of
larger size, thus reducing the total number of blocks and files.


 


> [Zebra] Can Zebra use HAR to reduce file/block count for namenode
> -----------------------------------------------------------------
>
>                 Key: PIG-1411
>                 URL: https://issues.apache.org/jira/browse/PIG-1411
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Gaurav Jain
>            Assignee: Gaurav Jain
>            Priority: Minor
>
> Because of its column group structure, Zebra can create extra files for the namenode to
> track. That means the namenode uses more memory for Zebra-related files.
> The goal is to reduce the number of files/blocks.
> Among the various options, the idea is to use HAR (Hadoop Archive). A Hadoop Archive reduces
> the block and file count by copying data from small files (1M, 2M ...) into an HDFS block
> of larger size, thus reducing the total number of blocks and files.
>  
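
To make the file/block arithmetic in the description concrete, here is a rough
sketch in Python. It is not Zebra or Hadoop code. It only counts the files and
blocks the namenode would have to track before and after packing. The function
names, the 128 MB block size, the single part file, the two index files and the
sample workload are all assumptions made for illustration.

```python
import math


def namenode_objects(file_sizes_mb, block_mb=128):
    """Return (files, blocks) tracked when every file is stored on its own."""
    files = len(file_sizes_mb)
    # Each file occupies at least one block, however small the file is.
    blocks = sum(max(1, math.ceil(size / block_mb)) for size in file_sizes_mb)
    return files, blocks


def packed_objects(file_sizes_mb, block_mb=128, index_files=2):
    """Return (files, blocks) if the same data is packed into one part file.

    Assumption: one large part file plus `index_files` small index files,
    each index file taking a single block. A real archive may differ.
    """
    total_mb = sum(file_sizes_mb)
    part_blocks = max(1, math.ceil(total_mb / block_mb))
    return 1 + index_files, part_blocks + index_files


# Hypothetical workload: 1000 files of 1 MB and 1000 files of 2 MB.
sizes = [1] * 1000 + [2] * 1000
before = namenode_objects(sizes)
after = packed_objects(sizes)
print("before packing (files, blocks):", before)
print("after packing  (files, blocks):", after)
```

Under these assumptions the 2000 small files (2000 blocks) collapse to 3 files
and 26 blocks, which is the kind of reduction the issue is after.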

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

