hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3799) Design a pluggable interface to place replicas of blocks in HDFS
Date Wed, 13 May 2009 17:53:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709027#action_12709027 ]

dhruba borthakur commented on HADOOP-3799:

> What external db? Dhruba, could you please elaborate what you 

We have deployed the scripts from HADOOP-3708 to collect job logs (online) into a MySQL DB.
This DB also contains the Hive query that each job runs, so it is easy to see which datasets
are used together in most queries. My idea is to dynamically co-locate blocks based on these
access patterns. I will present those ideas in a separate JIRA (once this JIRA gets through).

> I would rather allow to format a file system with a specific policy and then keep it
> constant for the lifespan of the system.

That would be a fine goal for your cluster. However, I would like the API to be more flexible
than that. Does that sound reasonable?

> Design a pluggable interface to place replicas of blocks in HDFS
> ----------------------------------------------------------------
>                 Key: HADOOP-3799
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3799
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: BlockPlacementPluggable.txt
> The current HDFS code typically places one replica on the local rack, the second replica
> on a random remote rack and the third replica on a random node of that remote rack. This
> algorithm is baked into the NameNode's code. It would be nice to make the block placement
> algorithm a pluggable interface. This would allow experimentation with different placement
> algorithms based on workloads, availability guarantees and failure models.
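
To make the proposal above concrete, here is a minimal sketch of what a pluggable placement
interface could look like, with the default behaviour described in the issue as one
implementation. Every name here (Node, ReplicaPlacementPolicy, DefaultRackAwarePolicy,
chooseTargets) is hypothetical and is not the HDFS API or the contents of the attached
design document. As simplifying assumptions, the first replica goes on the writer's own node
(which is on the local rack), the third replica goes on a different node from the second,
and only up to three replicas are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical types for illustration only; not the HDFS API.
final class Node {
    final String name;
    final String rack;
    Node(String name, String rack) { this.name = name; this.rack = rack; }
}

// The pluggable part: the NameNode would call this instead of hard-coded logic.
interface ReplicaPlacementPolicy {
    /** Returns up to `replicas` distinct target nodes for one block. */
    List<Node> chooseTargets(Node writer, List<Node> cluster, int replicas);
}

// Sketch of the default behaviour described in the issue (handles at most 3 replicas).
final class DefaultRackAwarePolicy implements ReplicaPlacementPolicy {
    private final Random random;
    DefaultRackAwarePolicy(Random random) { this.random = random; }

    @Override
    public List<Node> chooseTargets(Node writer, List<Node> cluster, int replicas) {
        List<Node> targets = new ArrayList<>();
        if (replicas <= 0) return targets;

        // Replica 1: the local rack (assumption: the writer's own node).
        targets.add(writer);
        if (replicas < 2) return targets;

        // Replica 2: a random node on any rack other than the writer's.
        List<Node> remote = new ArrayList<>();
        for (Node n : cluster) {
            if (!n.rack.equals(writer.rack)) remote.add(n);
        }
        if (remote.isEmpty()) return targets; // single-rack cluster: stop here
        Node second = remote.get(random.nextInt(remote.size()));
        targets.add(second);
        if (replicas < 3) return targets;

        // Replica 3: a random, different node on the same remote rack as replica 2.
        List<Node> sameRack = new ArrayList<>();
        for (Node n : remote) {
            if (n.rack.equals(second.rack) && n != second) sameRack.add(n);
        }
        if (!sameRack.isEmpty()) {
            targets.add(sameRack.get(random.nextInt(sameRack.size())));
        }
        return targets;
    }
}
```

A different policy (for example, one driven by observed access patterns) would implement the
same interface, and the caller would not need to change.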

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
