hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-1917) Need configuration guides for Hadoop
Date Wed, 31 Oct 2007 20:11:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171 ]

milindb edited comment on HADOOP-1917 at 10/31/07 1:10 PM:
---------------------------------------------------------------------

Comments on HADOOP-1917

Overview.html:

"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" -> "Optionally install rsync"
"build it with ant" -> what's the ant target?
What's the default for HADOOP_LOG_DIR?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"

Should there be a step to examine the web UI for the JobTracker (JT) and NameNode (NN)?


setup.html:

HADOOP_HEAPSIZE -> need some typical values here?
"where the NameNode stores the name table" -> "where the NameNode stores the namespace
and transaction logs persistently"
"server and client machines." -> need to document early that NameNode and JobTracker are
server machines, and "DataNode+TaskTracker" are client machines
"slave processors" -> please use consistent terminology; prefer "worker" to "slave".
Argh... the name "slaves" is hard-coded in Hadoop as the file name conf/slaves. I should probably
file a JIRA.

Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in typical cases.
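Why `final` matters here: a property marked final in the cluster-wide configuration cannot be overridden by a resource loaded later, such as a job's own configuration. The class below is a hypothetical, self-contained model of that layering (it is not Hadoop's `Configuration` class, and it simply rejects the override instead of logging anything). It shows that if the site layer locks `mapred.reduce.tasks`, a job asking for a different number of reduces is ignored.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of layered configuration loading: resources are applied
 * in order (e.g. site config first, then job config), and a key that an
 * earlier layer marked final ignores later attempts to override it.
 */
class LayeredConf {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finalKeys = new HashSet<>();

    /** Applies one property from a resource; returns false if a final key blocked it. */
    boolean set(String key, String value, boolean isFinal) {
        if (finalKeys.contains(key)) {
            return false; // locked by an earlier layer: keep the existing value
        }
        values.put(key, value);
        if (isFinal) {
            finalKeys.add(key);
        }
        return true;
    }

    String get(String key) {
        return values.get(key);
    }
}
```

In this sketch a job layer cannot change a locked task count at all, which is the reason for leaving both task-count properties non-final in typical setups so that individual jobs can tune them.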

mapred_tutorial.html:

Consider removing the Google MapReduce paper link as a prerequisite, since the goal of the tutorial
is to provide all the information needed to understand map-reduce.
A picture would help in the overview.
In the Input and Output section, remove the use of combiner.
In the wordcount example, simplify it even more by avoiding the use of ToolRunner
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
Wherever overriding is mentioned, also mention the default value, e.g. for the partitioner, InputFormat,
InputSplit, etc.
please provide a javadoc link to DistributedCache at the first mention
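As an example of the kind of default this note asks the tutorial to state: with the old `mapred` API, a job that does not set a partitioner falls back to `HashPartitioner`, which sends each key to reduce task `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The standalone method below reproduces just that rule; the class name is hypothetical and the key is typed as `Object` rather than a Hadoop `Writable`.

```java
/** Hypothetical standalone copy of the default hash-partitioning rule. */
class DefaultPartitionSketch {
    static int getPartition(Object key, int numReduceTasks) {
        // Masking off the sign bit keeps the value non-negative even when
        // hashCode() is negative, so the result lies in [0, numReduceTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Because the rule depends only on the key's hash, every occurrence of a given key reaches the same reduce task, which is what the reducer's grouping relies on.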


Overall comments: This is extremely useful. However, the level of detail is overwhelming for
a MapReduce tutorial. Maybe split it into two: Basic and Advanced? Basic should be enough
to understand WordCount, and Advanced can then go into all the details.
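For a sense of how small a Basic part could be, WordCount's whole data flow fits in a framework-free sketch: map each line to (word, 1) pairs, group the pairs by word in sorted order (standing in for the shuffle/sort step), then sum each group. The class and method names are hypothetical and no Hadoop classes are used, so this illustrates the concepts only, not the tutorial's actual `Mapper`/`Reducer` code. Lines are split with `StringTokenizer`, as the classic WordCount mapper does.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical, framework-free sketch of WordCount's map, shuffle/sort and reduce steps. */
class WordCountSketch {
    /** Map: emit (word, 1) for every whitespace-separated token in the line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    /** Reduce: sum the counts emitted for one word. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Maps all lines, groups values by key in sorted order, then reduces each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // stands in for shuffle/sort
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }
}
```

Running it on the two lines `Hello World Bye World` and `Hello Hadoop Goodbye Hadoop` gives Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2.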

  
> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as
rack-awareness, the trash can, etc., which are not clearly documented from a user's/admin's perspective.
There is some Javadoc present but most of the "documentation" exists either in JIRA or in
the default config files themselves.
> We should generate top-down configuration and use guides for map/reduce and HDFS. These
should probably be in Forrest and accessible from the project website (Javadoc isn't always
approachable to our non-programmer audience). Committers should look for user documentation
before accepting patches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.