hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Date Wed, 31 Oct 2007 20:09:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171 ]

Milind Bhandarkar commented on HADOOP-1917:

Comments on HADOOP-1917


"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" -> "Optionally install rsync"
"build it with ant" -> *what's the ant target?*
what's the default for HADOOP_LOG_DIR ?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"

should there be a step to examine web-ui for JT and NN ?


HADOOP_HEAPSIZE -> *need some typical values here ?*
"where the NameNode stores the name table" -> "where the NameNode stores the namespace
and transaction logs persistently"
"server and client machines." -> *need to document early that NameNode and JobTracker are
server machines, and "DataNode+TaskTracker" are client machines*
"slave processors" -> *please use consistent terminology, prefer "worker" to "slave"*
*argh.. "slaves" name is hardcoded as a file name conf/slaves in hadoop. I should probably
file a jira*

Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in typical cases.
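
To illustrate why marking these two parameters final matters, here is a minimal sketch of the override rule. It is a toy model, not Hadoop's Configuration class: the class FinalParamSketch and its methods are hypothetical, and the only assumption taken from Hadoop is that a value marked final cannot be replaced by anything loaded later (for example, by a job's own settings).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of "final" configuration parameters.
public class FinalParamSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finalKeys = new HashSet<>();

    // Apply one setting; it is ignored if an earlier load marked the key final.
    public void load(String key, String value, boolean isFinal) {
        if (finalKeys.contains(key)) {
            return;
        }
        values.put(key, value);
        if (isFinal) {
            finalKeys.add(key);
        }
    }

    public String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        FinalParamSketch conf = new FinalParamSketch();
        // Site-wide file marks the reduce count final ...
        conf.load("mapred.reduce.tasks", "1", true);
        // ... so a job asking for 10 reduces is silently ignored.
        conf.load("mapred.reduce.tasks", "10", false);
        System.out.println(conf.get("mapred.reduce.tasks"));
    }
}
```

In this model a job can never raise the reduce count once the site file has marked it final, which is the situation the comment above wants the guide to avoid for typical clusters.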


*consider removing google mapreduce paper link as prerequisite, since the goal of the tutorial
is to provide all the information needed to understand map-reduce*
*A picture would help in the overview.*
*In the Input and Output section, remove the use of combiner.*
*In the wordcount example, simplify it even more by avoiding the use of ToolRunner*
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
*wherever overriding is mentioned, also mention the default value. e.g. partitioner, inputformat,
inputsplit etc.*
*please provide a javadoc link to DistributedCache at the first mention*
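
As an example of the kind of default worth documenting, the sketch below shows hash partitioning, which is how Hadoop's default HashPartitioner assigns a key to a reduce task. The class and method names here (HashPartitionSketch, partitionFor) are hypothetical and the code does not use the Hadoop API; it only reproduces the arithmetic.

```java
// Hypothetical stand-alone illustration of hash partitioning.
public class HashPartitionSketch {
    // Mask off the sign bit so the result is non-negative,
    // then take the remainder by the number of reduce tasks.
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reduces = 4;
        for (String word : new String[] {"hello", "world", "hadoop"}) {
            System.out.println(word + " -> reduce " + partitionFor(word, reduces));
        }
    }
}
```

Equal keys always land on the same reduce task, which is the property a reader needs to know before deciding whether to override the partitioner.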

Overall comments: This is extremely useful. However, the level of detail is overwhelming for
a MapReduce tutorial. Maybe split this into two parts, Basic and Advanced? Basic should be enough
to understand WordCount, and Advanced should then go into all the details.

> Need configuration guides for Hadoop
> ------------------------------------
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
> We've recently had a spate of questions on the users list regarding features such as
rack-awareness, the trash can, etc., which are not clearly documented from a user/admin perspective.
There is some Javadoc present but most of the "documentation" exists either in JIRA or in
the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These
should probably be in Forrest and accessible from the project website (Javadoc isn't always
approachable to our non-programmer audience). Committers should look for user documentation
before accepting patches.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
