hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2046) Documentation: Hadoop Install/Configuration Guide and Map-Reduce User Manual
Date Wed, 24 Oct 2007 19:29:51 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2046:
----------------------------------

    Attachment: HADOOP-2046_4_20071025.patch

Updated patch.

I'll file a separate JIRA for the additional forrest-based documentation, which allows these
changes to go into 0.15.0. The forrest work is almost done too, but it doesn't have to block
0.15.0 since the hadoop website reflects trunk and can be updated as soon as that patch goes in.

> Documentation: Hadoop Install/Configuration Guide and Map-Reduce User Manual
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-2046
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2046
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.14.2
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-2046_1_20071018.patch, HADOOP-2046_2_20071022.patch, HADOOP-2046_3_20071023.patch,
HADOOP-2046_4_20071025.patch
>
>
> I'd like to put forward some thoughts on how to structure reasonably detailed documentation
for hadoop.
> Essentially, I think of at least 3 different profiles to target:
> * hadoop-dev, folks who are actively involved in improving/fixing hadoop.
> * hadoop-user
> ** mapred application writers and/or folks who directly use hdfs
> ** hadoop cluster administrators
> For this issue, I'd like to first target the latter category (admins and hdfs/mapred users),
which arguably offers the biggest bang for the buck right now.
> There is a crying need to get user-level stuff documented, judging by the sheer number of
emails we get on the hadoop lists...
> ----
> *1. Installation/Configuration Guides*
> This set of documents caters to everyone from someone just playing with hadoop on
a single node to operations teams who administer hadoop on thousands of nodes. To ensure
we cover all bases, I'm thinking along the lines of:
> * _Download, install and configure hadoop_ on a single-node cluster: including a few
comments on how to run examples (word-count) etc.
> * *Admin Guide*: Install and configure a real, distributed cluster. 
> * *Tune Hadoop*: Separate sections on how to tune hdfs and map-reduce, targeting power
admins/users.
> I reckon most of this would be done via forrest, with appropriate links to javadoc.
> ----
> *2. User Manual*
> This set is geared toward people who use hdfs and/or map-reduce per se. Things to document:
> * Write a really simple mapred application, just fitting the blocks together i.e. maybe
a walk-through of a couple of examples like word-count, sort etc.
> * Detailed information on important map-reduce user-interfaces:
> *- JobConf
> *- JobClient
> *- Tool & ToolRunner
> *- InputFormat 
> *-- InputSplit
> *-- RecordReader
> *- Mapper
> *- Reducer
> *- Reporter
> *- OutputCollector
> *- Writable
> *- WritableComparable
> *- OutputFormat
> *- DistributedCache
> * SequenceFile
> *- Compression types: NONE, RECORD, BLOCK
> * Hadoop Streaming
> * Hadoop Pipes
> I reckon most of this would end up in the javadocs, specifically package.html, with some
via forrest.
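
As a rough illustration of the kind of word-count walk-through mentioned above, here is a minimal sketch of the map and reduce phases in plain Python. It does not use any Hadoop API and is not taken from the attached patch; the function names (map_words, reduce_counts, word_count) are hypothetical, and the in-process sort only stands in for the framework's shuffle/sort step.

```python
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    """Map phase: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def reduce_counts(pairs):
    """Reduce phase: sum the counts for each word.

    Sorting here is an assumption-level stand-in for the shuffle/sort
    step, which groups all values for a key before the reducer runs.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))


def word_count(lines):
    """Run both phases in-process and return a dict of word -> count."""
    return dict(reduce_counts(map_words(lines)))
```

Calling word_count on a list of text lines returns a dictionary of per-word totals. A real job would express the same two phases through the Mapper and Reducer interfaces listed above, or through Hadoop Streaming.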
> ----
> Also, as discussed in HADOOP-1881, it would be quite useful to maintain documentation
per-release, even on the hadoop website i.e. we could have a main documentation page link
to documentation per-release and to the trunk.
> ----
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

