hadoop-common-user mailing list archives

From "Jim the Standing Bear" <standingb...@gmail.com>
Subject Re: newbie seeking inputs and help
Date Sun, 21 Oct 2007 20:10:48 GMT
Thanks Ted.

While the slides do give me valuable insight into the project I
have in mind, I would still like to see some detailed examples or
documentation on the different mappers and reducers that come with
Hadoop.  Do you happen to know where I can find such material?
Thanks.

-- Jim


On 10/20/07, Ted Dunning <tdunning@veoh.com> wrote:
>
> Look for the slide show on Nutch and Hadoop.
>
> http://wiki.apache.org/lucene-hadoop/HadoopPresentations
>
> open the one called "Scalable Computing with Hadoop (Doug Cutting, May
> 2006)"
>
>
> On 10/20/07 1:53 PM, "Jim the Standing Bear" <standingbear@gmail.com> wrote:
>
> > Hi,
> >
> > I have been studying map reduce and hadoop for the past few weeks, and
> > the concept is very new to me.  While I have a grasp of the map reduce
> > process and can follow some of the example code, I still feel at a
> > loss when it comes to creating my own exercise "project" and would
> > appreciate any input and help on that.
> >
> > The project I have in mind is to fetch several (hundred) HTML
> > files from a website, and use hadoop to index the words of each page
> > so they can be searched later.  However, in all the examples I have
> > seen so far, the data is split into HDFS prior to the execution of
> > the job.
> >
> > Here is the set of questions I have:
> >
> > 1. Are CopyFiles.HTTPCopyFilesMapper and/or ServerAddress what I need
> > for this project?
> >
> > 2. If so, is there any detailed documentation or are there examples
> > for these classes?
> >
> > 3. If not, could you please let me know conceptually how you would go
> > about doing this?
> >
> > 4. If data must be split beforehand, must I manually retrieve all
> > the webpages and load them into HDFS?  Or do I list the URLs of the
> > webpages in a text file and split this file instead?
> >
> > As you can see, I am very confused at this point and would greatly
> > appreciate all the help I could get.  Thanks!
> >
> > -- Jim
>
>
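
The indexing step described in the quoted message (map each page to its words, then collect the pages per word) can be pictured with a minimal sketch in plain Java. This is not Hadoop's API and does not use any of the classes named above. The class name InvertedIndexSketch and its methods are hypothetical, the pages are assumed to be already fetched and held in memory, and the tag stripping is a crude regular expression rather than a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: mimics the shape of a map step and a
// reduce step in ordinary Java, with no Hadoop dependency.
class InvertedIndexSketch {

    // "Map" step: emit one (word, pageId) pair for every word in a page.
    static List<String[]> map(String pageId, String html) {
        // Crude tag removal (assumption: good enough for a sketch).
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT);
        List<String[]> pairs = new ArrayList<>();
        for (String word : text.split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                pairs.add(new String[] {word, pageId});
            }
        }
        return pairs;
    }

    // "Reduce" step: group the emitted pairs by word, collecting the
    // distinct page ids in which each word occurs.
    static Map<String, TreeSet<String>> reduce(List<String[]> pairs) {
        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (String[] pair : pairs) {
            index.computeIfAbsent(pair[0], k -> new TreeSet<>()).add(pair[1]);
        }
        return index;
    }

    // Runs the map step over every page, then reduces all pairs at once.
    static Map<String, TreeSet<String>> buildIndex(Map<String, String> pages) {
        List<String[]> allPairs = new ArrayList<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            allPairs.addAll(map(page.getKey(), page.getValue()));
        }
        return reduce(allPairs);
    }
}
```

Calling InvertedIndexSketch.buildIndex with a map from page name to HTML text returns, for each lowercased word, the sorted set of pages containing it. In a real job the map and reduce steps would run as separate distributed tasks, with the framework doing the grouping between them.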


-- 
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
