hadoop-mapreduce-user mailing list archives

From Pumudu ruhunage <pumud...@gmail.com>
Subject Re: Doubts in Map reduce programs
Date Sat, 01 Nov 2014 16:10:50 GMT
Hi,

There are some great MapReduce samples shipped with Hadoop itself. Have you
seen them? If you have Hadoop 2.2.0, go to
{hadoop_base}/share/hadoop/mapreduce and you will find a bunch of sample
MapReduce programs. In other versions of Hadoop this directory may be
different.

Regards,
Pumudu

On 1 November 2014 21:23, Shahab Yunus <shahab.yunus@gmail.com> wrote:

> One way I can think of is to define your own InputFormat and RecordReader
> so that each record is a 'paragraph' or a 'sentence'. The reason is that
> in the regular case, TextInputFormat (the default input format, a
> FileInputFormat subclass) treats each line terminated by the standard
> end-of-line characters as one record. Here, you instead want one paragraph
> per record rather than one line. Once you override the RecordReader, you
> control how a 'record' is defined before it is passed to your mapper.
>
> Some starting points: the following links show how to define and implement
> your own RecordReader for FileInputFormat:
>
> http://bigdatacircus.com/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/
> http://www.infoq.com/articles/HadoopInputFormat
>
> http://hadoopi.wordpress.com/2013/05/31/custom-recordreader-processing-string-pattern-delimited-records/
>
> Regards,
> Shahab
>
> On Sat, Nov 1, 2014 at 11:45 AM, Raghavendra Chandra <
> raghavchandra.learning@gmail.com> wrote:
>
>> Hi There,
>>
>> I have a couple of doubts about Hadoop. It would be really helpful if
>> anyone could answer these questions or, if they have already been answered
>> somewhere, share a link to the answer.
>>
>> Below are my doubts:
>>
>> 1. How do I count the number of paragraphs in a text file using Java
>> MapReduce?
>>
>> 2. How do I count the number of sentences in a paragraph/file using Java
>> MapReduce?
>>
>> Please let me know where I can find a list of MapReduce programs covering
>> different use cases.
>>
>> Looking forward to your responses.
>>
>>
>
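
To make the 'paragraph as one record' idea from Shahab's reply a bit more
concrete, here is a rough sketch in plain Java of the record logic only. It
does not use the Hadoop API: the class and method names are made up for
illustration, and it assumes paragraphs are separated by one or more blank
lines and that the text is English. In a real job, something like
splitParagraphs would live inside your custom RecordReader, and something
like countSentences would run in the mapper on each paragraph record.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of Hadoop.
final class ParagraphSentenceCount {

    // Assumption: a paragraph boundary is one or more blank
    // (empty or whitespace-only) lines.
    static List<String> splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        for (String chunk : text.split("\\R\\s*\\R")) {
            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    // Counts sentences with the JDK's locale-aware sentence BreakIterator.
    // Abbreviations and unusual punctuation may be counted imperfectly.
    static int countSentences(String paragraph) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(paragraph);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Skip segments that contain only whitespace.
            if (!paragraph.substring(start, end).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "One sentence here. Another one follows.\n\n"
                + "Second paragraph has one sentence.";
        List<String> paragraphs = splitParagraphs(text);
        int sentences = 0;
        for (String p : paragraphs) {
            sentences += countSentences(p);
        }
        System.out.println("paragraphs=" + paragraphs.size() + " sentences=" + sentences);
    }
}
```

With records defined this way, counting paragraphs becomes the same pattern
as word count: the mapper emits a count of 1 per record and the reducer sums
them. For sentences, the mapper would emit the per-paragraph sentence count
instead of 1.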
