hadoop-mapreduce-user mailing list archives

From Sudharsan Sampath <sudha...@gmail.com>
Subject Re: Very slow MapReduce Job
Date Tue, 30 Aug 2011 06:46:10 GMT

Is it slow compared to a plain serial version of the same processing?
Generally, pseudo-distributed setups should be used only to verify the
correctness of the program logic. For performance statistics, run the job on
a real cluster, where you can achieve parallelism and thus its benefits.
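
To get that serial number to compare against, something like the following
could be timed on the same merged file. This is only a rough sketch, not the
code attached to the original mail: the class name SerialEmailScan is made up,
and it assumes each mail starts with a "Message-ID:" header line followed
later by a "From:" header line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical serial baseline; not the poster's code.
class SerialEmailScan {

    // Reads the input once and returns "message-id<TAB>from" pairs.
    // Assumption: a mail begins at a "Message-ID:" line and its sender is on
    // the first "From:" line after it; later "From:" lines are ignored.
    static List<String> scan(Reader in) {
        List<String> pairs = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String messageId = null;
        String line;
        try {
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("Message-ID:")) {
                    messageId = line.substring("Message-ID:".length()).trim();
                } else if (messageId != null && line.startsWith("From:")) {
                    String from = line.substring("From:".length()).trim();
                    pairs.add(messageId + "\t" + from);
                    messageId = null; // wait for the next mail's Message-ID
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SerialEmailScan <merged-mail-file>");
            return;
        }
        long start = System.nanoTime();
        List<String> pairs;
        // ISO-8859-1 maps every byte to a character, so odd bytes in the
        // corpus cannot abort the scan with a decoding error.
        try (BufferedReader r = Files.newBufferedReader(
                Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            pairs = scan(r);
        }
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(pairs.size() + " mails scanned in " + millis + " ms");
    }
}
```

If a single pass like this finishes far faster than the job does, the time is
probably going into the job itself (for example the record reader) rather
than into the size of the data.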

Sudhan S

On Mon, Aug 29, 2011 at 3:30 PM, Varad Meru <varad_meru@persistent.co.in> wrote:

> Hi,
> I wrote a custom InputFormat for parsing the Enron email corpus; it is
> attached in the file named EmailInputFormat.
> I have attached the code as a text file, with a sample input mail also
> attached as a text document.
> EmailClass implements Writable, including all of its required methods, and
> also contains an initiate function that initializes the values in that
> class.
> This initiate method is written in EmailClass.java.
> It is called by the nextKeyValue method, which is written in
> EmailRecordReader.txt.
> ------------------------------------
> Question:
> 1. Is it feasible to build large custom objects within nextKeyValue() when
> running in Hadoop?
> 2. An MR program that does the simple task of emitting the message-id and
> the From-field email-id from the Enron corpus of 6 lakh (600,000) emails
> merged into one file (174 MB) takes around 50 minutes on a
> pseudo-distributed cluster. This is very slow.
> Please help me with this as well.
> 3. Can making the value field in EmailRecordReader static help in this
> situation?
> Thanks in advance,
> Varad.
> ------------------------------------
> Varad Meru| Software Engineer
> varad_meru@persistent.co.in
> Persistent Systems and Solution Ltd. | Partners in Innovation |
> www.persistentsys.com
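
On questions 1 and 3 above: a common pattern is to allocate the value object
once per reader, as an ordinary instance field, and refill it on every
nextKeyValue() call, so that no new large object is built per record and no
static field is needed. The sketch below only mimics the
nextKeyValue()/getCurrentValue() shape in plain Java, without depending on
Hadoop. EmailRecord and ReusingEmailReader are hypothetical stand-ins for
EmailClass and the record reader, and the header layout ("Message-ID:"
followed later by "From:") is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the poster's EmailClass: a mutable holder that
// is refilled for each mail instead of being re-created.
class EmailRecord {
    String messageId = "";
    String from = "";

    void set(String messageId, String from) {
        this.messageId = messageId;
        this.from = from;
    }
}

// Mimics the nextKeyValue()/getCurrentValue() shape of a record reader
// without using any Hadoop classes.
class ReusingEmailReader {
    private final BufferedReader in;
    // Allocated once per reader; an instance field, not a static one.
    private final EmailRecord value = new EmailRecord();

    ReusingEmailReader(BufferedReader in) {
        this.in = in;
    }

    // Advances to the next mail and refills the shared value object.
    // Returns false once the input is exhausted.
    boolean nextKeyValue() {
        try {
            String messageId = null;
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("Message-ID:")) {
                    messageId = line.substring("Message-ID:".length()).trim();
                } else if (messageId != null && line.startsWith("From:")) {
                    value.set(messageId, line.substring("From:".length()).trim());
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the same instance on every call; callers that need to keep a
    // record beyond the next nextKeyValue() call must copy its fields.
    EmailRecord getCurrentValue() {
        return value;
    }
}
```

The trade-off is that the returned object is overwritten by the next call, so
anything that has to outlive one record needs to be copied out first.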
