hadoop-common-user mailing list archives

From bhushan_mahale <bhushan_mah...@persistent.co.in>
Subject RE: Problem to create sequence file for
Date Tue, 27 Oct 2009 14:25:22 GMT
Hi Jason,

Thanks for the reply.
The string is the entire content of the input text file.
It could be as long as ~300 MB.
I tried increasing the JVM heap, but unfortunately it gave the same error.

Another option I am considering is to split the input files first.
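
A minimal sketch of such a split step, using only the Java standard library. The class name FileSplitter, the method split, the part-naming scheme, and the parameter targetPartBytes are all hypothetical; none of them come from Hadoop or from the original program. The sketch streams the input through small buffers, so the whole file is never held in memory, and it cuts only after a newline, so a UTF-8 text file is never split in the middle of a character (the newline byte 0x0A does not occur inside a multi-byte UTF-8 sequence).

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class FileSplitter {

    /**
     * Copies source into numbered part files under outDir.
     * A part is closed at the first newline seen once it holds at least
     * targetPartBytes bytes, so parts can be slightly larger than the target.
     * Returns the part files in order; an empty source yields no parts.
     */
    static List<Path> split(Path source, Path outDir, long targetPartBytes)
            throws IOException {
        if (targetPartBytes <= 0) {
            throw new IllegalArgumentException("targetPartBytes must be positive");
        }
        Files.createDirectories(outDir);
        List<Path> parts = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source))) {
            OutputStream out = null;
            long written = 0;
            try {
                int b;
                while ((b = in.read()) != -1) {
                    if (out == null) {
                        // Open the next part lazily, only when there is data for it.
                        Path part = outDir.resolve(String.format(
                                "%s.part-%05d", source.getFileName(), parts.size()));
                        out = new BufferedOutputStream(Files.newOutputStream(part));
                        parts.add(part);
                        written = 0;
                    }
                    out.write(b);
                    written++;
                    // Cut only on a line boundary after reaching the target size.
                    if (written >= targetPartBytes && b == '\n') {
                        out.close();
                        out = null;
                    }
                }
            } finally {
                if (out != null) {
                    out.close();
                }
            }
        }
        return parts;
    }
}
```

For example, FileSplitter.split(Paths.get("big.txt"), Paths.get("parts"), 64L * 1024 * 1024) would write parts of roughly 64 MB each. Each part could then be written as its own sequence-file record, with the part name as the key. A single line longer than the target size still ends up whole in one part, so this only helps when the input has reasonably short lines.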

- Bhushan
-----Original Message-----
From: Jason Venner [mailto:jason.hadoop@gmail.com] 
Sent: Tuesday, October 27, 2009 7:19 PM
To: common-user@hadoop.apache.org
Subject: Re: Problem to create sequence file for

How large is the string that is being written?
Does it contain the entire contents of your file?
You may simply need to increase the heap size of your JVM.


On Tue, Oct 27, 2009 at 3:43 AM, bhushan_mahale <
bhushan_mahale@persistent.co.in> wrote:

> Hi,
>
> I have written code to create sequence files from given text files.
> The program takes the following input parameters:
>
>  1.  Local source directory - contains all the input text files
>  2.  Destination HDFS URI - location on HDFS where the sequence file will be
> copied
>
> The key for a sequence-record is the file-name.
> The value for a sequence-record is the content of the text file.
>
> The program runs fine for a large number of input text files. But if the size of
> a single input text file is > 100 MB, it throws the following exception:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>        at java.lang.String.toCharArray(String.java:2726)
>        at org.apache.hadoop.io.Text.encode(Text.java:388)
>        at org.apache.hadoop.io.Text.set(Text.java:178)
>        at org.apache.hadoop.io.Text.<init>(Text.java:81)
>        at SequenceFileCreator.create(SequenceFileCreator.java:106)
>        at SequenceFileCreator.processFile(SequenceFileCreator.java:168)
>
> I am using "org.apache.hadoop.io.SequenceFile.Writer" to create the
> sequence file. The Text class is used for both the key class and the value class.
>
> I tried increasing the max memory for the program, but it throws the same error.
>
> Can you provide your suggestions?
>
> Thanks,
> - Bhushan
>
>
> DISCLAIMER
> ==========
> This e-mail may contain privileged and confidential information which is
> the property of Persistent Systems Ltd. It is intended only for the use of
> the individual or entity to which it is addressed. If you are not the
> intended recipient, you are not authorized to read, retain, copy, print,
> distribute or use this message. If you have received this communication in
> error, please notify the sender and delete all copies of this message.
> Persistent Systems Ltd. does not accept any liability for virus infected
> mails.
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

