hadoop-common-user mailing list archives

From Mark Kerzner <markkerz...@gmail.com>
Subject Re: Large Text object to String conversion
Date Tue, 22 Dec 2009 15:28:46 GMT
Bhushan,

have you considered simply raising the memory limit for Hadoop? 100M-300M is
not that much, and 2 GB is a very modest memory requirement for today's
machines. For comparison, a small EC2 instance has 1.7 GB.

On Tue, Dec 22, 2009 at 9:10 AM, Jason Venner <jason.hadoop@gmail.com> wrote:

> The Text class supports low-level access to the underlying byte array in
> the Text object.
>
> You can call getBytes directly and then incrementally transcode the bytes
> into characters using the charset decoder tools,
> or call the charAt method to get the characters one by one.
> The bytesToCodePoint method provides a simpler interface for sequentially
> working through the data.
>
> On Thu, Oct 29, 2009 at 4:18 AM, bhushan_mahale <
> bhushan_mahale@persistent.co.in> wrote:
>
> > Hi,
> >
> > I am writing M-R code using the MapRunnable interface.
> > The input format is SequenceFileInputFormat.
> >
> > Each Sequence-record contains a key-value pair of type <Text key,Text
> > value> (Text: org.apache.hadoop.io.Text)
> >
> > The "key" Text object contains a small string, whereas the "value" Text
> > object contains a large XML string.
> > The "value" Text object can contain as much as 100 to 300 MB of data.
> >
> > I convert the "value" Text object to a String using the value.toString()
> > method.
> > It runs out of memory (OutOfMemory) for large data in the "value" object.
> >
> > Is there any other way of converting a large Text object to a Java String
> > object?
> > Alternatively, can I limit the number of records the RecordReader object
> > passes to the run method so that total memory utilization is limited?
> >
> > Thanks,
> > - Bhushan
> >
>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>
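
The incremental transcoding Jason describes could be sketched roughly as
follows. This is only an illustration, not code from the thread: the class
name ChunkedUtf8Decoder, the method decodeInChunks, and the chunk size are
made up for the example, and it works on a plain byte array so it runs
without Hadoop on the classpath. It assumes the bytes are UTF-8 and that only
the first `length` bytes are valid, which is how the array returned by
Text.getBytes() relates to Text.getLength(). Malformed input is replaced
rather than reported, a choice made here for simplicity.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical helper: decodes UTF-8 bytes into fixed-size char chunks. */
public class ChunkedUtf8Decoder {

    /**
     * Decodes bytes[0..length) and hands each chunk to the sink.
     * The CharBuffer passed to the sink is reused between calls, so the
     * sink must consume or copy it before returning.
     * Returns the total number of chars produced.
     */
    public static long decodeInChunks(byte[] bytes, int length, int chunkChars,
                                      Consumer<CharBuffer> sink) {
        if (chunkChars < 2) {
            // A surrogate pair needs two chars of room to make progress.
            throw new IllegalArgumentException("chunkChars must be at least 2");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer in = ByteBuffer.wrap(bytes, 0, length);
        CharBuffer out = CharBuffer.allocate(chunkChars);
        long total = 0;

        // Decode until the input is exhausted; OVERFLOW just means the
        // chunk buffer is full and should be handed off and reused.
        CoderResult result;
        do {
            result = decoder.decode(in, out, true);
            total += drain(out, sink);
        } while (result.isOverflow());

        // Flush any state the decoder may still hold.
        do {
            result = decoder.flush(out);
            total += drain(out, sink);
        } while (result.isOverflow());

        return total;
    }

    /** Passes the filled part of the buffer to the sink, then resets it. */
    private static int drain(CharBuffer out, Consumer<CharBuffer> sink) {
        out.flip();
        int produced = out.remaining();
        if (produced > 0) {
            sink.accept(out);
        }
        out.clear();
        return produced;
    }
}
```

With a Text value, the call would presumably look like
decodeInChunks(value.getBytes(), value.getLength(), 8192, chunk -> ...),
feeding each chunk to a streaming consumer instead of building one String.
This avoids the extra String copy, but the Text object itself still holds the
whole record in memory, so it complements a larger heap rather than replacing
it.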
