hadoop-common-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: Hadoop Takes 6GB Memory to run one mapper
Date Thu, 27 Mar 2014 12:55:04 GMT
This discussion may also be relevant to your question:
Do you actually need to specify -Xmx6000m for the Java heap, or could it be one of the other
issues discussed?


From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Thursday, March 27, 2014 6:52 AM
To: user@hadoop.apache.org
Subject: RE: Hadoop Takes 6GB Memory to run one mapper

Could you have a pmem-vs-vmem issue as in:
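For reference, the YARN memory checks that distinguish physical (pmem) from virtual (vmem) memory are controlled by properties along these lines in yarn-site.xml (the values shown are illustrative, not recommendations; 2.1 is the usual default ratio):

```xml
<!-- Containers are killed when they exceed these limits. -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>   <!-- enforce the physical-memory limit -->
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>   <!-- enforce the virtual-memory limit -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>    <!-- virtual memory allowed per unit of physical memory -->
</property>
```

A JVM's virtual footprint can be far larger than its heap, so a container can be killed for vmem even when actual heap use is modest.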
From: praveenesh kumar [mailto:praveenesh@gmail.com]
Sent: Tuesday, March 25, 2014 7:38 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop Takes 6GB Memory to run one mapper

Can you try storing your file as bytes instead of Strings? I can't think of any reason why
this would require 6 GB of heap space.
Could you also explain your use case? That might help us suggest some alternatives, if you are interested.
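As a sketch of the byte-based idea: since each record is a fixed 12 bytes (per the thread), all records can be packed into a single byte[] and looked up with binary search, avoiding millions of String/entry objects entirely. The class name, the 6/6 key/value split, and the sorted-input assumption below are illustrative, not from the original code:

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: pack fixed-width records into one byte[] instead of a
// Map of Strings. Assumes records are exactly 12 bytes, sorted by key, with
// a hypothetical 6-byte key followed by a 6-byte value.
public class PackedLookup {
    static final int RECORD_LEN = 12;
    static final int KEY_LEN = 6;          // assumed key/value split
    private final byte[] packed;
    private final int count;

    PackedLookup(byte[] packed) {
        this.packed = packed;
        this.count = packed.length / RECORD_LEN;
    }

    // Binary search over the sorted fixed-width records; returns the value
    // portion of the matching record, or null when the key is absent.
    String get(byte[] key) {
        int lo = 0, hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int off = mid * RECORD_LEN;
            int cmp = compare(key, off);
            if (cmp == 0)
                return new String(packed, off + KEY_LEN, RECORD_LEN - KEY_LEN,
                                  StandardCharsets.US_ASCII);
            if (cmp < 0) hi = mid - 1; else lo = mid + 1;
        }
        return null;
    }

    // Unsigned byte-wise comparison of key against the record at offset off.
    private int compare(byte[] key, int off) {
        for (int i = 0; i < KEY_LEN; i++) {
            int d = (key[i] & 0xFF) - (packed[off + i] & 0xFF);
            if (d != 0) return d < 0 ? -1 : 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Two sorted 12-byte records: "aaaaaa"->"111111", "bbbbbb"->"222222".
        byte[] data = "aaaaaa111111bbbbbb222222".getBytes(StandardCharsets.US_ASCII);
        PackedLookup lookup = new PackedLookup(data);
        System.out.println(lookup.get("bbbbbb".getBytes(StandardCharsets.US_ASCII)));
        // prints "222222"
    }
}
```

At 12 bytes per record, 11 million records fit in roughly 132 MB of flat array, versus gigabytes of object overhead for per-record Strings.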

On Tue, Mar 25, 2014 at 7:31 AM, Nivrutti Shinde <nivrutti.shinde@gmail.com> wrote:
Yes, it is in the setup() method. I am just reading the file, which is stored on HDFS.

On Tuesday, 25 March 2014 12:01:08 UTC+5:30, Praveenesh Kumar wrote:
And I am guessing you are not doing this inside the map() method, right? It's in the setup() method?

On Tue, Mar 25, 2014 at 6:05 AM, Nivrutti Shinde <nivrutt...@gmail.com> wrote:
private Map<String, String> mapData = new ConcurrentHashMap<String, String>(11000000);

FileInputStream fis = new FileInputStream(file);
GZIPInputStream gzipInputStream = new GZIPInputStream(fis);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream));
String line;
while ((line = bufferedReader.readLine()) != null) {
    String[] data = line.split("\t");
    mapData.put(data[0], data[1]);  // assuming key in data[0], value in data[1]
}

On Monday, 24 March 2014 19:17:12 UTC+5:30, Praveenesh Kumar wrote:
Can you please share your code snippet? I just want to see how you are loading your file into
the mapper.
On Mon, Mar 24, 2014 at 1:15 PM, Nivrutti Shinde <nivrutt...@gmail.com> wrote:
Thanks for your reply.
I tried THashMap but ended up with the same issue.
I also tried a map-side join and a Cascading approach, but they take too much time.

On Friday, 21 March 2014 12:03:28 UTC+5:30, Nivrutti Shinde wrote:

I have a use case where I am loading a 200 MB file with 11 million records (each record is
12 characters long) into a map, so that while running the Hadoop job I can quickly look up the
value for the key from each input record in the mapper.

It is such a small file, but to load the data into the map I have to allocate a 6 GB heap. When
I run a small standalone application to load this file, it requires 2 GB.

I don't understand why Hadoop requires 6 GB to load the data into memory. The job runs fine
after that, but the number of mappers I can run is only 2. I need to get this done in 2-3 GB,
so I can run at least 8-9 mappers per node.

I have created a gzip file (which is now only 17 MB) and kept it on HDFS. I am using the HDFS
API to read the file and load the data into the map. The block size is 128 MB, on Cloudera Hadoop.

Any help, or alternative approaches to load the data into memory with a minimal heap size so I
can run many mappers with 2-3 GB allocated to each, would be appreciated.
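As a rough sanity check on the 2 GB standalone figure, here is a back-of-envelope estimate of the per-entry cost of a hash map of two short Strings. All sizes are approximations for a 64-bit JVM with compressed oops and a pre-Java-9 char[]-backed String; the individual byte counts are assumptions, not measurements:

```java
// Back-of-envelope heap estimate for Map<String, String> with 11M entries of
// ~6-char keys and values. Every size below is an approximation.
public class HeapEstimate {
    public static void main(String[] args) {
        long records = 11_000_000L;
        long entry   = 32;              // map entry: header + key/value/hash/next refs
        long string  = 24;              // String object: header + fields
        long chars6  = 16 + 6 * 2 + 4;  // char[6]: header + data, padded to 32
        long perRecord = entry + 2 * (string + chars6);   // ~144 bytes per entry
        long table   = (long) (records / 0.75) * 4;       // bucket array of refs
        long totalBytes = records * perRecord + table;
        System.out.println(totalBytes / (1024 * 1024) + " MB");
        // prints "1566 MB" under these assumptions
    }
}
```

Roughly 1.5-2 GB of objects for 200 MB of raw data is consistent with the standalone observation; the extra overhead inside a Hadoop task (framework buffers, sort space, a second copy during parsing) would then push the requirement higher still.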
