hadoop-hdfs-user mailing list archives

From <Xia_Y...@Dell.com>
Subject RE: How to configure mapreduce archive size?
Date Thu, 11 Apr 2013 18:10:45 GMT
Hi Hemanth,

Attached are some sample folders from within my /tmp/hadoop-root/mapred/local/archive. There are
some jar and class files inside.

My application uses a MapReduce job to purge old HBase data. I am using the basic HBase MapReduce
API to delete rows from an HBase table. I do not explicitly use the distributed cache. Maybe HBase
uses it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs

       // the purge start time
       Date date = new Date();

       // (the Job object is created earlier from the HBase configuration)
       TableMapReduceUtil.initTableMapperJob(
              tableName,              // input table
              scan,                   // Scan instance to control CF and attribute selection
              MapperDelete.class,     // mapper class
              null,                   // mapper output key
              null,                   // mapper output value
              job);

       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);

       boolean b = job.waitForCompletion(true);
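
A side note that may explain the jar and class files under the archive directory: if I understand the HBase API correctly, the shorter initTableMapperJob form used above also ships the HBase/ZooKeeper dependency jars with the job through the distributed cache, and there is an overload with an explicit addDependencyJars flag. A hedged sketch, not my actual code:

       // hypothetical variant of the call above; the extra boolean controls whether
       // TableMapReduceUtil adds the HBase/ZooKeeper dependency jars to the job
       // (those jars are then localized via the distributed cache on each node)
       TableMapReduceUtil.initTableMapperJob(
              tableName,              // input table
              scan,                   // Scan instance
              MapperDelete.class,     // mapper class
              null,                   // mapper output key
              null,                   // mapper output value
              job,
              false);                 // addDependencyJars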

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 12:29 AM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory? Not sure whether that will help, but just
giving it a shot.

What application are you using? Is it custom MapReduce jobs in which you use the Distributed
Cache (I guess not)?


On Thu, Apr 11, 2013 at 3:34 AM, <Xia_Yang@dell.com> wrote:
Hi Arun,

I stopped my application, then restarted my HBase (which includes Hadoop). After that I started
my application. After one evening, my /tmp/hadoop-root/mapred/local/archive had grown to more than
1G. It does not seem to work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar



From: Arun C Murthy [mailto:acm@hortonworks.com]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (the cache limit applies only to non-active cache files), and check after a
little while (it takes some time for the cleaner thread to kick in).


On Apr 11, 2013, at 2:29 AM, <Xia_Yang@Dell.com> wrote:

Hi Hemanth,

For hadoop 1.0.3, I can only find "local.cache.size" in the core-default.xml file, which is
inside hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in that default.xml file and changed it to 500000. This is just for
my testing purposes. However, the folder /tmp/hadoop-root/mapred/local/archive has already grown
to more than 1G now. It looks like the change is not taking effect. Could you advise whether what I did is correct?




From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?


This directory is used as part of the 'DistributedCache' feature (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache).
There is a configuration key "local.cache.size" which controls the amount of data stored under
the DistributedCache. The default limit is 10GB. However, files under this directory cannot be deleted
while they are being used. Also, some frameworks on Hadoop could be using the DistributedCache transparently
to you.

So you could check what is being stored here and based on that lower the limit of the cache
size if you feel that will help. The property needs to be set in mapred-default.xml.
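
For reference, a minimal sketch of what such an override could look like (my assumption: a Hadoop 1.x setup where the *-site.xml files on the TaskTracker nodes take precedence over the bundled *-default.xml; the value is in bytes, so the jar itself should not need to be edited):

       <!-- hypothetical entry in core-site.xml (or mapred-site.xml) on each TaskTracker node;
            1073741824 bytes = 1 GB, while the bundled default is 10737418240 (10 GB) -->
       <property>
         <name>local.cache.size</name>
         <value>1073741824</value>
       </property>

The TaskTracker likely needs a restart before a new limit takes effect.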


On Mon, Apr 8, 2013 at 11:09 PM, <Xia_Yang@dell.com> wrote:

I am using the hadoop that is packaged within hbase-0.94.1. It is hadoop 1.0.3. There are some
mapreduce jobs running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive
has reached 14G in size.

How do I configure this and limit the size? I do not want to waste my space on the archive.



Arun C. Murthy
Hortonworks Inc.
