hbase-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject RE: Profiling map reduce jobs?
Date Sat, 29 Jun 2013 14:34:06 GMT
I'd just advise using MultipleOutputFormat instead of MultipleOutputs.write.

--Send from my Sony mobile.
On Jun 29, 2013 9:16 PM, "David Poisson" <David.Poisson@ca.fujitsu.com>
wrote:

> Just thought I'd provide some insight into our problem.
>
> It appears that the slowdown was caused by our use of
> multipleOutputs.write(output, key, keyValue, path) (going from memory
> here). After looking at the implementation of that write function
> in MultipleOutputs.java, it appears that every call to
> write(output, key, keyValue, path) creates a context, fetches a conf,
> and obtains a new RecordWriter.
>
> We have changed all of those calls to write(output, key, keyValue) (which
> doesn't do any of that extra work), and it seems to help.
>
> Does anyone else have any tips for using MultipleOutputs?
>
> We are taking our input and splitting it into 3 files. So it seems to be a
> natural choice for MultipleOutputs. Performance is a bit slow though.
>
> Cheers!
>
> David
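
To make the cost pattern described above concrete, here is a minimal sketch in plain Java. It does not use Hadoop at all: WriterCacheSketch, Writer, openEveryTime and openCached are hypothetical names, and whether MultipleOutputs really re-creates its writer per call depends on the Hadoop version in use. The sketch only contrasts "set up a writer on every write" with "set up one writer per output path and reuse it".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Hadoop.
final class WriterCacheSketch {
    /** Stand-in for a record writer that is expensive to set up. */
    static final class Writer {
        final String path;
        int records = 0;
        Writer(String path) { this.path = path; }
        void write(String key, String value) { records++; }
    }

    private final Map<String, Writer> cache = new HashMap<>();
    int writersOpened = 0; // counts how often the expensive setup runs

    /** Sets up a fresh writer on every call (the slow pattern). */
    Writer openEveryTime(String path) {
        writersOpened++;
        return new Writer(path);
    }

    /** Sets up a writer the first time a path is seen, then reuses it. */
    Writer openCached(String path) {
        return cache.computeIfAbsent(path, p -> {
            writersOpened++;
            return new Writer(p);
        });
    }

    public static void main(String[] args) {
        String[] paths = {"a", "b", "c"}; // three outputs, as in the thread
        WriterCacheSketch naive = new WriterCacheSketch();
        WriterCacheSketch cached = new WriterCacheSketch();
        for (int i = 0; i < 9000; i++) {
            String path = paths[i % paths.length];
            naive.openEveryTime(path).write("k" + i, "v");
            cached.openCached(path).write("k" + i, "v");
        }
        System.out.println("naive setups:  " + naive.writersOpened);
        System.out.println("cached setups: " + cached.writersOpened);
    }
}
```

With three output paths and 9000 records, the first variant runs its setup 9000 times and the cached variant 3 times. If per-call setup really is the bottleneck in a job, that ratio is roughly the saving to expect from reusing writers.
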
> ________________________________________
> From: David Poisson [David.Poisson@ca.fujitsu.com]
> Sent: Thursday, June 27, 2013 4:22 PM
> To: user@hbase.apache.org
> Subject: Profiling map reduce jobs?
>
> Howdy,
>      I want to take a look at a MR job which seems to be slower than I had
> hoped. Mind you, this MR job is only running on a pseudo-distributed VM
> (cloudera cdh4).
>
> I have modified my mapred-site.xml with the following (the last property
> is commented out because it crashes my MR job):
>
>   <property>
>     <name>mapred.task.profile</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>mapred.task.profile.maps</name>
>     <value>0-2</value>
>   </property>
>   <property>
>     <name>mapred.task.profile.reduces</name>
>     <value>0-2</value>
>   </property>
>   <!--property>
>     <name>mapred.task.profile.params</name>
>     <value>agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s</value>
>   </property-->
> Are there any resources that explain how to interpret the results?
> Or maybe an open-source app that could help display the results in a more
> intuitive manner?
>
> Ideally, we'd want to know where we are spending most of our time.
>
> Cheers,
>
> David
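
On the question of where the time goes: with cpu=samples, the HPROF agent's text output contains a table between "CPU SAMPLES BEGIN" and "CPU SAMPLES END" with the columns rank, self, accum, count, trace and method. A rough sketch of pulling the top entries out of that table is below. It assumes that text layout; HprofCpuSamples and Row are hypothetical names, and the profile file path is whatever your setup produces.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads the CPU SAMPLES table from HPROF text output.
final class HprofCpuSamples {
    /** One row of the table: rank, self %, sample count, method name. */
    static final class Row {
        final int rank;
        final double selfPercent;
        final int count;
        final String method;
        Row(int rank, double selfPercent, int count, String method) {
            this.rank = rank;
            this.selfPercent = selfPercent;
            this.count = count;
            this.method = method;
        }
    }

    /** Returns the rows found between CPU SAMPLES BEGIN and CPU SAMPLES END. */
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        boolean inSection = false;
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith("CPU SAMPLES BEGIN")) { inSection = true; continue; }
            if (t.startsWith("CPU SAMPLES END")) break;
            if (!inSection || t.isEmpty() || t.startsWith("rank")) continue;
            // Columns: rank self accum count trace method
            String[] f = t.split("\\s+");
            if (f.length < 6) continue;
            rows.add(new Row(Integer.parseInt(f[0]),
                             Double.parseDouble(f[1].replace("%", "")),
                             Integer.parseInt(f[3]),
                             f[5]));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HprofCpuSamples <hprof-text-file>");
            return;
        }
        List<Row> rows = parse(Files.readAllLines(Paths.get(args[0])));
        // Rows are already ranked by self time, so the first few are the hot spots.
        for (Row r : rows.subList(0, Math.min(10, rows.size()))) {
            System.out.printf("%2d %6.2f%% %8d %s%n", r.rank, r.selfPercent, r.count, r.method);
        }
    }
}
```

The "self" column is the share of samples in which that method was at the top of the stack, so the first few rows are the places to look first. One more observation, offered tentatively: Hadoop's documented default for mapred.task.profile.params begins with a leading dash (-agentlib:hprof=...), which the commented-out value quoted above lacks. That may or may not be related to the crash.
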
