hadoop-user mailing list archives

From "Marcos Ortiz Valmaseda" <mlor...@uci.cu>
Subject Re: .deflate trouble
Date Fri, 15 Feb 2013 18:57:17 GMT
Yes, I know, Keith. Since you want more control over your Hadoop cluster, let me recommend
a few options: 
- You can use Whirr to manage your Hadoop cluster installations on EC2 [1] 
- You can create your own Hadoop-focused AMI based on your requirements (my favorite choice here) 
- You can install Hadoop on EC2 with Puppet or Chef for better control over your configuration and management 
- Or, if your budget allows it, you can choose the MapR M3 or M5 distribution from the AWS Marketplace [2][3][4] 

[1] http://whirr.apache.org 
[2] https://aws.amazon.com/marketplace/pp/B008B7VT2C 
[3] https://aws.amazon.com/marketplace/pp/B008B7WAAW/ref=sp_mpg_product_title?ie=UTF8&sr=0-2

[4] http://aws.amazon.com/es/elasticmapreduce/mapr/ 
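For option [1], a Whirr launch is driven by a small properties file. A minimal sketch follows; the cluster name, instance counts, and hardware id are illustrative placeholders, and the credentials are expected in environment variables:

```properties
# hadoop-ec2.properties -- illustrative values only; substitute your own
# cluster name, node counts, and instance type.
whirr.cluster-name=hadoop-test
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,3 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.hardware-id=m1.large
```

You would then run `whirr launch-cluster --config hadoop-ec2.properties` to bring the cluster up, and `whirr destroy-cluster --config hadoop-ec2.properties` to tear it down.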

----- Original Message -----

From: "Keith Wiley" <kwiley@keithwiley.com> 
To: user@hadoop.apache.org 
Sent: Friday, February 15, 2013 12:36:20 
Subject: Re: .deflate trouble 

I might contact them, but we are specifically avoiding EMR for this project. We have already
successfully deployed EMR, but we want more precise control over the cluster, namely the ability
to persist it and reawaken it on demand. We really want a direct Hadoop installation instead
of an EMR-based installation. But I might contact them anyway to see what they recommend.
Thanks for the refs. 
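On the ".deflate trouble" in the subject line: Hadoop's default codec writes zlib-format DEFLATE streams, so besides `hadoop fs -text` (which decompresses known codecs on the fly), such a part file can be read with Python's standard zlib module. A minimal sketch, with a simulated part file and hypothetical filename:

```python
import zlib

# Simulate a Hadoop part file written with the default codec, which
# produces zlib-wrapped DEFLATE data (hence the ".deflate" extension).
original = b"hello\t1\nworld\t2\n"
with open("part-00000.deflate", "wb") as f:
    f.write(zlib.compress(original))

# Reading it back: zlib.decompress handles the zlib-format stream.
with open("part-00000.deflate", "rb") as f:
    recovered = zlib.decompress(f.read())

print(recovered.decode())
```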

On Feb 14, 2013, at 19:09 , Marcos Ortiz Valmaseda wrote: 

> Regards, Keith. For EMR issues and such, you can contact Jeff Barr (Chief Evangelist
> for AWS) or Saurabh Baji (Product Manager for AWS EMR) directly. 
> Best wishes. 
> 
> From: "Keith Wiley" <kwiley@keithwiley.com> 
> To: user@hadoop.apache.org 
> Sent: Thursday, February 14, 2013 15:46:05 
> Subject: Re: .deflate trouble 
> 
> Good call. We can't use the conventional web-based JT due to corporate access issues,
> but I looked at the job_XXX.xml file directly, and sure enough, it set mapred.output.compress
> to true. Now I just need to remember how that occurs; I simply ran the wordcount example
> straight off the command line and didn't specify any overridden conf settings for the job. 
> 
> Ultimately, the solution (or part of it) is to get away from 0.19 to a more up-to-date
> version of Hadoop. I would prefer 2.0 over 1.0, in fact, but due to a remarkable lack of
> concise EC2/Hadoop documentation (and the fact that the docs I did find were very old and
> therefore conformed to 0.19-style Hadoop), I have fallen back on old versions of Hadoop
> for my initial tests. In the long run, I will need a more modern version of Hadoop to
> successfully deploy on EC2. 
> 
> Thanks. 
> 
> On Feb 14, 2013, at 15:02 , Harsh J wrote: 
> 
> > Did the job.xml of the job that produced this output also carry 
> > mapred.output.compress=false in it? The file should be viewable on the 
> > JT UI page for the job. Unless explicitly turned on, even 0.19 
> > wouldn't have enabled compression on its own. 
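For anyone hitting the same symptom: output compression can be disabled cluster-wide in mapred-site.xml. A sketch, using the pre-0.21 property name that matches the 0.19 cluster discussed above:

```xml
<!-- mapred-site.xml fragment: turn off job output compression by default.
     On newer Hadoop versions the property is named
     mapreduce.output.fileoutputformat.compress instead. -->
<property>
  <name>mapred.output.compress</name>
  <value>false</value>
</property>
```

It can also be overridden per job from the command line, since the examples driver uses GenericOptionsParser, e.g. `hadoop jar hadoop-examples.jar wordcount -Dmapred.output.compress=false input output` (jar path and directories illustrative).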


________________________________________________________________________________ 
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com 

"What I primarily learned in grad school is how much I *don't* know. 
Consequently, I left grad school with a higher ignorance to knowledge ratio than 
when I entered." 
-- Keith Wiley 
________________________________________________________________________________ 




-- 

Marcos Ortiz Valmaseda, 
Product Manager && Data Scientist at UCI 
Blog: http://marcosluis2186.posterous.com 
LinkedIn: http://www.linkedin.com/in/marcosluis2186 
Twitter: @marcosluis2186 
