hadoop-user mailing list archives

From "Kartashov, Andy" <Andy.Kartas...@mpac.ca>
Subject io.file.buffer.size
Date Wed, 21 Nov 2012 16:18:26 GMT
Guys,

I've read that increasing io.file.buffer.size above the default (4 KB) to, say, 128 KB might speed things up.

My input is 40 million serialised records coming from an RDBMS, and I noticed that with the
increased buffer size my job actually runs a tiny bit slower. Is that possible?
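
In case it matters, this is roughly how I'm setting it per job (my driver goes through ToolRunner, so -D is picked up by the GenericOptionsParser; the jar, class and path names below are just placeholders):

# 131072 bytes = 128 KB; overrides the cluster default for this job only
hadoop jar my-job.jar MyDriver -D io.file.buffer.size=131072 input output

The cluster-wide alternative would be setting the same property in core-site.xml.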

P.S. I have two questions:
1. During a Sqoop import I see that two additional files are generated in the HDFS output folder, namely:
.../_log/history/...conf.xml
.../_log/history/...sqoop_generated_class.jar
Is there a way to redirect these files to a different directory? I cannot find an answer.
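
The closest lead I've found is the hadoop.job.history.user.location property, which (if I understand it) controls where MRv1 writes the per-job history under _logs/history. I haven't confirmed it covers both files, so treat this as a guess; something like:

# the history path, connect string and table name are placeholders
sqoop import -D hadoop.job.history.user.location=/user/andy/job-history \
    --connect jdbc:mysql://dbhost/mydb --table mytable

Setting the property to none should suppress those files entirely, if I'm reading the docs right. Can anyone confirm?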

2. I run multiple reducers, and each one generates its own output file. If I wanted to merge all
the output, would any of the commands below be recommended (see also the stdin variant after them)?

hadoop fs -getmerge output <localdst>
or
hadoop fs -cat output/part-* > <localdst>/output_All
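
A variant I've seen that keeps the merged result in HDFS pipes the concatenation back through -put reading from stdin (I believe -put accepts "-" for stdin, though I haven't verified it on our release):

# concatenate the part files and write the result back into HDFS
hadoop fs -cat output/part-* | hadoop fs -put - output_All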

Thanks,
AK


NOTICE: This e-mail message and any attachments are confidential, subject to copyright and
may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not
the intended recipient, please delete and contact the sender immediately. Please consider
the environment before printing this e-mail.