hadoop-user mailing list archives

From "Kartashov, Andy" <Andy.Kartas...@mpac.ca>
Subject question on FileInputFormat.addInputPath and data access
Date Wed, 24 Oct 2012 14:23:27 GMT
Two questions:

1.       Say you have 5 folders with input data (fold1,fold2,fold3,...,fold5) in your HDFS
on a pseudo-distributed cluster.
You would write your MR job to access the files by listing them in:
FileInputFormat.addInputPaths(job, "fold1,fold2,fold3,...,fold5");
Q: Is there a way to move the above folders into a parent folder, say "the_folder", so that
the directory structure becomes the_folder/fold1, the_folder/fold2, ...? Would it then be possible to access
the files with something like FileInputFormat.addInputPaths(job, "the_folder/*"); or similar?
I am asking in case the list of input folders grows too long. How can that be curbed?
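A sketch of two ways to keep the list short, assuming a reasonably recent Hadoop release. FileInputFormat resolves each input path through FileSystem.globStatus, so a glob such as "the_folder/*" (which matches the directories fold1 to fold5) should pick up the files inside those directories, and a brace pattern such as "the_folder/fold{1,2,3}" should select a subset. Alternatively, the comma-separated string can be generated instead of typed. Everything below is illustrative: the class InputPaths and its methods numbered, normalize and globMatches are hypothetical helpers, no Hadoop classes are used, and java.nio's glob matcher is only a stand-in for Hadoop's own glob dialect (both support *, ?, [...] and {a,b}, but they are not guaranteed to agree on every edge case). Because it is not safe to assume addInputPaths trims spaces after commas, normalize strips them defensively.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helpers for keeping a long input-path list manageable.
 * The strings produced here are what one would pass to
 * FileInputFormat.addInputPaths(job, ...); Hadoop itself is not used.
 */
class InputPaths {

    /** Builds "prefix1,prefix2,...,prefixN", e.g. numbered("fold", 3) gives "fold1,fold2,fold3". */
    static String numbered(String prefix, int count) {
        StringJoiner joined = new StringJoiner(",");
        for (int i = 1; i <= count; i++) {
            joined.add(prefix + i);
        }
        return joined.toString();
    }

    /**
     * Trims whitespace around each comma-separated entry and drops empty entries.
     * Do not apply this to entries containing {a,b} globs: their commas would be split.
     */
    static String normalize(String commaSeparated) {
        List<String> parts = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return String.join(",", parts);
    }

    /**
     * Checks a relative path against a glob with java.nio's matcher, used here
     * only as a stand-in for Hadoop's glob expansion. "*" does not cross "/".
     */
    static boolean globMatches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(numbered("fold", 5));               // fold1,fold2,fold3,fold4,fold5
        System.out.println(normalize("fold1, fold2 ,fold3"));  // fold1,fold2,fold3
        String[] candidates = {
            "the_folder/fold1", "the_folder/fold5",
            "the_folder/fold1/part-00000", "other_folder/fold1"
        };
        for (String c : candidates) {
            System.out.println(c + " matches the_folder/* : " + globMatches("the_folder/*", c));
        }
    }
}
```

In the Hadoop job, the glob or the generated string would simply replace the hand-written list in the addInputPaths call.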

2.       Hypothetically speaking, on a fully distributed cluster your folders with data are located
as follows: Node1: (fold1,fold2,fold3) and Node2: (fold4,fold5).

Q: Do we need to change the command below, or will the NameNode (NN) and JobTracker (JT) take care of locating those files?
FileInputFormat.addInputPaths(job, "fold1,fold2,fold3,...,fold5");
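A toy model of the division of labour behind this question. In HDFS it is the blocks of each file, not whole folders, that are stored on DataNodes (normally replicated on several of them), and the NameNode keeps the path-to-block-location mapping. The job only names logical paths, so the string passed to addInputPaths should not need to change between pseudo-distributed and fully distributed mode; the block locations behind each input split are what the scheduler uses to prefer data-local map tasks. The class LocalityToy, its BLOCK_HOSTS table and the plan method are hypothetical and are not Hadoop's API; the placement simply mirrors the scenario in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy model only (not Hadoop's API): the job lists logical paths, a
 * "namenode" table knows which hosts hold the blocks, and a scheduler
 * consults that table to prefer data-local hosts.
 */
class LocalityToy {

    /** Hypothetical placement from the question: fold1-3 on node1, fold4-5 on node2. */
    static final Map<String, List<String>> BLOCK_HOSTS = Map.of(
        "fold1", List.of("node1"),
        "fold2", List.of("node1"),
        "fold3", List.of("node1"),
        "fold4", List.of("node2"),
        "fold5", List.of("node2"));

    /** For each logical input path, returns "path@host", using the first known host or "any". */
    static List<String> plan(String commaSeparatedInputs) {
        List<String> assignments = new ArrayList<>();
        for (String path : commaSeparatedInputs.split(",")) {
            List<String> hosts = BLOCK_HOSTS.getOrDefault(path, List.of());
            String host = hosts.isEmpty() ? "any" : hosts.get(0);
            assignments.add(path + "@" + host);
        }
        return assignments;
    }

    public static void main(String[] args) {
        // The same input string is used no matter which nodes hold the blocks.
        System.out.println(plan("fold1,fold2,fold3,fold4,fold5"));
        // [fold1@node1, fold2@node1, fold3@node1, fold4@node2, fold5@node2]
    }
}
```

Moving blocks between nodes would change only the table, never the input string.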
     2a.     Using the data balancer, which moves data blocks across the additional DNs listed
in conf/slaves, is it possible to run the "hdfs dfs -ls -r" command on a slave node that
runs a DN on a separate machine? I have

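A sketch of the balancer's notion of "balanced", for context on 2a. The HDFS balancer compares each DataNode's utilization (used space divided by capacity) with the cluster-wide average and moves blocks until every node lies within a threshold of that average; the threshold is given in percentage points via -threshold and defaults to 10. A freshly added, empty DataNode therefore starts out under-utilized, which is why blocks flow to it. The class BalancerToy and the sample sizes are hypothetical: the code only labels nodes and never moves anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy sketch of the balancer's "is this node balanced?" test.
 * Not Hadoop code: it only labels nodes and never moves blocks.
 */
class BalancerToy {

    /**
     * @param usage        host -> {usedBytes, capacityBytes}
     * @param thresholdPct allowed deviation from the cluster average, in percentage points
     * @return host -> "over-utilized", "under-utilized" or "balanced"
     */
    static Map<String, String> classify(Map<String, long[]> usage, double thresholdPct) {
        long totalUsed = 0;
        long totalCapacity = 0;
        for (long[] uc : usage.values()) {
            totalUsed += uc[0];
            totalCapacity += uc[1];
        }
        // Cluster-wide average utilization, in percent.
        double average = 100.0 * totalUsed / totalCapacity;

        Map<String, String> labels = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> e : usage.entrySet()) {
            double utilization = 100.0 * e.getValue()[0] / e.getValue()[1];
            String label;
            if (utilization > average + thresholdPct) {
                label = "over-utilized";
            } else if (utilization < average - thresholdPct) {
                label = "under-utilized";
            } else {
                label = "balanced";
            }
            labels.put(e.getKey(), label);
        }
        return labels;
    }

    public static void main(String[] args) {
        Map<String, long[]> usage = new LinkedHashMap<>();
        usage.put("node1", new long[] {80, 100});
        usage.put("node2", new long[] {35, 100});
        usage.put("node3", new long[] {0, 100}); // newly added DataNode
        // Cluster average is 115/300, about 38.3%; with a 10-point threshold:
        System.out.println(classify(usage, 10.0));
        // {node1=over-utilized, node2=balanced, node3=under-utilized}
    }
}
```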


NOTICE: This e-mail message and any attachments are confidential, subject to copyright and
may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not
the intended recipient, please delete and contact the sender immediately. Please consider
the environment before printing this e-mail.
