hadoop-general mailing list archives

From "Segel, Mike" <mse...@navteq.com>
Subject RE: About the DistributedCache
Date Wed, 27 Jul 2011 15:12:53 GMT
I think you're making it harder than you have to ...
First, you don't need to alias your file name, so you don't need the # and the alias after it.
So your lines:
    String path = "/user/Li/model/model.txt";
    Path filePath = new Path(path);
    String uriWithLink = filePath.toUri().toString() + "#" + "model.txt";
    System.out.println(uriWithLink);
    DistributedCache.addCacheFile(new URI(uriWithLink), conf);
Become:
    DistributedCache.addCacheFile(new URI(path), conf);  // path already ends in model.txt

Your code takes a String, turns it into a Path, turns that back into a String, and then builds
a new URI from it.
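
To illustrate what the # and the alias amount to, here is a minimal sketch that uses only
java.net.URI, so it runs without Hadoop on the classpath. The class name CacheUriSketch and
its two helper methods are hypothetical, and URI.create is used instead of new URI(...) only
to avoid the checked exception. It shows that the alias is carried in the URI fragment and
that a URI built directly from the path string has no fragment at all.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only (not part of Hadoop).
class CacheUriSketch {

    // URI built straight from the HDFS path string: no alias, so no fragment.
    static URI plainUri(String hdfsPath) {
        return URI.create(hdfsPath);
    }

    // URI with an alias: everything after '#' becomes the URI fragment.
    static URI aliasedUri(String hdfsPath, String alias) {
        return URI.create(hdfsPath + "#" + alias);
    }

    public static void main(String[] args) {
        String path = "/user/Li/model/model.txt";
        URI plain = plainUri(path);
        URI aliased = aliasedUri(path, "model.txt");
        // Both URIs point at the same path; only the fragment differs.
        System.out.println(plain.getPath() + " fragment=" + plain.getFragment());
        System.out.println(aliased.getPath() + " fragment=" + aliased.getFragment());
    }
}
```

Either URI can be handed to DistributedCache.addCacheFile; the plain one is enough when the
mapper looks the file up by name in the local cache file list.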


Then in your mapper...
        Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        boolean exitProcess = false;
        int i = 0;
        while (!exitProcess) {  // SAMPLE: no bounds check on i
            String fileName = localFiles[i].getName();
            if (fileName.equalsIgnoreCase("model.txt")) {
                // Build your input file reader on localFiles[i].toString()
                exitProcess = true;
            }
            i++;
        }


Note that this is SAMPLE code. I didn't trap the exit condition where the file isn't there and
you run past the end of the array localFiles[].
Also, I set exitProcess to false because it's easier to read this as "Do this loop until the
condition exitProcess is true".
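
The untrapped exit condition mentioned above could be handled along these lines. This is a
sketch under assumptions, not Hadoop code: the class CacheFileLookup and the method findByName
are hypothetical, and the method takes plain path strings (for example, the result of calling
toString() on each entry of localFiles) so that it runs without Hadoop. It returns null when
no entry matches instead of running off the end of the array.

```java
import java.io.File;

// Hypothetical helper class, for illustration only (not part of Hadoop).
class CacheFileLookup {

    // Returns the full local path whose file name matches wantedName
    // (case-insensitive), or null when nothing matches.
    static String findByName(String[] localPaths, String wantedName) {
        if (localPaths == null) {
            return null;  // defensive: treat a missing array as "not found"
        }
        for (String p : localPaths) {
            if (new File(p).getName().equalsIgnoreCase(wantedName)) {
                return p;  // full path, not just the file name
            }
        }
        return null;  // the loop ended without a match
    }

    public static void main(String[] args) {
        // Made-up local cache paths; real ones are chosen by the framework.
        String[] localPaths = {"/tmp/cache/123/other.dat", "/tmp/cache/456/model.txt"};
        System.out.println(findByName(localPaths, "model.txt"));
        System.out.println(findByName(localPaths, "missing.txt"));
    }
}
```

A for-each loop replaces the flag-controlled while loop here, so the bounds check comes for
free and the caller only has to test the result for null.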

When you build your file reader you need the full path, not just the file name. The path will
vary when the job runs.
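
As a rough sketch of building the reader on the full path: the class LocalCacheReader and the
method readLines are hypothetical names, and the argument is assumed to be the string returned
by localFiles[i].toString() for the matching entry. Only standard java.io classes are used, so
it can be tried outside a job against any local file.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only (not part of Hadoop).
class LocalCacheReader {

    // Reads every line of the file at fullLocalPath. The argument must be the
    // full local path; a bare name such as "model.txt" would be resolved
    // against the task's working directory instead.
    static List<String> readLines(String fullLocalPath) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader(fullLocalPath))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: LocalCacheReader <full-local-path>");
            return;
        }
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```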

HTH

-Mike


-----Original Message-----
From: Weiwei Li [mailto:hadoop.li@gmail.com] 
Sent: Wednesday, July 27, 2011 2:03 AM
To: general@hadoop.apache.org
Subject: About the DistributedCache

Hi,
I have run into a problem with the DistributedCache.

There is a file called 'model.txt' that I want every mapper to be able to read, because it
contains some shared data.
So, I use the DistributedCache.

1. In main():
DistributedCache.createSymlink(conf);
String path = "/user/Li/model/model.txt";
Path filePath = new Path(path);
String uriWithLink = filePath.toUri().toString() + "#" + "model.txt";
System.out.println(uriWithLink);
DistributedCache.addCacheFile(new URI(uriWithLink), conf);

2. In the Mapper:
protected void setup(Context context) throws IOException, InterruptedException {
    System.out.println("Now, use the distributed cache and syslink");
    try {
        FileReader reader = new FileReader("model.txt");
        BufferedReader br = new BufferedReader(reader);
        String s1 = null;
        while ((s1 = br.readLine()) != null) {
            System.out.println(s1);
        }
        br.close();
        reader.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

3. When I run it, the task logs show:
java.io.FileNotFoundException: model.txt (Access is denied.)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at java.io.FileInputStream.<init>(FileInputStream.java:79)
at java.io.FileReader.<init>(FileReader.java:41)
at NB.NBClusterTrain.UseDistributedCacheBySymbolicLink(NBClusterTrain.java:24)
at NB.NBClusterTrain$NBClusterTrainMapper.setup(NBClusterTrain.java:45)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

4. When I run /bin/hadoop fs -cat /user/Li/model/model.txt, the file can be read.

What do you think I can do?
Thank you!

