hadoop-common-user mailing list archives

From abc xyz <fabc_xyz...@yahoo.com>
Subject Re: reading distributed cache returns null pointer
Date Thu, 08 Jul 2010 22:03:57 GMT
Hi Rahul,
Thanks. It worked. I was using getFileClassPaths() to get the paths to the files
in the cache and then using those paths to access the file. I expected that to
work, and I still don't know why it doesn't produce the required result.

I added the HDFS file DCache/Orders.txt to my distributed cache. After
calling DistributedCache.getCacheFiles(conf) in the configure method of the
mapper, if I now read the file from the returned path (which happens to be
DCache/Orders.txt) using the Hadoop API, would the file be read from the local
directory of the mapper node? More specifically, I am doing this:

            FileSystem hdfs = FileSystem.get(conf);
            URI[] uris = DistributedCache.getCacheFiles(conf);
            Path my_path = new Path(uris[0].getPath());
            FSDataInputStream fs = hdfs.open(my_path);
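
One thing that may help answer this: uris[0].getPath() keeps only the path
component of the cache URI, so the scheme and host (hdfs://localhost:9100 in
the original example) are dropped, and the resulting Path is resolved by
whichever FileSystem opens it. Since FileSystem.get(conf) returns the
configured default file system, the open() call above would presumably go to
HDFS rather than to the task's local copy; as far as I know, the local copies
are what DistributedCache.getLocalCacheFiles(conf) reports. A minimal sketch of
the getPath() behaviour using only java.net.URI (the CacheUriDemo class and its
method names are made up for illustration):

```java
import java.net.URI;

// Hypothetical demo class, not part of Hadoop: shows what URI.getPath()
// keeps and what it discards for a distributed-cache URI.
final class CacheUriDemo {
    private CacheUriDemo() {}

    // Returns only the path component, as uris[0].getPath() does.
    static String pathOnly(String cacheUri) {
        return URI.create(cacheUri).getPath();
    }

    // Returns the scheme (for example "hdfs"), or null for a relative URI.
    static String schemeOf(String cacheUri) {
        return URI.create(cacheUri).getScheme();
    }

    public static void main(String[] args) {
        String full = "hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt";
        System.out.println(schemeOf(full)); // scheme, lost once getPath() is used
        System.out.println(pathOnly(full)); // path only, no scheme or host
        System.out.println(schemeOf("DCache/Orders.txt")); // relative URI has no scheme
    }
}
```

Because the scheme is gone after getPath(), the same string could name a file
in HDFS or on local disk, depending on which API is used to open it.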


From: Rahul Jain <rjain7@gmail.com>
To: common-user@hadoop.apache.org
Sent: Thu, July 8, 2010 8:15:58 PM
Subject: Re: reading distributed cache returns null pointer

I am not sure why you are using the getFileClassPaths() API to access files...
here is what works for us:

Add the file(s) to distributed cache using:
DistributedCache.addCacheFile(p.toUri(), conf);

Read the files on the mapper using:

URI[] uris = DistributedCache.getCacheFiles(conf);
// access one of the files:
Path path = new Path(uris[0].getPath());
// now follow hadoop or local file APIs to access the file...

Did you try the above, and did it not work?


On Thu, Jul 8, 2010 at 12:04 PM, abc xyz <fabc_xyz111@yahoo.com> wrote:

> Hello all,
> As a new user of hadoop, I am having some problems with understanding some
> things. I am writing a program to load a file to the distributed cache and
> read this file in each mapper. In my driver program, I have added the file
> to my distributed cache using:
>         Path p = new Path("hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt");
>         DistributedCache.addCacheFile(p.toUri(), conf);
> In the configure method of the mapper, I am reading the file from cache
> using:
>         Path[] cacheFiles = DistributedCache.getFileClassPaths(conf);
>         BufferedReader joinReader = new BufferedReader(new FileReader(cacheFiles[0].toString()));
> however, the cacheFiles variable has null value in it.
> There is something mentioned on the Yahoo tutorial for hadoop about
> distributed cache which I do not understand:
> "As a cautionary note: If you use the local JobRunner in Hadoop (i.e., what
> happens if you call JobClient.runJob() in a program with no or an empty
> hadoop-conf.xml accessible), then no local data directory is created; the
> getLocalCacheFiles() call will return an empty set of results. Unit test
> code should take this into account."
> what does this mean? I am executing my program in pseudo-distributed mode
> on windows using Eclipse.
> Any suggestion in this regard is highly valued.
> Thanks in advance.
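
The null array described in the original question can be guarded against
before it is indexed. As far as I know, getFileClassPaths() only reports files
registered with addFileToClassPath(), which would explain a null result for a
file added through addCacheFile(). A minimal sketch of such a guard, assuming
the cache lookup result has already been converted to an array of local path
strings; CacheFileGuard and its methods are hypothetical names, and the code
uses only java.nio rather than the Hadoop API:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Hadoop: checks the cache lookup result
// before the first entry is read as a local file.
final class CacheFileGuard {
    private CacheFileGuard() {}

    // Returns the first path, or null when the lookup gave null or an empty array.
    static String firstOrNull(String[] cachePaths) {
        if (cachePaths == null || cachePaths.length == 0) {
            return null;
        }
        return cachePaths[0];
    }

    // Reads all lines of the first cached file, failing with a clear message
    // instead of a NullPointerException when the lookup returned nothing.
    static List<String> readFirstFile(String[] cachePaths) throws IOException {
        String first = firstOrNull(cachePaths);
        if (first == null) {
            throw new IOException("no files returned by the distributed cache lookup");
        }
        return Files.readAllLines(Paths.get(first), StandardCharsets.UTF_8);
    }
}
```

In a mapper's configure method, the array passed in would be built from
whatever the cache lookup returns, so an empty or null result (as with the
local JobRunner case quoted above) surfaces as a readable error.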
