hadoop-user mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject DistributedCache is empty
Date Thu, 16 Jan 2014 22:41:36 GMT
My driver is implemented around Tool, so it should be wrapping GenericOptionsParser internally.
Nevertheless, neither -files nor the DistributedCache methods seem to work. Usage on the command
line is straightforward: I simply add "-files foo.py,bar.py" right after the class name (the
files are in the directory I'm running hadoop from, i.e., on the local, non-HDFS filesystem).
The mapper then inspects the file list via DistributedCache.getLocalCacheFiles(context.getConfiguration())
and doesn't see the files; there's nothing there. Likewise, if I try to run those Python
scripts from the mapper using hadoop.util.Shell, the files can't be found.

The -files option should have worked, so I shouldn't have to rely on the DistributedCache
methods, but I tried them anyway. In the driver I create a new Configuration, then call
DistributedCache.addCacheFile(new URI("./foo.py"), conf), thus referencing the local, non-HDFS
file in the current working directory. I then pass conf to the Job constructor, which seems
straightforward. Still no dice: the mapper can't see the files; they simply aren't there.
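
For reference, here is a minimal plain-Java sketch (no Hadoop classes involved) of what the URI passed to addCacheFile actually contains. It only shows that new URI("./foo.py") carries no scheme, whereas a URI built from a java.io.File carries an explicit file: scheme. Whether a scheme-less URI ends up being resolved against the cluster's default filesystem rather than the local one is an assumption here and is not confirmed anywhere in this message. The class name CacheUriSketch and its helper methods are hypothetical, made up for illustration.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper class; not part of Hadoop or of the driver described above.
public class CacheUriSketch {

    // True when the URI names its filesystem explicitly (file:, hdfs:, ...).
    public static boolean hasScheme(URI uri) {
        return uri.getScheme() != null;
    }

    // Builds an absolute file: URI for a path relative to the current working directory.
    public static URI toLocalFileUri(String relativePath) {
        return new File(relativePath).getAbsoluteFile().toURI();
    }

    public static void main(String[] args) {
        // The form used in the driver: a relative reference with no scheme.
        URI bare = URI.create("./foo.py");
        // The same file name expressed as an explicit local-filesystem URI.
        URI local = toLocalFileUri("foo.py");

        System.out.println(bare + " -> scheme: " + bare.getScheme());
        System.out.println(local + " -> scheme: " + local.getScheme());
    }
}
```

Printing both forms from the driver just before the addCacheFile call would at least show which kind of URI is being handed over.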

What on Earth am I doing wrong here?

Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
                                           --  Yoda
