hadoop-mapreduce-user mailing list archives

From Mapred Learn <mapred.le...@gmail.com>
Subject Re: Can you access Distributed cache in custom output format ?
Date Fri, 29 Jul 2011 17:56:48 GMT
I am trying to access a file that I passed with the -files option of my
hadoop jar command.

In my OutputFormat, I am doing something like:

Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);

String file1 = "";
String file2 = "";
Path pt = null;

for (Path p : cacheFiles) {
    if (p != null) {
        if (p.getName().endsWith(".ryp")) {
            file1 = p.getName();
        } else if (p.getName().endsWith(".cpt")) {
            file2 = p.getName();
            pt = p;
        }
    }
}

Then I read the file, which fails with a "file does not exist" exception:

Path pat = new Path(file2);

BufferedReader reader = null;
try {
    FileSystem fs = FileSystem.get(conf);
    reader = new BufferedReader(
            new InputStreamReader(fs.open(pat)));

    String line = null;
    while ((line = reader.readLine()) != null) {
        System.out.println("Now parsing the line: " + line);
    }
} catch (Exception e) {
    System.out.println("exception" + e.getMessage());
}
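
One possible cause, offered as an assumption rather than a confirmed diagnosis: p.getName() keeps only the last path component, so new Path(file2) no longer points into the task's local cache directory, and FileSystem.get(conf) returns the job's default file system (typically HDFS on a cluster) rather than the local one. In Hadoop terms, a likely fix is to keep the full Path (the pt variable above) and open it through FileSystem.getLocal(conf). The sketch below illustrates the same idea using only java.nio.file from the JDK, so that it runs without Hadoop on the classpath. CacheFileLookup, findBySuffix, and readLines are hypothetical names for illustration and are not part of any Hadoop API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plain-JDK analogue of scanning the localized cache files.
class CacheFileLookup {

    // Returns the first non-null path whose file name ends with the given
    // suffix, or null if there is none. The FULL path is returned, not just
    // the file name, so the caller can still locate the file on local disk.
    static Path findBySuffix(List<Path> localCacheFiles, String suffix) {
        for (Path p : localCacheFiles) {
            if (p == null || p.getFileName() == null) {
                continue;
            }
            if (p.getFileName().toString().endsWith(suffix)) {
                return p;
            }
        }
        return null;
    }

    // Reads every line of a local file; the reader is closed automatically.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

With this shape, the caller would pass the whole list of localized paths to findBySuffix and hand the returned path directly to readLines, never reducing it to a bare file name in between.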

On Fri, Jul 29, 2011 at 10:50 AM, Alejandro Abdelnur <tucu@cloudera.com> wrote:

> Where are you getting the error, in the client submitting the job or in the
> MR tasks?
>
> Are you trying to access a file or trying to set a JAR in the
> DistributedCache?
> How/when are you adding the file/JAR to the DC?
> How are you retrieving the file/JAR from your outputformat code?
>
> Thxs.
>
> Alejandro
>
>
> On Fri, Jul 29, 2011 at 10:43 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
>
>> I am trying to create a custom text OutputFormat in which I want to access a
>> distributed cache file.
>>
>>
>>
>> On Fri, Jul 29, 2011 at 10:42 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>>> Mapred,
>>>
>>> By outputformat, do you mean the frontend, submit-time run of
>>> OutputFormat? Then no, it cannot access the distributed cache because
>>> it's not really set up at that point, and the front end doesn't really
>>> need the distributed cache when it can access those files directly.
>>>
>>> Could you describe in a bit more detail what you're attempting to do?
>>>
>>> On Fri, Jul 29, 2011 at 10:57 PM, Mapred Learn <mapred.learn@gmail.com>
>>> wrote:
>>> > Hi,
>>> > I am trying to access distributed cache in my custom output format but it
>>> > does not work and file open in custom output format fails with file does not
>>> > exist even though it physically does.
>>> >
>>> > Looks like distributed cache only works for Mappers and Reducers ?
>>> >
>>> > Is there a way I can read Distributed Cache in my custom output format ?
>>> >
>>> > Thanks,
>>> > -JJ
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
