hadoop-common-user mailing list archives

From bonito perdo <bonito.pe...@googlemail.com>
Subject Re: local directory
Date Wed, 01 Jul 2009 19:36:45 GMT
I tried to store data in the local directory of each node from inside the
close() function of the mapper. Specifically, I want to serialize an object
and store it permanently in a file on the local disk of each node that
executes the map phase.

I use this code:

    FileSystem fs = null;
    FSDataOutputStream out;
    ObjectOutputStream obj;
    Path localOutPath;

In the configure() function of the mapper:

    localOutPath = new Path(conf.get("mapred.local.dir"));
    fs = localOutPath.getFileSystem(conf);
    out = fs.create(localOutPath);
    obj = new ObjectOutputStream(out);

And in the close() function of the mapper:

    obj.writeObject(someObject);
    obj.close();


However, after checking mapred.local.dir, nothing is stored there. Having
read that this directory is deleted after each successful task, I think that
this might be the reason.
Nonetheless, I really want to find a way to let each task write local data
to the local filesystem rather than to HDFS.

Thank you.
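
Besides task cleanup, two other things may be at play in the code above. A Path without a scheme is resolved by getFileSystem(conf) against the default filesystem, which in this setup appears to be hdfs://localhost:9000, and mapred.local.dir names a directory (possibly a comma-separated list of directories), not a file. What follows is a minimal sketch of the plain local file API route suggested in the quoted reply, using only java.io. The class name LocalObjectStore and its methods are hypothetical helpers for illustration and are not part of Hadoop. Writing to a directory outside mapred.local.dir is likewise an assumption about how to avoid the cleanup.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper (not a Hadoop class): stores one serialized object
// per file in a directory on the node's local disk.
class LocalObjectStore {

    // Serializes obj into dir/name and returns the file that was written.
    static File save(File dir, String name, Serializable obj) throws IOException {
        // Create the directory if needed; tolerate another task creating it first.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Cannot create directory " + dir);
        }
        File target = new File(dir, name);
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(target))) {
            out.writeObject(obj);
        }
        return target;
    }

    // Reads back an object previously written by save().
    static Object load(File source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(source))) {
            return in.readObject();
        }
    }
}
```

In the mapper's close(), a call such as LocalObjectStore.save(new File("/some/persistent/dir"), fileName, someObject) would then write one file per task, where the directory and fileName are placeholders to be chosen per cluster (for example, a name derived from the task attempt id so that tasks on the same node do not overwrite each other).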





On Wed, Jul 1, 2009 at 5:30 PM, bonito perdo <bonito.perdo@googlemail.com> wrote:

> Thank you Jason!
>
>
> On Wed, Jul 1, 2009 at 5:26 PM, jason hadoop <jason.hadoop@gmail.com> wrote:
>
>> The directory returned by getWorkOutputPath is a task-specific directory,
>> to be used for files that should be part of the final output of the job.
>>
>> If you want to write to the task-local directory, use the local file
>> system API and paths relative to '.'.
>> The parameter mapred.local.dir will contain the name of the local
>> directory.
>>
>>
>> On Wed, Jul 1, 2009 at 9:19 AM, bonito perdo <bonito.perdo@googlemail.com> wrote:
>>
>> > Thank you for your immediate response.
>> > In this case, how does it differ from the path obtained from
>> > FileOutputFormat.getWorkOutputPath(job)? That path refers to HDFS...
>> >
>> > Thank you.
>> >
>> >
>> > On Wed, Jul 1, 2009 at 5:13 PM, jason hadoop <jason.hadoop@gmail.com> wrote:
>> >
>> > > The parameter mapred.local.dir controls the directory used by the task
>> > > tracker for the local files of map/reduce jobs.
>> > >
>> > > The dfs.data.dir parameter is for the datanode.
>> > >
>> > > On Wed, Jul 1, 2009 at 8:56 AM, bonito <bonito.perdo@gmail.com> wrote:
>> > >
>> > > >
>> > > > Hello,
>> > > > I am a bit confused about the local directories where each
>> > > > map/reduce task can store data.
>> > > > According to what I have read,
>> > > > dfs.data.dir - is the path on the local file system in which the
>> > > > DataNode instance should store its data. That is, since we have a
>> > > > number of individual nodes, this is the place where each node can
>> > > > store its own data. Right?
>> > > > This data may be part of a (let's say) file stored under the hdfs
>> > > > namespace?
>> > > > The value of this property for my configuration is:
>> > > >                          /home/bon/my_hdfiles/temp_0.19.1/dfs/data.
>> > > > As far as I can understand this path refers to the local "disk" of
>> > > > each node.
>> > > >
>> > > > Moreover, calling FileOutputFormat.getWorkOutputPath(job) we obtain
>> > > > the Path to the task's temporary output directory for the
>> > > > map-reduce job. This path is totally different from the previous
>> > > > one, which confuses me since the temporary output of each task
>> > > > should be written locally on the node's disk. The path I retrieve
>> > > > is:
>> > > >
>> > > > hdfs://localhost:9000/user/bon/keys_fil.txt/_temporary/_attempt_200907011515_0009_m_000000_0
>> > > >
>> > > > Does this path refer to the local disk (node)? Or is it possible
>> > > > that it may refer to another node in the cluster?
>> > > >
>> > > > Any clarification would be of great help.
>> > > >
>> > > > Thank you.
>> > > > --
>> > > > View this message in context:
>> > > > http://www.nabble.com/local-directory-tp24292289p24292289.html
>> > > > Sent from the Hadoop core-user mailing list archive at Nabble.com.
>> > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
>> > > http://www.amazon.com/dp/1430219424?tag=jewlerymall
>> > > www.prohadoopbook.com a community for Hadoop Professionals
>> > >
>> >
>>
>
>
