hadoop-common-user mailing list archives

From Burhan Uddin <burhan...@gmail.com>
Subject Re: Need help on accessing datanodes local filesystem using hadoop map reduce framework
Date Wed, 27 Oct 2010 15:46:18 GMT
Asking again: is it somehow possible to access a datanode's local file
system using HDFS, in case I need it for anything?
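
(For reference, a minimal sketch using the stock FileSystem API; the
scratch path is made up. FileSystem.getLocal() only sees the disk of the
machine the code runs on, such as the node a map task executes on; it
cannot reach into another datanode's disk remotely:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LocalFsSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Handle on the local filesystem of whatever machine this
        // code is running on, wrapped in the same FileSystem API.
        FileSystem localFs = FileSystem.getLocal(conf);
        Path scratch = new Path("/tmp/crawl-scratch");  // made-up path
        localFs.mkdirs(scratch);
        System.out.println("exists: " + localFs.exists(scratch));
      }
    }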

On Wed, Oct 27, 2010 at 9:40 PM, Burhan Uddin <burhan.bd@gmail.com> wrote:

> Thanks, Matt, for your reply.
>
> If that is so, does it mean it will distribute the computing process
> accordingly? I am asking about the internal mechanism: will it run all
> the processing on the same datanode the data belongs to?
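>
> (For reference, a minimal sketch of how that placement can be inspected
> through the stock FileSystem API; the file path is made up:)
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.BlockLocation;
>     import org.apache.hadoop.fs.FileStatus;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class WhereAreMyBlocks {
>       public static void main(String[] args) throws Exception {
>         FileSystem fs = FileSystem.get(new Configuration());
>         FileStatus stat =
>             fs.getFileStatus(new Path("/crawl/site-a/page-001.html"));
>         // The namenode reports which datanodes hold each block; the
>         // framework uses the same information to schedule map tasks
>         // close to the data.
>         BlockLocation[] blocks =
>             fs.getFileBlockLocations(stat, 0, stat.getLen());
>         for (BlockLocation b : blocks) {
>           System.out.println(b.getOffset() + " -> "
>               + java.util.Arrays.toString(b.getHosts()));
>         }
>       }
>     }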
>
> On Wed, Oct 27, 2010 at 9:19 PM, Matt Pouttu-Clarke <
> Matt.Pouttu-Clarke@icrossing.com> wrote:
>
>> Hi Burhan,
>>
>> Really, you should not be concerned with which data nodes store the data
>> or with what is on which data node. HDFS takes care of storing the data
>> on data nodes, much as Unix file systems take care of storing data on
>> disk.
>>
>> On Unix we store files in directories but do not care what disk blocks the
>> data is stored on.
>>
>> On HDFS we also store files in directories but do not care which data
>> nodes they are stored on.
>>
>> Really, if you create a directory for each site, that should be enough
>> for what you are talking about doing.
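>>
>> For instance, a minimal sketch (the paths are made up; the calls are
>> the standard FileSystem API):
>>
>>     import org.apache.hadoop.conf.Configuration;
>>     import org.apache.hadoop.fs.FSDataOutputStream;
>>     import org.apache.hadoop.fs.FileSystem;
>>     import org.apache.hadoop.fs.Path;
>>
>>     public class StorePage {
>>       public static void main(String[] args) throws Exception {
>>         FileSystem fs = FileSystem.get(new Configuration());
>>         // One directory per crawled site; HDFS alone decides which
>>         // datanodes end up holding the file's blocks.
>>         Path page = new Path("/crawl/site-a/page-001.html");
>>         FSDataOutputStream out = fs.create(page);
>>         out.writeBytes("<html>...</html>");
>>         out.close();
>>       }
>>     }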
>>
>> Cheers,
>> Matt
>>
>> On 10/22/10 1:14 PM, "Burhan Uddin" <burhan.bd@gmail.com> wrote:
>>
>> > Hello,
>> > I am a beginner with the Hadoop framework. I am trying to create a
>> > distributed crawling application. I have googled a lot, but the
>> > resources are scarce. Can anyone please help me with the following
>> > topics?
>> >
>> > 1. I want to access the local file system of a datanode. Suppose I
>> > have crawled sites A and B. Is it somehow possible, using the Hadoop
>> > API, to control which datanode stores each one? For example, I want to
>> > store site A on datanode 1 and site B on datanode 2, or however I
>> > wish. Is that somehow possible?
>> >
>> > 2. When I create the map-reduce job for Lucene indexing, if a map
>> > process on datanode 1 requires data from datanode 2, will all of it
>> > come through the master node? Since I need to access the data with
>> > hdfs://master:port, does that mean all data is exchanged through the
>> > master node?
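>> >
>> > (A minimal sketch of the read path in question; the namenode address
>> > and file path are made up:)
>> >
>> >     import java.net.URI;
>> >     import org.apache.hadoop.conf.Configuration;
>> >     import org.apache.hadoop.fs.FSDataInputStream;
>> >     import org.apache.hadoop.fs.FileSystem;
>> >     import org.apache.hadoop.fs.Path;
>> >
>> >     public class ReadSketch {
>> >       public static void main(String[] args) throws Exception {
>> >         // hdfs://master:9000 only names the namenode; the namenode
>> >         // answers with block locations, and the bytes then stream
>> >         // directly from the datanodes that hold them, not through
>> >         // the master.
>> >         FileSystem fs = FileSystem.get(
>> >             URI.create("hdfs://master:9000/"), new Configuration());
>> >         FSDataInputStream in =
>> >             fs.open(new Path("/crawl/site-b/page-001.html"));
>> >         byte[] buf = new byte[4096];
>> >         int n = in.read(buf);
>> >         in.close();
>> >         System.out.println("read " + n + " bytes");
>> >       }
>> >     }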
>> >
>> > 3. How can I make sure that a map process (like Lucene indexing of
>> > crawled data) runs right on the data node that contains the data?
>> > (Maybe I could not explain it well: I really do not want datanode 2
>> > (storing site B) to index site A (which is stored on datanode 1),
>> > since that would consume a lot of network traffic.)
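>> >
>> > (Again a minimal sketch, using the old mapred API; the class and
>> > paths are made up. FileInputFormat reports the hosts holding each
>> > split's block, and the JobTracker tries to schedule the map task on
>> > one of those hosts:)
>> >
>> >     import org.apache.hadoop.conf.Configuration;
>> >     import org.apache.hadoop.fs.Path;
>> >     import org.apache.hadoop.mapred.FileInputFormat;
>> >     import org.apache.hadoop.mapred.FileOutputFormat;
>> >     import org.apache.hadoop.mapred.JobClient;
>> >     import org.apache.hadoop.mapred.JobConf;
>> >
>> >     public class IndexJob {
>> >       public static void main(String[] args) throws Exception {
>> >         JobConf job = new JobConf(new Configuration(), IndexJob.class);
>> >         // One split per HDFS block, each annotated with the hosts
>> >         // holding that block; the JobTracker prefers to run the map
>> >         // task on one of those hosts (data-local execution).
>> >         FileInputFormat.setInputPaths(job, new Path("/crawl"));
>> >         FileOutputFormat.setOutputPath(job, new Path("/index"));
>> >         JobClient.runJob(job);  // identity mapper/reducer by default
>> >       }
>> >     }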
>> >
>> >
>> > Anyone, please reply as early as possible.
>> >
>> > Thanks
>> > Burhan Uddin
>> >
>> > Student
>> > Department of Computer Science & Engineering
>> > Shahjalal University of Science & Technology
>> > Bangladesh
>>
>
