hadoop-common-user mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: DistributedCache - why not read directly from HDFS?
Date Mon, 25 Mar 2013 22:30:34 GMT
More importantly, the second and subsequent accesses to a file in the
DistributedCache are guaranteed to be local disk I/O.
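
To illustrate the point about repeated access, here is a minimal Java sketch of
the localize-once pattern. It is a toy model that uses only the JDK, not
Hadoop's actual implementation: the class name LocalizingCache, its methods,
and the use of an ordinary directory to stand in for HDFS are all assumptions
made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical class): "remote" paths stand in for HDFS,
// and cacheDir stands in for a task node's local disk.
class LocalizingCache {
    private final Path cacheDir;
    private final Map<Path, Path> localized = new HashMap<>();
    private int remoteFetches = 0;

    LocalizingCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    // Copies the file to local disk on the first request; later requests
    // return the existing local copy without touching the remote path.
    // Simplification: assumes remote file names are unique.
    synchronized Path localize(Path remote) throws IOException {
        Path local = localized.get(remote);
        if (local == null) {
            local = cacheDir.resolve(remote.getFileName().toString());
            Files.copy(remote, local, StandardCopyOption.REPLACE_EXISTING);
            remoteFetches++;
            localized.put(remote, local);
        }
        return local;
    }

    // Number of times a file was actually copied from the remote side.
    synchronized int remoteFetches() {
        return remoteFetches;
    }

    // Reads all lines of the file through the local copy.
    List<String> readLines(Path remote) throws IOException {
        return Files.readAllLines(localize(remote), StandardCharsets.UTF_8);
    }
}
```

In this sketch, calling readLines() twice on the same file copies it from the
remote side only once; the second call is served from the local copy. Opening
the file from HDFS in each task instead goes back to HDFS every time, and may
or may not hit a local replica.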

On Mar 24, 2013, at 3:00 AM, Alberto Cordioli wrote:

> Thanks for your reply Harsh.
> So if I want to read a simple text file, choosing between
> DistributedCache and HDFS is just a matter of performance.
> 
> 
> Alberto
> 
> On 23 March 2013 16:17, Harsh J <harsh@cloudera.com> wrote:
>> A DistributedCache is not used just to distribute simple files but
>> also native libraries and such, which in certain cases cannot be
>> loaded if they are on HDFS.
>> 
>> Also, keeping it on HDFS could prove less performant, as non-local
>> reads could happen (depending on the files' replication factor).
>> 
>> On Sat, Mar 23, 2013 at 8:23 PM, Alberto Cordioli
>> <cordioli.alberto@gmail.com> wrote:
>>> Hi all,
>>> 
>>> I was not able to find an answer to the following question. If the
>>> question has already been answered please give me the pointer to the
>>> right thread.
>>> 
>>> What are actually the differences between reading a file from HDFS in
>>> a mapper and using DistributedCache?
>>> 
>>> I saw that with DistributedCache you can give an HDFS path and the
>>> task nodes will get the data on their local file system. But what
>>> advantages do we have compared with a simple HDFS read through the
>>> FSDataInputStream returned by FileSystem.open()?
>>> 
>>> Thank you very much,
>>> Alberto
>>> 
>>> 
>>> --
>>> Alberto Cordioli
>> 
>> 
>> 
>> --
>> Harsh J
> 
> 
> 
> -- 
> Alberto Cordioli

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


