hadoop-user mailing list archives

From Alberto Cordioli <cordioli.albe...@gmail.com>
Subject Re: DistributedCache - why not read directly from HDFS?
Date Sun, 24 Mar 2013 10:00:09 GMT
Thanks for your reply, Harsh.
So if I only want to read a simple text file, choosing between
DistributedCache and a direct HDFS read is just a matter of performance.
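
To get a rough feel for that performance difference, here is a
back-of-the-envelope sketch. It is not Hadoop API code: the class and
method names are hypothetical, and the file size, node count and task
count are assumed numbers. It only compares worst-case network traffic
when every task reads the file remotely from HDFS against the case
where the file is localized once per node.

```java
// Back-of-the-envelope model only; none of these names come from Hadoop.
public class SideFileTraffic {

    // Worst case for direct HDFS reads: no task runs on a node that holds
    // a replica, so every task pulls the whole file over the network.
    static long directReadBytes(long fileBytes, int tasks) {
        return fileBytes * tasks;
    }

    // Upper bound with DistributedCache: each node copies the file once
    // for the job, and its tasks then read the local copy.
    static long cachedReadBytes(long fileBytes, int nodes) {
        return fileBytes * nodes;
    }

    public static void main(String[] args) {
        long fileBytes = 64L * 1024 * 1024; // assumed 64 MB side file
        int nodes = 10;                     // assumed number of task nodes
        int tasks = 200;                    // assumed number of map tasks

        System.out.println("direct HDFS reads (worst case): "
                + directReadBytes(fileBytes, tasks) + " bytes");
        System.out.println("DistributedCache (upper bound): "
                + cachedReadBytes(fileBytes, nodes) + " bytes");
    }
}
```

With a higher replication factor more of the direct reads become
local, so the real gap is usually smaller than this worst case.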


Alberto

On 23 March 2013 16:17, Harsh J <harsh@cloudera.com> wrote:
> A DistributedCache is used to distribute not just simple files but
> also native libraries and such, which in some cases cannot be loaded
> if they are on HDFS.
>
> Also, keeping it on HDFS could prove less performant, as non-local
> reads could happen (depending on the files' replication factor).
>
> On Sat, Mar 23, 2013 at 8:23 PM, Alberto Cordioli
> <cordioli.alberto@gmail.com> wrote:
>> Hi all,
>>
>> I was not able to find an answer to the following question. If it
>> has already been answered, please point me to the right thread.
>>
>> What are the actual differences between reading a file from HDFS in a
>> mapper and using DistributedCache?
>>
>> I saw that with DistributedCache you can give an HDFS path and the
>> task nodes will get the data on the local file system. But what
>> advantages does that have compared with a simple HDFS read via
>> FileSystem.open(), which returns an FSDataInputStream?
>>
>> Thank you very much,
>> Alberto
>>
>>
>> --
>> Alberto Cordioli
>
>
>
> --
> Harsh J



-- 
Alberto Cordioli
