hadoop-common-user mailing list archives

From Dino Kečo <dino.k...@gmail.com>
Subject Re: Hadoop--store a sequence file in distributed cache?
Date Fri, 12 Aug 2011 08:30:38 GMT
Hi Sofia,

I assume that the output of the first job is stored on HDFS. In that case I
would read the file directly from the Mappers without using the distributed
cache. Putting the file into the distributed cache would add one more copy
operation to your process.
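
A minimal sketch of reading the sequence file directly from a Mapper might look
like the following. It is only an illustration under several assumptions: the
class name SideDataMapper and the configuration key "example.side.file" are
hypothetical, the path is assumed to point at a single part file of the first
job's output, the main input is assumed to be plain text, and the side data is
assumed to be small enough to hold in memory.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical mapper: loads a sequence file from HDFS once per task,
// then looks up each input line in the loaded data.
public class SideDataMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Hypothetical configuration key; the driver would set it to the
    // HDFS path of one part file written by the first job.
    public static final String SIDE_FILE_KEY = "example.side.file";

    // Assumption: the side data fits in the task's heap.
    private final Map<String, String> sideData = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        String location = conf.get(SIDE_FILE_KEY);
        if (location == null) {
            throw new IOException("Missing configuration value: " + SIDE_FILE_KEY);
        }

        Path path = new Path(location);
        FileSystem fs = path.getFileSystem(conf);

        // The reader takes the key and value classes from the file header,
        // so the mapper does not need to hard-code them.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            // next() reuses the same key and value objects, so copy them out
            // (here as strings) before the next call.
            while (reader.next(key, value)) {
                sideData.put(key.toString(), value.toString());
            }
        } finally {
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Emit the input line together with its side-data entry, if any.
        String match = sideData.get(line.toString());
        if (match != null) {
            context.write(line, new Text(match));
        }
    }
}
```

Loading the file in setup() means it is read once per map task rather than once
per record. The driver would pass the location with something like
conf.set(SideDataMapper.SIDE_FILE_KEY, "...") before submitting the second job.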

Thanks,
dino


On Fri, Aug 12, 2011 at 9:53 AM, Sofia Georgiakaki
<geosofie_tuc@yahoo.com> wrote:

> Good morning,
>
> I would like to store some files in the distributed cache, in order to be
> opened and read from the mappers.
> The files are produced by another Job and are sequence files.
> I am not sure if that format is proper for the distributed cache, as the
> files in the distributed cache are stored and read locally. Should I change
> the format of the files in the previous Job, make them Text Files maybe, and
> read them from the distributed cache using the simple Java API?
> Or can I still handle them in the usual way we use sequence files, even
> if they reside in the local directory? Performance is extremely important
> for my project, so I don't know what the best solution would be.
>
> Thank you in advance,
> Sofia Georgiakaki
