hadoop-common-user mailing list archives

From "Shimi K" <shimi....@gmail.com>
Subject Re: Caching frequently map input files
Date Mon, 11 Feb 2008 06:05:30 GMT
I chose Hadoop more for its distributed computation than for its support for
huge files, and my files do fit into memory.
I have a lot of small files, and my system needs to search for something in
those files very fast. I figured I could distribute the files on a Hadoop
cluster and then use the distributed computation to search in parallel
across as many files as possible. That way I would be able to return a
result faster than if I had used a single machine.
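The approach described above, one independent search task per small file with the hits merged afterwards, can be sketched on a single machine. This is a minimal Java sketch, not Hadoop's MapReduce API: a thread pool stands in for the cluster's map slots, and the class and method names (`ParallelSearch`, `load`, `search`) are hypothetical. It assumes, as stated in the message, that every file fits into memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical single-machine sketch: a thread pool stands in for cluster map slots.
final class ParallelSearch {

    // Reads each small file fully into memory (assumes every file fits), keyed by its path.
    static Map<String, String> load(List<Path> files) {
        Map<String, String> docs = new LinkedHashMap<>();
        for (Path p : files) {
            try {
                docs.put(p.toString(), Files.readString(p));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return docs;
    }

    // Searches every document in its own task and returns the names of the
    // documents containing `needle`, in the map's iteration order.
    static List<String> search(Map<String, String> docs, String needle, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<String> names = new ArrayList<>(docs.keySet());
            List<Future<Boolean>> hits = new ArrayList<>();
            for (String name : names) {
                String text = docs.get(name);
                // "map" step: one independent task per file
                Callable<Boolean> task = () -> text.contains(needle);
                hits.add(pool.submit(task));
            }
            List<String> matches = new ArrayList<>();
            for (int i = 0; i < names.size(); i++) {
                // "merge" step: wait for each task and collect the matching names
                if (hits.get(i).get()) {
                    matches.add(names.get(i));
                }
            }
            return matches;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "alpha beta");
        docs.put("b.txt", "gamma");
        docs.put("c.txt", "beta delta");
        System.out.println(search(docs, "beta", 2)); // prints [a.txt, c.txt]
    }
}
```

On a real cluster, roughly speaking, the per-file task would run as a map over the input, and the merge would happen in a reducer or in the job client; the pool size here only approximates the number of maps running in parallel.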

Is there a way to tell which files are in memory?

On Feb 10, 2008 10:33 PM, Ted Dunning <tdunning@veoh.com> wrote:

> But if your files DO fit into memory, then the datanodes that have copies of
> the blocks of your file will probably still have them in memory, and since
> maps are typically data-local, you will benefit as much as possible.
>
> On 2/10/08 11:17 AM, "Arun C Murthy" <acm@yahoo-inc.com> wrote:
> >> Does Hadoop cache frequently used (LRU/MRU) map input files? Or does it
> >> load files from disk each time a file is needed, even if it is the same
> >> file that was required by the last job on the same node?
> >>
> >
> > There is no concept of caching input files across jobs.
> >
> > Hadoop is geared towards dealing with _huge_ amounts of data which
> > don't fit into memory anyway... and hence doing it across jobs is moot.
