hadoop-hdfs-user mailing list archives

From rahul patodi <patodira...@gmail.com>
Subject Re: how does hdfs read archived files
Date Thu, 25 Nov 2010 05:12:33 GMT
hi Jason,
I think the archive keeps the starting offset of each file within the block, so there
is no need to read the whole block every time we want to read a small file. As we
know, HDFS uses a large block size (64 MB by default) for performance reasons, and
for the same reason the archive maintains an index so a read can start directly at
the file's offset.
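
Here is a rough sketch of what reading a single file out of an archive looks like
from the client side. The har:// path, namenode host and file names below are just
placeholders (nothing from Jason's setup), and the comments describe the behaviour
as I understand it:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HarReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Hypothetical archive and file. The HarFileSystem looks the file up in the
    // archive's index, opens the matching part file and positions the stream at
    // the file's offset, so only that file's bytes are streamed to the client.
    Path file = new Path(
        "har://hdfs-namenode:8020/user/jason/files.har/logs/small-1.txt");
    FileSystem harFs = file.getFileSystem(conf);

    FSDataInputStream in = null;
    try {
      in = harFs.open(file);
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
    }
  }
}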

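To Jason's point about verifying it: a .har is really just a directory on HDFS that
holds _index, _masterindex and the part files, so you can list it and dump the index
to see the per-file offsets yourself. Another rough sketch, with the same placeholder
paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HarIndexPeek {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical archive location.
    Path har = new Path("/user/jason/files.har");

    // Expect _index, _masterindex and one or more part-N files.
    for (FileStatus s : fs.listStatus(har)) {
      System.out.println(s.getPath().getName() + "  " + s.getLen() + " bytes");
    }

    // Each index entry records which part file a logical file lives in and at
    // what offset and length, which is what the reader seeks to.
    IOUtils.copyBytes(fs.open(new Path(har, "_index")), System.out, 4096, false);
  }
}
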
-Thanks and Regards,
Rahul Patodi
Associate Software Engineer,
Impetus Infotech (India) Private Limited,
www.impetus.com
Mob:09907074413


On Wed, Nov 24, 2010 at 11:49 PM, Harsh J <qwertymaniac@gmail.com> wrote:

> I think HARs maintain indexes of all the file boundaries in the blocks
> created, and therefore it would "seek" to the beginning point within
> the block to begin reading a particular file. So it does not exactly
> "read" the entire block to retrieve that file.
>
> On Wed, Nov 24, 2010 at 11:22 PM, Jason Ji <jason_jice@yahoo.com> wrote:
> > hi guys,
> >
> > We plan to use Hadoop HDFS as the storage for lots of little files.
> >
> > According to the documentation, it is recommended to use Hadoop
> > Archive to pack those little files together to get better performance.
> >
> > Our question is: since HDFS reads the entire (say 64 MB) block every
> > time, does it mean that every time we try to retrieve a single file
> > inside the archive, HDFS will still read the whole block as well?
> >
> > If not, what’s the actual behavior? Is there any way we can verify it?
> >
> > Thanks in advance.
> >
> > Jason
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>


