hadoop-hdfs-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: how does hdfs read archived files
Date Wed, 24 Nov 2010 18:19:42 GMT
I think HARs maintain an index of where each file begins and ends within
the blocks created, so a read can "seek" straight to that file's starting
offset within the block. It therefore does not need to "read" the entire
block to retrieve that one file.

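To make the idea concrete, here is a minimal sketch in Python of an
index-plus-seek lookup. It is only an illustration of the principle, not the
actual HAR on-disk format or the Hadoop API: the helper names build_archive
and read_member, the in-memory stream, and the sample file names are all
made up for this example.

```python
import io


def build_archive(files):
    """Pack small files back to back and record where each one lives.

    Returns (part_bytes, index), where index maps a file name to its
    (start offset, length) inside part_bytes. Hypothetical layout only.
    """
    part = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = (len(part), len(data))
        part.extend(data)
    return bytes(part), index


def read_member(stream, index, name):
    """Seek to the member's start offset and read only its bytes."""
    offset, length = index[name]
    stream.seek(offset)
    return stream.read(length)


# Three "little files" packed into one larger part.
files = {"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b"charlie"}
part, index = build_archive(files)

# Retrieve only b.txt: the read starts at its offset and stops after its length.
stream = io.BytesIO(part)
data = read_member(stream, index, "b.txt")
```

The lookup touches only the bytes between the recorded offset and
offset + length, which is the behaviour I would expect from an indexed
archive sitting on top of a seekable file.
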
On Wed, Nov 24, 2010 at 11:22 PM, Jason Ji <jason_jice@yahoo.com> wrote:
> Hi guys,
>
> We plan to use Hadoop HDFS as the storage for lots of little files.
> According to the documentation, it is recommended to use Hadoop Archives
> to compress those little files to get better performance.
>
> Our question: since HDFS reads the entire (say 64 MB) block every time,
> does that mean that every time we try to retrieve a single file inside
> the archive, HDFS will still read the whole block as well?
>
> If not, what is the actual behavior? Is there any way we can verify it?
>
> Thanks in advance.
>
> Jason



-- 
Harsh J
www.harshj.com
