hadoop-hdfs-user mailing list archives

From Pedro Costa <psdc1...@gmail.com>
Subject Re: Block size in HDFS
Date Fri, 10 Jun 2011 16:08:21 GMT
Does this mean that when HDFS reads a 1KB file from disk, it puts
the data in 64MB blocks?

On Fri, Jun 10, 2011 at 5:00 PM, Philip Zeyliger <philip@cloudera.com> wrote:
> On Fri, Jun 10, 2011 at 8:42 AM, Pedro Costa <psdc1978@gmail.com> wrote:
>> But how can I say that a 1KB file will only use 1KB of disk space if
>> a block is configured as 64MB? In my view, if a 1KB file uses a 64MB
>> block, the file will occupy 64MB on disk.
>
> A block of HDFS is the unit of distribution and replication, not the
> unit of storage.  HDFS uses the underlying file systems for physical
> storage.
>
> -- Philip
>
>>
>> How can you dissociate a 64MB HDFS data block from a disk block?
>>
>> On Fri, Jun 10, 2011 at 5:01 PM, Marcos Ortiz <mlortiz@uci.cu> wrote:
>>> On 06/10/2011 10:35 AM, Pedro Costa wrote:
>>>
>>> Hi,
>>>
>>> If I define HDFS to use blocks of 64 MB, and I store a 1KB file in
>>> HDFS, will this file occupy 64MB in HDFS?
>>>
>>> Thanks,
>>>
>>> HDFS is not very efficient at storing small files, because each file takes
>>> at least one block (of 64 MB in your case), and the block metadata
>>> is held in memory by the NN. But you should know that this 1KB file will
>>> use only 1KB of disk space.
>>>
>>> For small files, you can use Hadoop archives.
>>> Regards
>>>
>>> --
>>> Marcos Luís Ortíz Valmaseda
>>>  Software Engineer (UCI)
>>>  http://marcosluis2186.posterous.com
>>>  http://twitter.com/marcosluis2186
>>>
>>>
>>
>
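
To make the point quoted above concrete (a block is the unit of distribution and replication, not the unit of storage), here is a rough sketch of the arithmetic. It is only a model, not HDFS code: the function name hdfs_footprint is made up for illustration, the replication factor of 3 is an assumed parameter (the thread does not state one), and it ignores checksum files and whatever rounding the underlying local file system applies.

```python
def hdfs_footprint(file_size, block_size=64 * 1024 * 1024, replication=3):
    """Rough model of how a file maps onto HDFS blocks.

    Returns (num_blocks, last_block_bytes, raw_bytes_across_replicas).
    Assumptions (not from the thread): replication defaults to 3, and
    checksum files and local file system overhead are ignored.
    """
    if file_size < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("sizes must be non-negative and positive")

    # Ceiling division: a file needs as many blocks as it takes to cover it.
    num_blocks = -(-file_size // block_size)

    # Only the last block can be partial; it holds just the leftover bytes.
    last_block_bytes = file_size - (num_blocks - 1) * block_size if num_blocks else 0

    # Disk usage follows the real data size, not num_blocks * block_size.
    raw_bytes = file_size * replication

    return num_blocks, last_block_bytes, raw_bytes


if __name__ == "__main__":
    # A 1KB file: one block entry, but only 1KB stored per replica.
    print(hdfs_footprint(1024))  # (1, 1024, 3072)
```

Under this model the 1KB file still costs one block's worth of metadata on the NN, which is why many small files are a problem even though they do not waste 64MB each on disk.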



-- 
---------------------------
Pedro Sá da Costa

@: pcosta@lasige.di.fc.ul.pt
@: psdc1978@gmail.com
