hadoop-hdfs-user mailing list archives

From Josh Patterson <j...@cloudera.com>
Subject Re: Block size in HDFS
Date Fri, 10 Jun 2011 19:34:41 GMT
It will only take up ~1KB of local datanode disk space (plus metadata
such as a CRC32 checksum for every 512 bytes, and, with replication,
~1KB per replica, in this case ~2KB total), but the real cost is the
block entry in the Namenode --- all block metadata at the namenode lives
in memory, which is a much more scarce resource for the cluster in a
relative sense.
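To make that arithmetic concrete, here is a small sketch of the cost model. The 4-byte CRC32 per 512-byte chunk and the replication factor of 2 are taken from the figures above; the exact on-disk layout depends on the HDFS version and configuration:

```python
import math

BYTES_PER_CHECKSUM = 512   # HDFS checksums data in 512-byte chunks by default
CRC32_SIZE = 4             # each CRC32 checksum is 4 bytes
REPLICATION = 2            # matching the ~2KB figure above

def datanode_disk_usage(file_size):
    """Approximate local disk bytes used across all replicas of one small file."""
    checksum_bytes = math.ceil(file_size / BYTES_PER_CHECKSUM) * CRC32_SIZE
    return REPLICATION * (file_size + checksum_bytes)

print(datanode_disk_usage(1024))  # prints 2064: ~2KB on disk, not 64MB
```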

On Fri, Jun 10, 2011 at 11:47 AM, Pedro Costa <psdc1978@gmail.com> wrote:
> So I'm not getting how a 1KB file can cost a block of 64MB. Can
> anyone explain this to me?
>
> On Fri, Jun 10, 2011 at 5:13 PM, Philip Zeyliger <philip@cloudera.com> wrote:
>> On Fri, Jun 10, 2011 at 9:08 AM, Pedro Costa <psdc1978@gmail.com> wrote:
>>> Does this mean that, when HDFS reads a 1KB file from disk, it will
>>> put the data in blocks of 64MB?
>>
>> No.
>>
>>>
>>> On Fri, Jun 10, 2011 at 5:00 PM, Philip Zeyliger <philip@cloudera.com> wrote:
>>>> On Fri, Jun 10, 2011 at 8:42 AM, Pedro Costa <psdc1978@gmail.com> wrote:
>>>>> But how can I say that a 1KB file will only use 1KB of disk space, if
>>>>> a block is configured as 64MB? In my view, if a 1KB file uses a block
>>>>> of 64MB, the file will occupy 64MB on disk.
>>>>
>>>> A block of HDFS is the unit of distribution and replication, not the
>>>> unit of storage.  HDFS uses the underlying file systems for physical
>>>> storage.
>>>>
>>>> -- Philip
>>>>
>>>>>
>>>>> How can you distinguish a 64MB HDFS data block from a disk block?
>>>>>
>>>>> On Fri, Jun 10, 2011 at 5:01 PM, Marcos Ortiz <mlortiz@uci.cu> wrote:
>>>>>> On 06/10/2011 10:35 AM, Pedro Costa wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> If I define HDFS to use blocks of 64 MB, and I store a 1KB file in
>>>>>> HDFS, will this file occupy 64MB in HDFS?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> HDFS is not very efficient at storing small files, because each file
>>>>>> is stored in a block (of 64 MB in your case), and the block metadata
>>>>>> is held in memory by the NN. But you should know that this 1KB file
>>>>>> will only use 1KB of disk space.
>>>>>>
>>>>>> For small files, you can use Hadoop archives.
>>>>>> Regards
>>>>>>
>>>>>> --
>>>>>> Marcos Luís Ortíz Valmaseda
>>>>>>  Software Engineer (UCI)
>>>>>>  http://marcosluis2186.posterous.com
>>>>>>  http://twitter.com/marcosluis2186
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
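A minimal sketch putting Philip's and Marcos's points together: the configured block size bounds how much data one block holds (and so how many Namenode entries a file costs), while the bytes written to the datanodes' local filesystems track the real file size. The 64MB figure matches the configuration discussed above; the function name is hypothetical:

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # configured HDFS block size

def hdfs_cost(file_size):
    """Rough model: a block is the unit of distribution, not of storage."""
    namenode_blocks = max(1, math.ceil(file_size / BLOCK_SIZE))
    disk_bytes = file_size  # the underlying filesystem stores only the real data
    return namenode_blocks, disk_bytes

print(hdfs_cost(1024))             # prints (1, 1024): one block entry, 1KB on disk
print(hdfs_cost(130 * 1024**2))    # prints (3, 136314880): three block entries
```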



-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
blog: http://jpatterson.floe.tv
