hadoop-mapreduce-user mailing list archives

From Sugandha Naolekar <sugandha....@gmail.com>
Subject Re: Can the file storage in HDFS be customized?
Date Wed, 26 Feb 2014 05:18:03 GMT
If there are two files of 129 MB each, then 4 blocks will be created. Thus,
ideally, should there be 4 cores/machines (i.e. 4 mappers) to process these
blocks, one per block? And can a single core be assigned more than one block,
from two different files?
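
(A quick check of the arithmetic, assuming the default 128 MB block size: each
129 MB file is stored as ceil(129 / 128) = 2 blocks, i.e. 128 MB + 1 MB, so two
such files occupy 4 blocks in total. Map tasks are scheduled per input split
rather than per core, so a single machine can end up processing blocks that
belong to different files.)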

--
Thanks & Regards,
Sugandha Naolekar





On Wed, Feb 26, 2014 at 10:04 AM, Sugandha Naolekar
<sugandha.n87@gmail.com> wrote:

> Yes. Got it. Thanks
>
> --
> Thanks & Regards,
> Sugandha Naolekar
>
>
>
>
>
> On Tue, Feb 25, 2014 at 10:14 PM, java8964 <java8964@hotmail.com> wrote:
>
>> Hi, Naolekar:
>>
>> The blocks in HDFS just store bytes. HDFS neither knows nor cares what kind
>> of data a block holds, or how many polygons are in it. It just stores 128 MB
>> of bytes (if your default block size is set to 128 MB).
>>
>> It is your InputFormat/RecordReader that reads these bytes in and
>> deserializes them into <K,V> pairs.
>>
>> The default TextInputFormat reads one LINE of text per read. Of course, a
>> block boundary will most likely fall in the middle of a line, so it is
>> TextInputFormat's responsibility to read the complete last line of one
>> block, and to find the correct starting point of the first line of the
>> current block, as you can imagine. You can read the source code of
>> TextInputFormat to see how it implements this.
>>
>> After each line of text is read, it is the RecordReader class's
>> responsibility to translate that line of text into a <K,V> pair.
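
As a small illustration of the pair types on this default path (a sketch, not
code from this thread): TextInputFormat hands the mapper the byte offset of
each line as the key and the line itself as the value.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch of a mapper fed by the default TextInputFormat:
    // key = byte offset of the line in the file, value = the line of text.
    public class LineOffsetMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Emit (line, offset) simply to show the types flowing through map().
            context.write(line, offset);
        }
    }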
>>
>> Is the above logic right for your data? Maybe not; in that case it is time
>> to write your own InputFormat/RecordReader classes that understand your own
>> data.
>>
>> In the InputFormat, read one record out of the block's byte array, and in
>> particular handle the block boundary cases at both the start and the end of
>> the block, as TextInputFormat does.
>>
>> In the RecordReader, translate that record into a <K,V> pair for your mapper.
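
A minimal sketch of such a custom pair, under one big assumption that is not
stated in this thread: each polygon is serialized as one WKT string per line
(a real binary shapefile would need its own boundary handling). With that
assumption, the block-boundary work can simply be delegated to Hadoop's own
LineRecordReader; PolygonInputFormat and PolygonRecordReader are hypothetical
names used only for illustration.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

    // Assumed layout: one WKT polygon per line, so LineRecordReader already
    // handles records that straddle block/split boundaries for us.
    public class PolygonInputFormat extends FileInputFormat<LongWritable, Text> {
        @Override
        public RecordReader<LongWritable, Text> createRecordReader(
                InputSplit split, TaskAttemptContext context) {
            return new PolygonRecordReader();
        }
    }

    class PolygonRecordReader extends RecordReader<LongWritable, Text> {
        // LineRecordReader skips the partial line at the start of a split and
        // reads past the end of the split to finish its last line.
        private final LineRecordReader lineReader = new LineRecordReader();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
                throws IOException {
            lineReader.initialize(split, context);
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            return lineReader.nextKeyValue();
        }

        @Override
        public LongWritable getCurrentKey() {
            return lineReader.getCurrentKey();   // byte offset of the polygon record
        }

        @Override
        public Text getCurrentValue() {
            return lineReader.getCurrentValue(); // one polygon (WKT) per record
        }

        @Override
        public float getProgress() throws IOException {
            return lineReader.getProgress();
        }

        @Override
        public void close() throws IOException {
            lineReader.close();
        }
    }

With such a format, the mapper receives one polygon per map() call, and the job
would be wired up with job.setInputFormatClass(PolygonInputFormat.class).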
>>
>> Yong
>>
>> ------------------------------
>> From: sugandha.n87@gmail.com
>> Date: Tue, 25 Feb 2014 15:59:33 +0530
>> Subject: Can the file storage in HDFS be customized?
>> To: user@hadoop.apache.org
>>
>>
>> Hello,
>>
>> I have a huge shapefile containing some 500 polygon geometries. Is there a
>> way to store this shapefile in HDFS in such a format that each block holds
>> 100 polygon geometries, and each block resides on a quad-core machine?
>>
>> Thus, 5 machines with 5 blocks, holding 500 polygon geometries in total.
>>
>> Internally, I would like to read each HDFS block in such a way that each
>> polygon geometry is fed to a map() invocation. Thus, 100 map() invocations
>> per block per machine.
>>
>> --
>> Thanks & Regards,
>> Sugandha Naolekar
>>
>>
>>
>>
>
