hadoop-common-user mailing list archives

From "Yuri K." <mr_greens...@hotmail.com>
Subject Manually splitting files in blocks
Date Wed, 24 Mar 2010 15:23:58 GMT

Dear Hadoopers,

I'm trying to find out how and where Hadoop splits a file into blocks and
decides which datanodes to send them to.
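
As far as I can tell, it's the namenode that decides where each block is placed when the
file is written, but I haven't found the exact spot in the code yet. To at least see where
the blocks of a file end up, I've been using a small probe like the one below (just my own
sketch; it assumes the FileSystem/BlockLocation API is the right way to inspect this):

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Prints the offset, length and datanode hosts of every block of a file.
public class ShowBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path(args[0]);              // e.g. the large database file
    FileStatus status = fs.getFileStatus(file);

    // Ask the namenode which datanodes hold each block of the file.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation b : blocks) {
      System.out.println("offset=" + b.getOffset()
          + " length=" + b.getLength()
          + " hosts=" + Arrays.toString(b.getHosts()));
    }
  }
}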

My specific problem:
I have two types of data files.
One large file is used as a database file, where the information is organized like
this:
[BEGIN DATAROW]
... lots of data 1
[END DATAROW]

[BEGIN DATAROW]
... lots of data 2
[END DATAROW]
and so on.

The other, smaller files contain raw data that has to be compared against a
datarow in the large file.

So my question is: is it possible to manually control how Hadoop splits the
large data file into blocks?
Obviously I want each begin-end section to end up in a single block to optimize
performance. That way I can replicate the smaller files to each node, and the
nodes can work independently of one another.
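
To make the question more concrete: the closest I've found so far is to stop Hadoop from
splitting the file at all, along the lines of the sketch below. This is just my guess at
the right knob (the class name is made up), and whole-file-per-mapper is coarser than what
I actually want, which is one split per datarow section:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Sketch: an input format that refuses to split its files, so a whole
// file (and therefore every [BEGIN DATAROW] ... [END DATAROW] section
// in it) is read by a single map task.
public class WholeFileTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;  // never split: one input split per file
  }
}

I'd then plug it in with job.setInputFormatClass(WholeFileTextInputFormat.class), but is
there a finer-grained way to line the splits up with the datarow markers?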

Thanks, yk
-- 
View this message in context: http://old.nabble.com/Manually-splitting-files-in-blocks-tp28015936p28015936.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

