hadoop-common-user mailing list archives

From Patrick Angeles <patr...@cloudera.com>
Subject Re: Manually splitting files in blocks
Date Wed, 24 Mar 2010 15:38:35 GMT
Yuri,

Probably the easiest thing is to create distinct files and configure the
block size per file so that HDFS doesn't split them into smaller blocks for
you.
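Roughly, with the Java API, that looks like the sketch below. This is
untested and the namenode URI, paths, and sizes are placeholders, but the
five-argument FileSystem.create overload is the one that takes a per-file
block size:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerFileBlockSizeWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs =
            FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

        // Pick a block size larger than your biggest
        // [BEGIN DATAROW] ... [END DATAROW] section, e.g. 128 MB.
        long blockSize = 128L * 1024 * 1024;
        short replication = 3;
        int bufferSize = 4096;

        // This overload sets the block size for this file only;
        // other files keep the cluster default.
        FSDataOutputStream out = fs.create(
                new Path("/data/datarows-0001"), true, bufferSize,
                replication, blockSize);
        try {
            out.writeBytes("[BEGIN DATAROW]\n... data ...\n[END DATAROW]\n");
        } finally {
            out.close();
        }
    }
}

From the shell, overriding the property at copy time should do the same
thing (dfs.block.size is the 0.20-era name, and it has to be a multiple of
the checksum chunk size, 512 bytes by default):

  hadoop fs -D dfs.block.size=134217728 -put datarows.txt /data/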

- P

On Wed, Mar 24, 2010 at 11:23 AM, Yuri K. <mr_greenshit@hotmail.com> wrote:

>
> Dear Hadoopers,
>
> I'm trying to find out how and where Hadoop splits a file into blocks and
> decides which datanodes to send them to.
>
> My specific problem:
> I have two types of data files.
> One large file is used as a database file, with records laid out like
> this:
> [BEGIN DATAROW]
> ... lots of data 1
> [END DATAROW]
>
> [BEGIN DATAROW]
> ... lots of data 2
> [END DATAROW]
> and so on.
>
> The other, smaller files contain raw data that is to be compared against
> a data row in the large file.
>
> So my question is: is it possible to manually control how Hadoop splits
> the large data file into blocks? [see the input-split sketch after this
> quoted message]
> Obviously I want each begin-end section to sit in one block to optimize
> performance. That way I can replicate the smaller files on each node, and
> the nodes can work independently of one another.
>
> thanks, yk
> --
> View this message in context:
> http://old.nabble.com/Manually-splitting-files-in-blocks-tp28015936p28015936.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
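One more note, on the second half of the question: HDFS block boundaries
are plain byte offsets, so HDFS by itself won't keep a [BEGIN DATAROW] ...
[END DATAROW] section intact inside one block. What you can control is the
MapReduce input split. Below is a minimal, untested sketch against the 0.20
mapred API (the class name is made up) that sends each file whole to a
single mapper, no matter how many blocks it spans:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Never split input files: each file is read start to finish
// by exactly one mapper.
public class NonSplittableTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}

Combined with one file per data row (or per group of rows) as suggested
above, this keeps each section together at the processing layer even if a
file happens to span more than one HDFS block.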
