hadoop-common-user mailing list archives

From Sonal Goyal <sonalgoy...@gmail.com>
Subject Re: Manually splitting files in blocks
Date Wed, 24 Mar 2010 16:17:09 GMT
Hi Yuri,

You can also check the source code of FileInputFormat and create your own
RecordReader implementation.
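A minimal sketch of that idea: the record-boundary logic a custom RecordReader's nextKeyValue() would need, shown as plain Java so it runs without a Hadoop classpath. The class and method names are made up for illustration; the marker strings follow Yuri's format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: groups the lines between [BEGIN DATAROW] and
// [END DATAROW] into one record each, which is exactly what a custom
// RecordReader's nextKeyValue() would do one record at a time.
public class DataRowSplitter {

    /** Collects each [BEGIN DATAROW]..[END DATAROW] section into one record. */
    static List<String> splitRows(List<String> lines) {
        List<String> rows = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (line.startsWith("[BEGIN DATAROW]")) {
                current = new StringBuilder();            // start a new record
            } else if (line.startsWith("[END DATAROW]")) {
                if (current != null) rows.add(current.toString().trim());
                current = null;                           // record complete
            } else if (current != null) {
                current.append(line).append('\n');        // inside a record
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> file = List.of(
            "[BEGIN DATAROW]", "data 1a", "data 1b", "[END DATAROW]",
            "", "[BEGIN DATAROW]", "data 2", "[END DATAROW]");
        System.out.println(splitRows(file).size());       // two records
    }
}
```

In a real InputFormat you would also override isSplitable() to return false (or align splits on the markers yourself) so a BEGIN/END pair never straddles two mappers.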
Thanks and Regards,
Sonal
www.meghsoft.com


On Wed, Mar 24, 2010 at 9:08 PM, Patrick Angeles <patrick@cloudera.com> wrote:

> Yuri,
>
> Probably the easiest thing is to actually create distinct files and
> configure the block size per file such that HDFS doesn't split it into
> smaller blocks for you.
>
> - P
>
> On Wed, Mar 24, 2010 at 11:23 AM, Yuri K. <mr_greenshit@hotmail.com>
> wrote:
>
> >
> > Dear Hadoopers,
> >
> > I'm trying to find out how and where Hadoop splits a file into blocks and
> > decides to send them to the datanodes.
> >
> > My specific problem:
> > I have two types of data files.
> > One large file is used as a database file where information is sorted like
> > this:
> > [BEGIN DATAROW]
> > ... lots of data 1
> > [END DATAROW]
> >
> > [BEGIN DATAROW]
> > ... lots of data 2
> > [END DATAROW]
> > And so on.
> >
> > The other, smaller files contain raw data and are to be compared against a
> > datarow in the large file.
> >
> > So my question is: is it possible to manually set how Hadoop splits the
> > large data file into blocks?
> > Obviously I want each BEGIN-END section to be in one block to optimize
> > performance. That way I can replicate the smaller files on each node, and
> > the nodes can work independently of one another.
> >
> > thanks, yk
> > --
> > View this message in context:
> >
> http://old.nabble.com/Manually-splitting-files-in-blocks-tp28015936p28015936.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >
> >
>
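Patrick's per-file block size suggestion (quoted above) can be applied at upload time. A sketch, assuming shell access to the cluster; the 512 MB value is illustrative, and older releases spell the property dfs.block.size rather than dfs.blocksize:

```shell
# Upload the large file with a block size bigger than the file itself,
# so HDFS keeps it in a single block (512 MB here; adjust to your data).
# The block size must be a multiple of io.bytes.per.checksum (512 by default).
hadoop fs -D dfs.blocksize=536870912 -put largefile.dat /data/largefile.dat
```

This sidesteps custom input formats entirely, at the cost of managing block size per file rather than letting one InputFormat handle arbitrary files.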
