hadoop-common-user mailing list archives

From "Owen O'Malley" <...@yahoo-inc.com>
Subject Re: Input Data from DB or Memory rather than HDFS
Date Tue, 03 Jun 2008 17:23:00 GMT

On Jun 3, 2008, at 4:56 AM, smallufo wrote:

> What if my data comes from a DB or memory?
> I should write a DatabaseInputFormat that implements
> InputFormat<int rowIndex, MyData value>, right?

Yes

> But how do I implement getSplits() and getRecordReader()?
> I have looked through the sample source code for a long time, but I
> still don't know how to "split" the data.

For most tables, I would choose key ranges for the splits. For  
example, if your primary key was name, choose split points that  
divide the table into roughly equal parts.

name < 'b' -> mapper 0
'b' <= name < 'c'  -> mapper 1

or whatever makes sense for your data.
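
A minimal sketch of the key-range idea above, in plain Java with no Hadoop dependency. The names KeyRangeSplits, KeyRange, fromSplitPoints and toWhereClause are hypothetical and are not part of the Hadoop API. The sketch assumes string keys and that you already have sorted split points; each resulting range is what a getSplits() implementation might wrap in its own InputSplit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: turns sorted split points into
// one half-open key range per mapper, as in the "name < 'b'" example.
final class KeyRangeSplits {

    /** A key range [lower, upper); a null bound means "unbounded". */
    static final class KeyRange {
        final String lower; // inclusive, null = no lower bound
        final String upper; // exclusive, null = no upper bound

        KeyRange(String lower, String upper) {
            this.lower = lower;
            this.upper = upper;
        }

        /**
         * Renders the range as a SQL predicate on the given column.
         * Illustration only: real code should bind the bounds as
         * PreparedStatement parameters instead of concatenating them.
         */
        String toWhereClause(String column) {
            if (lower == null && upper == null) {
                return "1 = 1"; // single split covering the whole table
            }
            if (lower == null) {
                return column + " < '" + upper + "'";
            }
            if (upper == null) {
                return column + " >= '" + lower + "'";
            }
            return column + " >= '" + lower + "' AND " + column + " < '" + upper + "'";
        }
    }

    /** N sorted split points produce N + 1 contiguous, non-overlapping ranges. */
    static List<KeyRange> fromSplitPoints(List<String> splitPoints) {
        List<KeyRange> ranges = new ArrayList<>();
        String lower = null;
        for (String point : splitPoints) {
            ranges.add(new KeyRange(lower, point));
            lower = point;
        }
        ranges.add(new KeyRange(lower, null)); // open-ended last range
        return ranges;
    }

    public static void main(String[] args) {
        List<KeyRange> ranges = fromSplitPoints(Arrays.asList("b", "c"));
        for (int i = 0; i < ranges.size(); i++) {
            System.out.println("mapper " + i + ": " + ranges.get(i).toWhereClause("name"));
        }
    }
}
```

With split points "b" and "c" this yields three ranges: name < 'b', 'b' <= name < 'c', and name >= 'c'. In this sketch a record reader for a given split would then query only the rows inside its own range.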

>
> Is there any example code demonstrating input data that comes from a
> DB or objects in memory rather than from HDFS?

Take a look at the hbase table splitter:

http://tinyurl.com/48s76f

-- Owen