hadoop-common-user mailing list archives

From "ZhiHong Fu" <ddrea...@gmail.com>
Subject Re: help: InputFormat problem ?
Date Wed, 29 Oct 2008 02:20:54 GMT
I'm a little confused about the implementation of DBInputFormat. As I understand it,
the getSplits method of DBInputFormat logically divides the result set into several
splits, so the DbRecordReader should process a DbSplit. But in the actual
implementation, DbRecordReader seems to process the result set
instead of the DbSplit. I don't understand why. Thanks for any help.
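
One way a reader can honor a split while still iterating over a result set is to fold the split's row range into the query it runs, so the result set it opens already holds only that split's rows. The sketch below illustrates that idea in plain Java; it is not the Hadoop source. RowRange, computeSplits and buildSelectQuery are hypothetical names, and the LIMIT/OFFSET syntax assumes a database that supports it.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    /** Hypothetical stand-in for a split: a half-open row range [start, end). */
    static final class RowRange {
        final long start;
        final long end;

        RowRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }
    }

    /** Divide totalRows into numSplits contiguous ranges; the last range takes the remainder. */
    static List<RowRange> computeSplits(long totalRows, int numSplits) {
        List<RowRange> splits = new ArrayList<>();
        long chunk = totalRows / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunk;
            long end = (i == numSplits - 1) ? totalRows : start + chunk;
            splits.add(new RowRange(start, end));
        }
        return splits;
    }

    /** Build the query a reader for one split might run (assumed LIMIT/OFFSET dialect). */
    static String buildSelectQuery(String table, String orderBy, RowRange split) {
        return "SELECT * FROM " + table
            + " ORDER BY " + orderBy
            + " LIMIT " + split.length()
            + " OFFSET " + split.start;
    }

    public static void main(String[] args) {
        // Each split yields its own query, so each reader sees only its own rows.
        for (RowRange split : computeSplits(10, 3)) {
            System.out.println(buildSelectQuery("pages", "id", split));
        }
    }
}
```

With this arrangement the reader's loop over its result set and "processing the split" are the same thing, because the split has already been applied when the query was built. The ORDER BY matters: without a stable ordering, the row ranges of different splits could overlap or miss rows.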

2008/10/27 ZhiHong Fu <ddream84@gmail.com>

> 2008/10/27 Owen O'Malley <omalley@apache.org>
>> If your application that you are drawing from is doing some sort of web
>> crawl that connects to lots of random servers, you may want to use
>> MultiThreadedMapRunner and do the remote connections in the map. If you are
>> just connecting to a small set of servers for each map, you should put it in
>> the InputFormat.
>> Using MultiThreadedMapRunner means that each map can have multiple threads
>> transferring data from the external sources.
>> -- Owen
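
The idea in Owen's reply of having several threads inside one map pull data from external sources can be sketched with a plain thread pool. This is only an illustration using java.util.concurrent, not Hadoop's runner class: ThreadedFetchSketch and fetchAll are hypothetical names, and the fetch function stands in for whatever remote call the map would actually make.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ThreadedFetchSketch {
    /**
     * Apply fetch to every source using a fixed pool of worker threads.
     * Results are returned in the same order as the sources.
     */
    static List<String> fetchAll(List<String> sources, int threads,
                                 Function<String, String> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per source; the pool runs up to `threads` at once.
            List<Future<String>> pending = new ArrayList<>();
            for (String source : sources) {
                Callable<String> task = () -> fetch.apply(source);
                pending.add(pool.submit(task));
            }
            // Collect results, blocking until each task has finished.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

This pays off when each fetch spends most of its time waiting on a remote server, since the waits overlap instead of running one after another.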
