hadoop-common-user mailing list archives

From Yaron Gonen <yaron.go...@gmail.com>
Subject Re: InputFormat for some REST api
Date Tue, 19 Feb 2013 19:28:39 GMT
Thanks, and excellent points.
I just wanted to know if someone is working this way and if it is a common

On Tue, Feb 19, 2013 at 7:39 PM, Mohammad Tariq <dontariq@gmail.com> wrote:

> Good points, sir. Especially the second one. How will the splits get
> generated?
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
> On Tue, Feb 19, 2013 at 11:04 PM, Robert Evans <evans@yahoo-inc.com> wrote:
>> I don't know of any input format that will do this out of the box.  But
>> it should not be that hard to write one.  There are two big issues here.
>>    1. The data you are reading from the API really needs to be static,
>>    or you could get some very odd inconsistencies. For example, a node dies
>>    after a map task has finished but before all of the reducers have fetched
>>    its output, so the map task is rerun, and some of the reducers end up with
>>    old data while others get new data.  This is the main reason to download
>>    the data before processing it.  You can work around this by using the
>>    input format to run a map-only job that writes the data out to a file
>>    before processing it the rest of the way.
>>    2. You need a good way to partition the data from the API.  This can
>>    be difficult unless the REST API provides a logical way to split this up.
>> --Bobby
>> From: Yaron Gonen <yaron.gonen@gmail.com>
>> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> Date: Tuesday, February 19, 2013 4:49 AM
>> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> Subject: InputFormat for some REST api
>> Hi,
>> Do you know of any InputFormat implemented for some REST API provider?
>> Usually when one needs to process data that is accessible only by REST,
>> one should try to download the data first somehow, but what if you cannot
>> download it?
>> thanks
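
On the question of how the splits would get generated: a minimal sketch, assuming the REST API reports a total record count and supports offset/limit paging (neither is stated in the thread). The class and method names here are hypothetical and deliberately avoid any Hadoop dependency. In a real job, the `getSplits` method of a custom `InputFormat` could wrap each range in an `InputSplit`, and a `RecordReader` would fetch only that range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plans non-overlapping offset/limit ranges over a
// paginated REST collection. Assumes the API can report a total count.
final class RestSplitPlanner {

    /** One contiguous slice of the remote collection: records [offset, offset + limit). */
    static final class PageRange {
        final long offset;
        final long limit;

        PageRange(long offset, long limit) {
            this.offset = offset;
            this.limit = limit;
        }

        /** Renders the range as query parameters (parameter names are an assumption). */
        @Override
        public String toString() {
            return "offset=" + offset + "&limit=" + limit;
        }
    }

    /** Divides totalRecords into ranges of at most recordsPerSplit records each. */
    static List<PageRange> plan(long totalRecords, long recordsPerSplit) {
        if (totalRecords < 0 || recordsPerSplit <= 0) {
            throw new IllegalArgumentException("totalRecords must be >= 0 and recordsPerSplit > 0");
        }
        List<PageRange> ranges = new ArrayList<>();
        for (long offset = 0; offset < totalRecords; offset += recordsPerSplit) {
            // The last range may be shorter than recordsPerSplit.
            long limit = Math.min(recordsPerSplit, totalRecords - offset);
            ranges.add(new PageRange(offset, limit));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Prints one query string per planned split.
        for (PageRange range : plan(10, 4)) {
            System.out.println(range);
        }
    }
}
```

Fixed ranges like these only give consistent results if the remote data does not change during the job, which is the first issue raised above. Writing each range out to a file in a map-only job first is the workaround suggested in the thread.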
