hadoop-mapreduce-user mailing list archives

From Phil McCarthy <philmccar...@gmail.com>
Subject Parallelizing HTTP calls with MapReduce
Date Sat, 06 Mar 2010 17:29:09 GMT

I'm new to Hadoop, and I'm trying to figure out the best way to use it
with EC2 to make a large number of calls to a web API, and then process
and store the results. What's the best high-level approach for using
MapReduce to parallelize this? The calls will be regular HTTP requests,
and the URLs follow a known format, so they can be generated easily.
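
To make the setup concrete, here is a minimal sketch of how the URL list
might be generated and split into roughly equal shards, which could then
be written out one URL per line as job input. The URL template, the
function names, and the shard count are all placeholders for
illustration, not the real API.

```python
from typing import Iterator, List

# Hypothetical URL template; the real API's format is not given here.
URL_TEMPLATE = "http://api.example.com/items/{item_id}"


def generate_urls(start: int, stop: int) -> Iterator[str]:
    """Yield one URL per id in the half-open range [start, stop)."""
    for item_id in range(start, stop):
        yield URL_TEMPLATE.format(item_id=item_id)


def split_into_shards(urls: List[str], num_shards: int) -> List[List[str]]:
    """Distribute URLs round-robin so each shard gets a near-equal share."""
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    for i, url in enumerate(urls):
        shards[i % num_shards].append(url)
    return shards


if __name__ == "__main__":
    # Print each shard's URLs, one per line, under a shard header.
    all_urls = list(generate_urls(0, 10))
    for n, shard in enumerate(split_into_shards(all_urls, 3)):
        print(f"# shard {n}")
        print("\n".join(shard))
```

The idea would be that each shard is handled by a separate map task that
fetches its URLs and emits the responses, but whether that is the right
way to structure the job is exactly what I'm unsure about.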

This seems like it'd be a pretty common type of task, so apologies if
I've missed something obvious in the docs etc.

Phil McCarthy
