hadoop-common-user mailing list archives

From rishi pathak <mailmaverick...@gmail.com>
Subject Nutch hadoop and Torque integration
Date Fri, 14 Jan 2011 05:59:00 GMT
Sorry for cross-posting. We have a compute cluster running the Torque
resource manager and the Maui scheduler.
The cluster is almost fully utilized, but at times (early mornings, late
nights, holidays) resources become available in pockets (2-10 nodes for
2-5 hours). Our idea is to set up Nutch (Hadoop) to make use of these
pockets: an automated system in which a long crawling job is broken down
into smaller map/reduce jobs. The system would constantly monitor resource
availability and would request, execute and finalize these smaller jobs
through the resource manager interface. We had a look at HOD (Hadoop On
Demand), but as far as I know it does not serve this purpose.
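To make the idea concrete, here is a minimal sketch of the sizing step only,
under stated assumptions: a "pocket" is described by a node count and a
window length, the crawl throughput (urls_per_node_hour) is a hypothetical
figure you would have to measure yourself, and crawl_round.sh is a
hypothetical wrapper script for one small Nutch round. Discovering pockets
(e.g. from Maui's backfill information) and actually submitting the job are
left out. The qsub -l nodes=N,walltime=HH:MM:SS resource syntax is standard
Torque/PBS; everything else here is illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pocket:
    """A window of idle resources (hypothetical structure, not a Torque/Maui API)."""
    nodes: int
    hours: float


def segment_size(pocket: Pocket, urls_per_node_hour: int, safety: float = 0.8) -> int:
    """URLs to fetch in one small crawl round so it finishes inside the pocket.

    urls_per_node_hour is an assumed, measured throughput; `safety` reserves
    part of the window for job startup and finalization (e.g. updatedb).
    """
    if pocket.nodes <= 0 or pocket.hours <= 0:
        return 0
    return int(pocket.nodes * pocket.hours * urls_per_node_hour * safety)


def plan_round(remaining_urls: int, pocket: Pocket, urls_per_node_hour: int) -> int:
    """Clamp the round size to the work that is actually left in the long crawl."""
    return min(remaining_urls, segment_size(pocket, urls_per_node_hour))


def qsub_command(pocket: Pocket, script: str) -> list[str]:
    """Build a qsub argument list that requests exactly the pocket's nodes and length."""
    total_minutes = int(pocket.hours * 60)
    walltime = f"{total_minutes // 60:02d}:{total_minutes % 60:02d}:00"
    return ["qsub", "-l", f"nodes={pocket.nodes},walltime={walltime}", script]


if __name__ == "__main__":
    pocket = Pocket(nodes=4, hours=2.5)
    print(plan_round(1_000_000, pocket, urls_per_node_hour=1000))
    print(qsub_command(pocket, "crawl_round.sh"))
```

The resulting round size could, for example, be passed to Nutch's generate
step via its -topN option, so that each small job only fetches as much as
the pocket can absorb; the safety factor is a guess and would need tuning.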

In a way it is too much to ask, and maybe no complete solution is
available, but any pointers/links are more than welcome.

We are also looking at JobStream.py available at


Rishi Pathak
National PARAM Supercomputing Facility
