hadoop-general mailing list archives

From 柳松 <lamfeel...@126.com>
Subject [AD]A www crawler for Hadoop Application (Pagerank Computing Included)
Date Sat, 15 Aug 2009 05:18:23 GMT
Dear all:
	I recently wrote a Hadoop web crawler for my project, and I hope this
little open source program may help you in your work.
	I originally planned to use Nutch, but I soon found it too complicated
to modify and to build my own work on.
	So I studied the basic mechanism of the Nutch crawler and rewrote some
core components with my own code, in a project called Joycrawler.

	I aim to build a functional and convenient tool for web data mining
jobs and link analysis Hadoop programs, so the whole Joycrawler project is
a standard program and can easily be reused in other Hadoop programs.

	It is compatible with Hadoop 0.20.0 (both the Apache and Yahoo
distributions) and includes a PageRank computing program, which is also a
standard Hadoop program.

	I hope the following information is helpful.

	Project home: http://code.google.com/p/joycrawler/
	Or try it now:
	For manual: http://joycrawler.googlecode.com/files/Readme-0.11.1.pdf

	Contact:	lamfeeling@126.com

Best Regards
Song Liu, Soochow University
