nutch-agent mailing list archives

From Jason Todd Slack-Moehrle <mailingli...@MailNewsRSS.com>
Subject Nutch Crawling Questions
Date Sun, 19 Apr 2009 23:24:23 GMT
Hi All,

I have some beginner Nutch questions that I am hoping to get some
insight into.

I want to start at Dmoz.org and follow links related to entertainment
(concerts, art gallery events, etc.), examining each link to decide
whether I should collect data about it and from the page it points to.

My questions:

1. Can Nutch start at a given URL and examine every link based on
my criteria? (Obviously I can write Case, If/Else, or While logic to do
this.)

2. If I find a link containing keywords of interest, can I follow
that link and get information from the page?

3. How do I get information about the link of interest and its
content into a MySQL database? (I know ColdFusion, MySQL, and PHP.)
I think what I am asking is: how do I get data from the crawler back
into my database?

4. I know Nutch is written in Java, which is fine, so I will need
Tomcat running, etc. Are there other Java app servers available for
OS X as well?

5. Does anyone have deployment instructions for OS X?

Am I making any sense?

-Jason