nutch-user mailing list archives

From: Suhail Ahmed <ilya...@mac.com>
Subject: Re: anchor url as well as text
Date: Tue, 17 May 2005 16:46:17 GMT

Hi,

Take a look at WebDBReader.java. It shows you how to do what you want.
You should also look at HtmlParser.java if you want to get hold of
the outlinks from a page while Nutch is parsing the document.
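
As a rough, standalone illustration of what "getting the URL as well as
the text of an anchor" means, here is a minimal sketch in plain Java. It
is not Nutch code and does not use WebDBReader or HtmlParser; the class
name AnchorExtractor, the Anchor holder, and the regular expression are
all made up for this example, and a regex is only a simplification of
what a real HTML parser does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch.
public class AnchorExtractor {

    /** One outgoing link: the target URL and the visible anchor text. */
    public static final class Anchor {
        public final String url;
        public final String text;

        Anchor(String url, String text) {
            this.url = url;
            this.text = text;
        }
    }

    // Matches <a ... href="..." ...>text</a>. Deliberately simple: it only
    // handles quoted href values and will miss unusual markup.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns every anchor found in the given HTML, in document order. */
    public static List<Anchor> extract(String html) {
        List<Anchor> anchors = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            // Strip nested tags from the anchor text and trim whitespace.
            String text = m.group(2).replaceAll("<[^>]+>", "").trim();
            anchors.add(new Anchor(m.group(1), text));
        }
        return anchors;
    }

    public static void main(String[] args) {
        String html =
            "<p>See <a href=\"http://example.org/docs\">the <b>docs</b></a>.</p>";
        for (Anchor a : extract(html)) {
            System.out.println(a.url + " -> " + a.text);
        }
    }
}
```

Inside Nutch itself you would read the same two pieces of information
from the classes named above rather than re-parsing the HTML yourself.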

Suhail


On May 17, 2005, at 4:46 AM, Lucas Rockwell wrote:

> Hi all,
>
> I am fairly new to nutch (but I have been wading through the code,  
> docs and mailing lists) and I am wondering if there is a way to get  
> the url of an anchor as well as the text of an anchor? I have a  
> feeling there is, but I have not pulled things apart enough to  
> really know for sure.
>
> Any help would be much appreciated.
>
> Thanks.
>
> -lucas
>
> p.s. nutch is a first-rate piece of software. Thanks to all who  
> have labored over this amazing tool!
>
>

