nutch-user mailing list archives

From Lucas Rockwell <>
Subject Re: anchor url as well as text
Date Tue, 17 May 2005 23:40:49 GMT
Thanks, Suhail.

I think the email below was not very clear, which is why I sent the 
follow-up email. (I sent it at 9am but I didn't get a copy until almost 

It's not the hyperlinks that link to a given URL that I want; it's the 
URL of the page those hyperlinks are on. Doug said this is not possible 

But I will play with WebDBReader.getLinks and see what I get.

Again, thanks.


P.S. I am responding to this email now because I just got it at 

On May 17, 2005, at 9:46 AM, Suhail Ahmed wrote:

> Hi,
> Take a look at It shows you how to do what you want. 
> You should also look at if you want to get hold of the 
> out links from a page whilst Nutch is performing the parse on the 
> document.
> Suhail
> On May 17, 2005, at 4:46 AM, Lucas Rockwell wrote:
>> Hi all,
>> I am fairly new to nutch (but I have been wading through the code, 
>> docs and mailing lists) and I am wondering if there is a way to get 
>> the url of an anchor as well as the text of an anchor? I have a 
>> feeling there is, but I have not pulled things apart enough to really 
>> know for sure.
>> Any help would be much appreciated.
>> Thanks.
>> -lucas
>> p.s. nutch is a first-rate piece of software. Thanks to all who have 
>> labored over this amazing tool!
