nutch-user mailing list archives

From Lucas Rockwell <luc...@tsw.berkeley.edu>
Subject Re: anchor url as well as text
Date Tue, 17 May 2005 23:40:49 GMT
Thanks, Suhail.

I think the email below was not too clear, which is why I sent the 
follow-up email. (I sent it at 9am, but I didn't get a copy until almost 
4pm...)

It's not the hyperlinks that link to a given url that I want; it's the 
url of the page those hyperlinks are on. Doug said this is not possible 
yet...

But I will play with WebDBReader.getLinks and see what I get.
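
To make the distinction above concrete, here is a minimal sketch in plain 
Java. It does not use the Nutch API: the InlinkSources class, the Link 
record, and the invert method are all hypothetical names made up for 
illustration, and whether WebDBReader.getLinks exposes the same 
information is not confirmed here. The sketch assumes each link is 
available as a (source page url, target url, anchor text) triple, and 
groups the links by target so that the url of the page each hyperlink 
sits on can be read back alongside its anchor text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Nutch.
public class InlinkSources {

    /** One hyperlink: the page it appears on, the url it points to, and its anchor text. */
    public static final class Link {
        public final String fromUrl;    // url of the page the hyperlink is on
        public final String toUrl;      // url the hyperlink points to
        public final String anchorText; // visible text of the hyperlink

        public Link(String fromUrl, String toUrl, String anchorText) {
            this.fromUrl = fromUrl;
            this.toUrl = toUrl;
            this.anchorText = anchorText;
        }
    }

    /**
     * Groups links by their target url, preserving input order.
     * Each entry maps a target url to every link that points at it,
     * so both the source page url and the anchor text stay available.
     */
    public static Map<String, List<Link>> invert(List<Link> links) {
        Map<String, List<Link>> byTarget = new LinkedHashMap<>();
        for (Link link : links) {
            byTarget.computeIfAbsent(link.toUrl, k -> new ArrayList<>()).add(link);
        }
        return byTarget;
    }
}
```

With such a map, looking up a target url gives a list of links, and 
reading fromUrl on each one yields the page the hyperlink is on, while 
anchorText yields the text of that same hyperlink.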

Again, thanks.

-lucas

P.S. I am responding to this email now because I just got it at 
4:10pm...

On May 17, 2005, at 9:46 AM, Suhail Ahmed wrote:

> Hi,
>
> Take a look at WebDBReader.java. It shows you how to do what you want. 
> You should also look at HtmlParser.java if you want to get hold of the 
> out links from a page whilst Nutch is performing the parse on the 
> document.
>
> Suhail
>
>
> On May 17, 2005, at 4:46 AM, Lucas Rockwell wrote:
>
>> Hi all,
>>
>> I am fairly new to nutch (but I have been wading through the code, 
>> docs and mailing lists) and I am wondering if there is a way to get 
>> the url of an anchor as well as the text of an anchor? I have a 
>> feeling there is, but I have not pulled things apart enough to really 
>> know for sure.
>>
>> Any help would be much appreciated.
>>
>> Thanks.
>>
>> -lucas
>>
>> p.s. nutch is a first-rate piece of software. Thanks to all who have 
>> labored over this amazing tool!
>>
>>
>

