nutch-user mailing list archives

From Doug Cutting
Subject Re: anchor url as well as text
Date Tue, 17 May 2005 16:24:44 GMT
Lucas Rockwell wrote:
> I am fairly new to nutch (but I have been wading through the code, docs 
> and mailing lists) and I am wondering if there is a way to get the url 
> of an anchor as well as the text of an anchor? I have a feeling there 
> is, but I have not pulled things apart enough to really know for sure.

At present this is not supported.  It could easily be added, but doing so 
would slow things down substantially.  With a little more work it could be 
made somewhat efficient.  I will attempt to include this feature in the 
MapReduce rewrite that I'm now starting.  So, one way or another, this 
feature will be added.

