nutch-user mailing list archives

From Philipp Suter <p.su...@netbreeze.ch>
Subject Re: [nutch 0.5] frames
Date Fri, 08 Jul 2005 08:53:40 GMT
Andrzej Bialecki wrote:

> Philipp Suter wrote:
>
>> Does anybody know how to crawl frames? Or how to extend Nutch to be
>> able to crawl frames? We are using the API.
>
>
> The development version (available from SVN) should handle frames just
> fine, i.e. it should follow the src=... attributes in frames in order
> to retrieve the frame contents. Please download the nightly snapshot
> and try it out.
>
>
When do you think it will be released officially? We have some
mission-critical stuff running on Nutch, so I don't know whether the
nightly snapshot will work for us, but I'll try it out.
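
To make the frame handling described in the quoted reply concrete, here is a minimal sketch of "following the src=... attributes": it pulls the src values out of frame and iframe tags and resolves them against the page URL, giving the extra URLs a crawler would then fetch. This is not Nutch code. The class name FrameLinks, the method frameTargets and the example.com URLs are made up for illustration, and the regex is a shortcut that only handles quoted src values; a real crawler would use a proper HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collects the documents that
// <frame> and <iframe> tags point to.
class FrameLinks {
    // Matches <frame ...> and <iframe ...> tags that carry a quoted src
    // attribute. <frameset> is not matched because of the \b after "frame".
    private static final Pattern FRAME_SRC = Pattern.compile(
        "<i?frame\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']*)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns the absolute URL of every frame source found in html,
    // resolved against the URL of the page that contains the frames.
    // Assumption: src values are syntactically valid URI references;
    // URI.resolve throws IllegalArgumentException otherwise.
    public static List<String> frameTargets(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<String> targets = new ArrayList<>();
        Matcher m = FRAME_SRC.matcher(html);
        while (m.find()) {
            targets.add(base.resolve(m.group(1).trim()).toString());
        }
        return targets;
    }

    public static void main(String[] args) {
        String html = "<frameset cols=\"20%,80%\">"
            + "<frame src=\"menu.html\"><frame src='/docs/main.html'>"
            + "</frameset>";
        for (String url : frameTargets("http://example.com/site/index.html", html)) {
            System.out.println(url);
        }
    }
}
```

Each returned URL would be queued for fetching like any other outlink, so the frame contents end up being crawled along with the frameset page.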

Have you ever thought about integrating a JavaScript interpreter into
Nutch? This could be another big step towards a wider range of
crawlable websites. If you need any help with this, I would be very
interested in supporting (time-wise) anybody implementing such functionality.

Have you evaluated Flash as well? Is it possible to parse it?

cheers
ph
