nutch-dev mailing list archives

From "Alan Tanaman" <alan.tana...@idna-solutions.com>
Subject RE: RSS-fetcher and index individual - how can I realize this function
Date Thu, 08 Feb 2007 12:54:42 GMT
>> 2. It sounds like a pretty fundamental API shift in Nutch, to support a
>> single type of content, RSS. Even if there are more content types that
>> follow this model, as Doug and Renaud both pointed out, there aren't a
>> multitude of them (perhaps archive files, but can you think of any
>> others)?

> Also true.  On the other hand, Nutch provides 98% of an RSS search 
> engine.  It'd be a shame to have to re-invent everything else and it 
> would be great if Nutch could evolve to support RSS well.
>
> Could image search also benefit from this?  One could generate a 
> Parse for each image on a page whose text was from the page.  Product 
> search too, perhaps.

Another application could be splitting certain enterprise documents up,
either based on passage retrieval algorithms or simply based on the table of
contents entries.  For example, a long contract or user guide could be split
up into separate searchable documents.
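
To make the table-of-contents case concrete, here is a rough sketch in plain
Python. It is not Nutch code and does not use any Nutch API; the names
SubDocument and split_by_toc, the "#section-N" URL scheme, and the assumption
that each entry appears verbatim on its own line as a heading are all
hypothetical, chosen only to illustrate the idea of one searchable document
per section.

```python
from dataclasses import dataclass


@dataclass
class SubDocument:
    url: str    # parent URL plus a made-up fragment identifying the section
    title: str  # the table-of-contents entry that starts the section
    text: str   # the section body, without the heading line


def split_by_toc(url, text, toc_entries):
    """Split text into one SubDocument per table-of-contents entry.

    Assumption: each entry appears verbatim on a line of its own, in the
    same order as in toc_entries. Entries that are not found are skipped.
    """
    lines = text.splitlines()

    # Locate the heading line of each entry, searching forward only so
    # that document order is preserved.
    starts = []
    pos = 0
    for entry in toc_entries:
        for i in range(pos, len(lines)):
            if lines[i].strip() == entry:
                starts.append((i, entry))
                pos = i + 1
                break

    # Each section runs from its heading to the next heading (or the end).
    docs = []
    for n, (start, title) in enumerate(starts):
        end = starts[n + 1][0] if n + 1 < len(starts) else len(lines)
        body = "\n".join(lines[start + 1:end]).strip()
        docs.append(SubDocument(f"{url}#section-{n + 1}", title, body))
    return docs


if __name__ == "__main__":
    guide = (
        "User Guide\n\n"
        "Installation\nRun the installer.\n\n"
        "Configuration\nEdit the config file.\nRestart.\n"
    )
    for doc in split_by_toc("http://example.com/guide", guide,
                            ["Installation", "Configuration"]):
        print(doc.url, "|", doc.title)
```

Each resulting sub-document could then be indexed as its own entry, with the
fragment URL pointing back to the parent. Passage-retrieval-based splitting
would need a different boundary detector in place of the heading match.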

Best regards,
Alan
_________________________
Alan Tanaman
iDNA Solutions
http://blog.idna-solutions.com


