lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject SIPs and CAPs
Date Thu, 14 Jul 2005 10:45:17 GMT
Has anyone developed code to extract SIPs (statistically improbable
phrases) and CAPs (capitalized phrases) from a Lucene index, as
Amazon does with its books, shown here?

     <http://www.amazon.com/exec/obidos/tg/detail/-/0764526413/ref=sip_top_dp/102-8573693-0514548?%5Fencoding=UTF8&v=glance>

I'm asking because it is something I'd like to do in some of my own
work. Of course, CAPs would be impossible to extract from an index
built with a lowercasing analyzer, since the original case is lost;
that special case would require extra work during indexing. SIPs,
however, could be extracted from an existing index.
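
To make the SIP idea concrete, here is a rough sketch of one way to
score phrases: count the two-word phrases in a single document and
rank them by how much more frequent they are there than in the
collection as a whole. This is only an illustration under my own
assumptions, not Amazon's method. The class and method names
(SipSketch, bigramCounts, score, topPhrases) are hypothetical, the
scoring formula is a simple smoothed log-ratio chosen for brevity, and
the code is plain Java that does not call any Lucene API. The
collection-wide phrase counts are passed in as a map; getting them out
of a real index is left open.

```java
import java.util.*;

public class SipSketch {

    /** Counts adjacent word pairs (bigrams) in a token sequence. */
    static Map<String, Integer> bigramCounts(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            counts.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Scores a phrase by its document count times the log of the ratio
     * between its relative frequency in the document and its add-one
     * smoothed relative frequency in the whole collection.
     */
    static double score(int docCount, long docTotal, int corpusCount, long corpusTotal) {
        double pDoc = (double) docCount / docTotal;
        double pCorpus = (corpusCount + 1.0) / (corpusTotal + 1.0);
        return docCount * Math.log(pDoc / pCorpus);
    }

    /** Returns up to k bigrams of the document, most improbable first. */
    static List<String> topPhrases(List<String> docTokens,
                                   Map<String, Integer> corpusCounts,
                                   long corpusTotal,
                                   int k) {
        Map<String, Integer> docCounts = bigramCounts(docTokens);
        long docTotal = Math.max(1, docTokens.size() - 1);

        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : docCounts.entrySet()) {
            int inCorpus = corpusCounts.getOrDefault(e.getKey(), 0);
            scores.put(e.getKey(), score(e.getValue(), docTotal, inCorpus, corpusTotal));
        }

        List<String> phrases = new ArrayList<>(docCounts.keySet());
        // Highest score first; ties broken alphabetically for stable output.
        phrases.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(phrases.subList(0, Math.min(k, phrases.size())));
    }
}
```

With a document that repeats an unusual phrase and a collection where
common pairs such as "of the" are frequent, the unusual phrase should
rank first and the common pair last. Note that phrase counts are not
available from single-term statistics alone, so a real index would
need to expose them some other way, for example by indexing word pairs
as terms or by reading term positions.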

Thanks,
     Erik

