lucene-java-user mailing list archives

From Erik Hatcher
Subject SIPs and CAPs
Date Thu, 14 Jul 2005 10:45:17 GMT
Has anyone developed code to extract SIPs (statistically improbable  
phrases) and CAPs (capitalized phrases) from a Lucene index, such as  
Amazon does with its books as shown here?


I'm curious as it is something I'd like to do with some of my work.   
Of course CAPs would be impossible to extract from an index that used  
a lowercasing analyzer, so that is a special case that would require  
work during indexing.  But SIPs could be extracted from an existing index.
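
To make the SIP idea concrete, here is a minimal sketch of one way it
could be scored. It is not Amazon's method and it uses no Lucene API.
The class SipSketch, its method names, the choice of two-word phrases,
and the crude +1 smoothing are all assumptions made for illustration.
The background text stands in for corpus-wide phrase counts, which in
practice would have to come from the index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): ranks two-word phrases by how unusual they are. */
final class SipSketch {

    /** Counts adjacent word pairs in lowercased text; anything outside a-z and 0-9 is a separator. */
    static Map<String, Integer> bigramCounts(String text) {
        String[] words = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        Map<String, Integer> counts = new HashMap<>();
        String prev = null;
        for (String w : words) {
            if (w.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a separator
            }
            if (prev != null) {
                counts.merge(prev + " " + w, 1, Integer::sum);
            }
            prev = w;
        }
        return counts;
    }

    /** Sums all phrase occurrences in a count map. */
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    /**
     * Assumed scoring rule: relative frequency in the document divided by
     * relative frequency in the background. The +1 terms only keep the
     * background frequency above zero for phrases it never contains.
     */
    static double score(String phrase, Map<String, Integer> doc, Map<String, Integer> background) {
        double docFreq = doc.getOrDefault(phrase, 0) / (double) Math.max(1, total(doc));
        double bgFreq = (background.getOrDefault(phrase, 0) + 1) / (double) (total(background) + 1);
        return docFreq / bgFreq;
    }

    /** Returns up to n phrases from the document, highest score first. */
    static List<String> topPhrases(String docText, String backgroundText, int n) {
        Map<String, Integer> doc = bigramCounts(docText);
        Map<String, Integer> bg = bigramCounts(backgroundText);
        List<String> phrases = new ArrayList<>(doc.keySet());
        phrases.sort((a, b) -> Double.compare(score(b, doc, bg), score(a, doc, bg)));
        return new ArrayList<>(phrases.subList(0, Math.min(n, phrases.size())));
    }
}
```

Calling SipSketch.topPhrases(bookText, corpusText, 10) would list the ten
two-word phrases that are most over-represented in bookText relative to
corpusText. Because the text is lowercased before counting, this sketch
cannot find CAPs, which matches the point above about lowercasing
analyzers.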

