lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "MOYSE Gilles (Cetelem)" <gilles.mo...@cetelem.fr>
Subject Compound expression extraction
Date Tue, 21 Oct 2003 08:00:41 GMT
Hi.

I'm trying to extract expressions from the terms position information, i.e.,
if two words appears frequently side-by-side, then we can consider that the
two words are only one. For instance, 'Object' and 'Oriented' appears
side-by-side 9 times out of 10. It allows us to define a new expression,
'Object_Oriented'.
Does anyone knows the statistical method to detect such expressions ?

Thanks.

Gilles Moyse

-----Message d'origine-----
De : Eric Jain [mailto:Eric.Jain@isb-sib.ch]
Envoyé : mardi 21 octobre 2003 09:24
À : Lucene Users List
Objet : Re: Lucene on Windows


> The CVS version of Lucene has a patch that allows one to use a
> 'Compound Index' instead of the traditional one.  This reduces the
> number of open files.  For more info, see/make the Javadocs for
> IndexWriter.

Interesting option. Do you have a rough idea of what the performance
impact of using this setting is?

--
Eric Jain


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message