lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: setMaxClauseCount ??
Date Wed, 21 Jan 2004 10:47:39 GMT
Karl Koch wrote:

> Hi Doug,
> 
> thank you for the answer so far.
> 
> I actually wanted to add a large amount of text from an existing document to
> find a closely related one. Can you suggest another good way of doing this? A
> direct match will not occur anyway. How can I make a query that is as close as
> possible to the Vector Space Model (VSM), with each word a dimension value, and
> find documents close to it? You know as well as I do that the standard VSM does
> not have any Boolean logic inside. How do I need to formulate the query to make
> it as similar as possible to a vector, in order to find similar documents in the
> vector space of the Lucene index?

You should try to reduce the dimensionality by reducing the number of 
unique features. In this case, you could for example use only keywords 
(or key phrases) instead of the full content of documents.
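
As a rough illustration of the keyword idea, the plain-Java sketch below keeps
only the most frequent non-stop-word terms of a document and joins them into a
query string. It is only one possible way to pick keywords, not a Lucene API:
the class name KeywordQuery, the stop-word list, and the use of raw term
frequency as the selection criterion are all assumptions made for the example.
A cap on the number of terms also keeps the query below the BooleanQuery clause
limit that this thread is about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class KeywordQuery {
    // Hypothetical stop-word list; ideally it matches the analyzer used at index time.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "to", "in", "is", "it", "that"));

    /** Returns up to maxTerms distinct terms from text, most frequent first. */
    static List<String> topTerms(String text, int maxTerms) {
        // LinkedHashMap remembers first-seen order, which is used to break ties.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equally frequent terms stay in first-seen order.
        entries.sort((x, y) -> Integer.compare(y.getValue(), x.getValue()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (result.size() >= maxTerms) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }

    /** Joins the terms with spaces, giving one query clause per keyword. */
    static String toQueryString(List<String> terms) {
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        String document = "Lucene scores documents with a vector space model. "
                + "The vector of a document holds one weight per term.";
        System.out.println(toQueryString(topTerms(document, 3)));
    }
}
```

The resulting string could then be handed to the query parser, where terms
without an explicit operator are treated as optional clauses under the default
settings. Selecting by raw frequency is the simplest choice; weighting terms by
how rare they are in the index would likely give better keywords.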

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)



