lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: setMaxClauseCount ??
Date Wed, 21 Jan 2004 16:15:27 GMT
Karl:
http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgId=114748

Status: several people have mentioned they wanted to work on it, but
nobody has contributed any patches.  The code you see at the above URL
is not compatible with Lucene 1.3, but could be brought up to date.

Otis

--- Karl Koch <TheRanger@gmx.net> wrote:
> Hello Doug,
> 
> that sounds interesting to me. I am referring to a NIST paper on
> relevance feedback that ran tests with 20 to 200 words. This is why I
> thought it might be good to be able to use all non-stopwords of a
> document for that and see what happens. Do you know any good papers on
> strategies for selecting keywords effectively, beyond stopword lists
> and stemming?
> 
> Using the term frequencies of the document is not really possible,
> since Lucene does not provide access to a document vector, does it?
> 
> By the way, could you send me Dmitry's code for the Vector extension?
> I asked in another thread but have not received it so far. I would
> really like to have a look. It would also be nice to know the status
> of integrating it into Lucene 1.3. Who is working on it, and how could
> I contribute?
> 
> Cheers,
> Karl
> 
> 
> > Andrzej Bialecki wrote:
> > > Karl Koch wrote:
> > >> I actually wanted to add a large amount of text from an existing
> > >> document to find a closely related one. Can you suggest another
> > >> good way of doing this?
> > >
> > > You should try to reduce the dimensionality by reducing the number
> > > of unique features. In this case, you could for example use only
> > > keywords (or key phrases) instead of the full content of documents.
> > 
> > Indeed, this is a good approach.  In my experience, six or eight
> > terms are usually enough, and they needn't all be required.
> > 
> > Doug
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > 
> 
> 
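
For what it's worth, the keyword-reduction advice quoted above (a handful
of terms, none of them required) could be sketched roughly as follows.
This is only an illustration in plain Java, not code from Lucene or from
the vector patch. The class name KeywordQuerySketch, the methods topTerms
and toOptionalQuery, and the tiny stopword list are all made up for this
example, and ranking by raw term frequency is just one possible way to
pick keywords.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class KeywordQuerySketch {
    // Tiny illustrative stopword list (an assumption, not Lucene's own list).
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with"));

    // Returns up to maxTerms non-stopword terms, most frequent first.
    // Ties are broken alphabetically so the result is deterministic.
    static List<String> topTerms(String text, int maxTerms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOPWORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (result.size() == maxTerms) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }

    // Joins the terms with spaces and no '+' prefix, so a query parser whose
    // default operator is OR would treat every term as optional.
    static String toOptionalQuery(List<String> terms) {
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        String doc = "Lucene stores an inverted index of terms. The index maps "
                   + "terms to documents, and queries look up terms in the index.";
        System.out.println(toOptionalQuery(topTerms(doc, 8)));
    }
}
```

The resulting string has far fewer clauses than a query built from the
whole document, which is the point of the suggestion: the clause limit is
never approached, and since no term is required, a document can still
match when only some of the keywords occur in it.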

