lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Braun <mbr...@uni-hd.de>
Subject Re: Problem finding similar documents with MoreLikeThis method.
Date Fri, 21 Jul 2006 13:03:07 GMT
hi mark,
> Does your index use StandardAnalyzer? Are your fields stored (Field.Store.YES)?
Thanks! that was the hint in the right direction, the FIeld was Stored
but not indexed:

titleDocument.add(new Field("kurz", title.getKurz(), Field.Store.YES,
Field.Index.NO));
(That was the field for the short description of a document)

If I set the Fieldname to another Field (indexed with StandardAnalyzer)
which is Indexed (but not Stored) it works if I use the
like(StringReader ) Method but not with like(int docid).

This Code works:

			 MoreLikeThis mlt = new MoreLikeThis(this.ir);
			 mlt.setFieldNames( new String[] {"freitext"} );
			 mlt.setMinDocFreq(0);
	
	 Query query = mlt.like(new
StringReader(hits.doc(0).getField("kurz").stringValue()));

			 System.out.println("QUERY:"+query);

but it doesn't Work with mlt.like( hits.id(0) ); - I think because the
like method uses then the unstored "freitext" Field?

However that works for me because I can get the Document-Data from the
Database and use the like(StringReader) method.


The generated MLT-Query gets very specific so it only finds the same
Doc. Do you have any hints with which parameters I should start to play
to get the query more general?  Perhaps setMaxQueryTerms?

> 
> MoreLikeThis uses StandardAnalyzer by default to read the stored content from the example
doc which may produce tokens that do not match those of the indexed content. Use setAnalyzer()
to ensure they are in sync.
> 
> 
> 
> 
> ----- Original Message ----
> From: Martin Braun <mbraun@uni-hd.de>
> To: java-user@lucene.apache.org
> Sent: Friday, 21 July, 2006 11:09:40 AM
> Subject: Re: Problem finding similar documents with MoreLikeThis method.
> 
> Hello,
> 
> inspired by this thread, I also tried to implement a MoreLikeThis
> search. But I have the same Problem of a null query.
> 
> I did set the Fieldname to a Field that is stored in the Index.
> But "like" just returns null.
> 
> Here is my Code:
> 
>             Hits hits = this.is.search(new TermQuery(new Term("katkey", Katkey)));
>             System.out.println("DOCID:"+hits.id(0));
>             System.out.println("hits:"+hits.doc(0).getField("kurz").stringValue()   
);
>              MoreLikeThis mlt = new MoreLikeThis(this.ir);
>              mlt.setFieldNames( new String[] {"kurz"} );
>              mlt.setMinDocFreq(0);
>              Query query = mlt.like( hits.id(0) );
>              System.out.println("QUERY:"+query);
>              return this.query(query.toString(),0,10,0);
> 
> 
> The Field "kurz" contains the following String:
> 
> 003481627  M               <v>Swinton, Elizabeth DeSabato</v>: <a>¬The¬
> graphic art of Onchi Koshiro</a> : innovation and tradition / Elizabeth
> de Sab
> ato Swinton
> New York [u.a.]: Garland, <b>1986</b>. - XXVIII, 307, |&lt;180&gt;|
S. :
> zahlr. Ill.
> ISBN 0-8240-6868-8
> 
> 
> Any Ideas?
> 
> thanks in advance,
> martin
> 
> 
> Davide schrieb:
>> mark harwood wrote:
>>> Does your index have only the one document?
>>>
>>> MoreLikeThis will only generate queries with terms that occur in more than "minDocFreq"
(default setting is 5).
>>>
>>> This is to avoid the large overheads associated with searching for very common
words in your example text.
>>>
>>>
>>>
>> Thanks very much Mark.
>> I have specified:
>>
>> setMinDocFreq(0);
>>
>> for testing and It works... the query now isn't empty and the search
>> return a document...
>> So I think that the "problem" was that...
>>
>> Thank you
>> Davide.
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


-- 
Universitaetsbibliothek Heidelberg   Tel: +49 6221 54-2580
Ploeck 107-109, D-69117 Heidelberg   Fax: +49 6221 54-2623

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message