lucene-java-user mailing list archives

From Ian Lea <>
Subject Re: Find similar documents of different types
Date Wed, 01 Feb 2012 10:04:23 GMT
I'm not clear exactly what you are asking, but I think you will have to
build your TermQuery instances one at a time. That sounds fine if it
does what you want and is sufficiently fast.
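
As a rough illustration of "one at a time", here is a minimal sketch that expands a <termString, boost> map into one clause per field per term. It uses only the JDK so it runs on its own; the class name PerFieldTermClauses, the field names "title" and "body", and the boost values are hypothetical. In Lucene 3.x each clause would presumably become a TermQuery built from new Term(field, term), given its boost with setBoost, and added to a BooleanQuery with Occur.SHOULD; that wiring is an assumption about your setup and is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: expands a term -> boost map into per-field clauses.
final class PerFieldTermClauses {

    /** One (field, term, boost) triple; each stands for one boosted term query. */
    static final class Clause {
        final String field;
        final String term;
        final float boost;

        Clause(String field, String term, float boost) {
            this.field = field;
            this.term = term;
            this.boost = boost;
        }

        /** Renders the clause as field:term^boost, for display only. */
        @Override
        public String toString() {
            return field + ":" + term + "^" + String.format(Locale.ROOT, "%.2f", boost);
        }
    }

    /** Builds one clause per field for every term, preserving the map's order. */
    static List<Clause> build(List<String> fields, Map<String, Float> termBoosts) {
        List<Clause> clauses = new ArrayList<Clause>();
        for (Map.Entry<String, Float> entry : termBoosts.entrySet()) {
            for (String field : fields) {
                clauses.add(new Clause(field, entry.getKey(), entry.getValue()));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Assumed tf-idf boosts for two terms taken from Doc.
        Map<String, Float> boosts = new LinkedHashMap<String, Float>();
        boosts.put("lucene", 2.5f);
        boosts.put("index", 1.2f);

        // Assumed analyzed fields of type2.
        for (Clause clause : build(Arrays.asList("title", "body"), boosts)) {
            System.out.println(clause);
        }
    }
}
```

The number of clauses is fields x terms, so it may be worth keeping only the highest-boosted terms if the query gets slow or runs into BooleanQuery's maximum clause count.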


On Tue, Jan 31, 2012 at 1:34 PM, Pedro Lacerda <> wrote:
> For the first strategy I'm using MoreLikeThis to generate one query (from
> Doc's terms) for each analyzed field (from type1 and type2), applying boosts
> and searching with a TermsFilter to select only documents of type2.
> For the second I construct a map <termString, boost>, where boost is the
> tf-idf of the term in Doc (computed using the searcher and similarity). I
> failed to build a query from this map because I was looking for something
> like TermQuery("*", termStr). Or is building one TermQuery per field per
> termStr OK?
> Sorry if I'm not sufficiently explicit about what I mean; I'm in a
> basic-level English course.
> Pedro Lacerda
> 2012/1/26 Pedro Lacerda <>
>> Hi list,
>> We have two different document types, each with different fields. My
>> problem is: given one document (Doc) of type1, find similar ones of type2.
>> Initially I thought of two strategies:
>>    - index all documents together; build my query with terms from Doc and
>>    fields of type2; and filter out documents of type1.
>>    - index type1 and type2 documents separately; compute scores (like
>>    tf-idf) for each term of Doc on the type1 index; build my query with
>>    terms from Doc and apply the scores as boosts; search on the type2 index.
>> I hope you have some good suggestions for me, because I have started to
>> learn Lucene and it is giving me a lot of headaches!
>> Pedro Lacerda
