lucene-dev mailing list archives

From Li Li <fancye...@gmail.com>
Subject looking for a BooleanMatcher instead of BooleanScorer
Date Fri, 01 Jun 2012 07:36:36 GMT
Hi all,
    I am looking for a 'BooleanMatcher' in Lucene. For many
applications, we don't need to order matched documents by relevance score;
we just want plain boolean matching. But BooleanScorer/BooleanScorer2
is a little heavy for that, because it is built for relevance scoring.
    One use case: we have some fields with a very small number
of tokens (usually only one word), such as id, tag or something else.
    But we need queries like this: id in (1,3,5.....). If we use a
BooleanQuery (id:1 id:3 id:5 ...), BooleanScorer can only apply to 31
terms, and BooleanScorer2 uses a priority queue to know how many terms are
matched (coord).
    Filters may help, but the query can be very complicated (or else
the filter itself still uses BooleanQuery, which is a recursive problem).
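
A minimal sketch of the match-only idea for the id in (1,3,5.....) case, in plain Java rather than against Lucene's API. It assumes each term's postings are already available as a sorted int array; `TermsInMatcher`, `matchAny` and the `maxDoc` parameter are hypothetical names used only for illustration. Since no score or coord is needed, the union can be built by setting bits, with no priority queue and no limit on the number of terms:

```java
import java.util.BitSet;

// Hypothetical sketch, not a Lucene class: a match-only OR over term postings.
final class TermsInMatcher {
    // postings[i] holds the doc IDs for the i-th term, e.g. id:1, id:3, id:5.
    // Returns one bit per document; a set bit means at least one term matched.
    static BitSet matchAny(int maxDoc, int[][] postings) {
        BitSet hits = new BitSet(maxDoc);
        for (int[] docs : postings) {
            for (int doc : docs) {
                hits.set(doc); // no tf/idf, no coord, just membership
            }
        }
        return hits;
    }
}
```

The cost is one pass over every posting plus one bit per document, which is roughly how a terms-based filter would behave.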

    We could split the current BooleanScorer into a BooleanMatcher and a
Ranker. If we need to score the hit docs, we ask the BooleanScorer for
not only the hit doc ids but also tf/idf, coord or anything else we need for
ranking. But sometimes we only need docIds; then the BooleanMatcher
can optimize its implementation. For the case of many disjunction
terms, it can work like a Filter or BooleanScorer instead of
BooleanScorer2.
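
The proposed split could look roughly like the sketch below. This is plain Java, not existing Lucene code: `DocMatcher`, `Ranker`, `ArrayMatcher` and `MatchOnlySearch` are hypothetical names, and the matcher here simply walks a pre-sorted array of doc IDs. The point is only that the ID-only path never touches the ranker:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: yields matching doc IDs in order, nothing else.
interface DocMatcher {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc(); // next matching doc ID, or NO_MORE_DOCS when exhausted
}

// Hypothetical interface: consulted only when a score is actually wanted.
interface Ranker {
    float score(int doc);
}

// Toy matcher over an already sorted array of doc IDs (assumption for the sketch).
final class ArrayMatcher implements DocMatcher {
    private final int[] docs;
    private int pos = 0;
    ArrayMatcher(int[] sortedDocs) { this.docs = sortedDocs; }
    @Override public int nextDoc() {
        return pos < docs.length ? docs[pos++] : NO_MORE_DOCS;
    }
}

final class MatchOnlySearch {
    // ID-only path: no scoring work at all.
    static List<Integer> collectIds(DocMatcher matcher) {
        List<Integer> ids = new ArrayList<>();
        for (int doc = matcher.nextDoc(); doc != DocMatcher.NO_MORE_DOCS; doc = matcher.nextDoc()) {
            ids.add(doc);
        }
        return ids;
    }

    // Scored path: same matcher, plus a ranker; each entry is {docId, score}.
    static List<float[]> collectScored(DocMatcher matcher, Ranker ranker) {
        List<float[]> out = new ArrayList<>();
        for (int doc = matcher.nextDoc(); doc != DocMatcher.NO_MORE_DOCS; doc = matcher.nextDoc()) {
            out.add(new float[] {doc, ranker.score(doc)});
        }
        return out;
    }
}
```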

    Is it possible?

    Following are some user demands I found on the mailing lists. The
first one is my own requirement.

    1. https://github.com/neo4j/community/issues/494

    2. mail to lucene

From: qibaoyuan@126.com via lucene.apache.org, May 6, to lucene
Hi,
      I have a problem searching many keywords in about
5,000,000 documents. For example, the query may be like "(a1 or a2 or a3
....a200) and (b1 or b2 or b3 or b4 ..... b400)". I found it takes a
very long time (40 seconds) to get the answer on only one field (the Title
field), and the JVM throws an OutOfMemoryError with more fields (title field
plus content field). Any suggestions or good ideas to solve this
problem? Thanks in advance.
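
For a query of this shape, a match-only evaluation could be sketched as a bitset union per group followed by an intersection across groups. This is plain Java under the assumption that each keyword's doc IDs are available as an int array; `AndOfOrs`, `union` and `matchAll` are hypothetical names, not Lucene API:

```java
import java.util.BitSet;

// Hypothetical sketch: (a1 OR a2 OR ...) AND (b1 OR b2 OR ...) without scoring.
final class AndOfOrs {
    // OR together the postings of every term in one group.
    static BitSet union(int maxDoc, int[][] postings) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docs : postings) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // AND the per-group unions; groups[g][t] is the postings array of term t in group g.
    // With zero groups every document matches.
    static BitSet matchAll(int maxDoc, int[][][] groups) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start from "all docs", then narrow
        for (int[][] group : groups) {
            result.and(union(maxDoc, group));
        }
        return result;
    }
}
```

Each bitset needs one bit per document, so about 625 KB for 5,000,000 documents, independent of how many keywords are in a group.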


    3. mail to lucene

From: Chris Book chrisbook@gmail.com via lucene.apache.org, Apr 11, to solr-user
Hello, I have a solr index running that is working very well as a search.
 But I want to add the ability (if possible) to use it to do matching.  The
problem is that by default it is only looking for all the input terms to be
present, and it doesn't give me any indication as to how many terms in the
target field were not specified by the input.

For example, if I'm trying to match to the song title "dust in the wind",
I'm correctly getting a match if the input query is "dust in wind".  But I
don't want to get a match if the input is just "dust".  Although as a
search "dust" should return this result, I'm looking for some way to filter
this out based on some indication that the input isn't close enough to the
output.  Perhaps I could get information that the number of input
terms is much less than the number of terms in the field, or something
else along those lines?

I realize that this isn't the typical use case for a search, but I'm just
looking for some suggestions as to how I could improve the above example a
bit.

Thanks,
Chris
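
One way to picture the check Chris describes is a coverage ratio: the fraction of the field's terms that also appear in the input. The sketch below is plain Java with whitespace tokenization; `TermCoverage`, `coverage`, `isCloseEnough` and the threshold value are assumptions for illustration, not Solr or Lucene features:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical post-filter: how much of the target field does the input cover?
final class TermCoverage {
    // Fraction of the field's whitespace-separated terms that occur in the query.
    static double coverage(String query, String field) {
        if (field.trim().isEmpty()) {
            return 0.0;
        }
        Set<String> queryTerms = new HashSet<>(
                Arrays.asList(query.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        String[] fieldTerms = field.toLowerCase(Locale.ROOT).trim().split("\\s+");
        int covered = 0;
        for (String term : fieldTerms) {
            if (queryTerms.contains(term)) {
                covered++;
            }
        }
        return (double) covered / fieldTerms.length;
    }

    // Accept a hit only when coverage reaches the caller's threshold.
    static boolean isCloseEnough(String query, String field, double minCoverage) {
        return coverage(query, field) >= minCoverage;
    }
}
```

With an assumed threshold of 0.5, "dust in wind" against "dust in the wind" covers 3 of 4 terms and passes, while "dust" covers 1 of 4 and is filtered out.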
