lucene-java-user mailing list archives

From Nils Knappmeier <n.knappme...@i-views.de>
Subject Prefix-Queries and Syntax Highlighting
Date Fri, 08 Jan 2016 09:20:29 GMT
Good morning,

We currently use Lucene 4.3 in our project. We automatically generate 
PrefixQueries and we are passing the rewritten query to the Highlighter 
to highlight search terms in the search result.
Up until a few days ago, we were using a 
MultiTermQuery.CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE because the 
highlighter does not work with the ConstantScoreQueries generated by the
MultiTermQuery.ConstantScoreAutoRewrite. We have also set 
"maxClauseCount" to a very large number to avoid the 
BooleanQuery.TooManyClauses exception. This worked well for years.
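
For illustration, here is a minimal plain-Java sketch of what a boolean 
rewrite does with a prefix: every indexed term that starts with the 
prefix becomes one clause, so the clause count grows with the size of 
the term dictionary. No Lucene classes are used. The TreeSet standing 
in for the term dictionary and the names PrefixExpansionSketch and 
expandPrefix are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PrefixExpansionSketch {

    // Hypothetical stand-in for the index's sorted term dictionary.
    // Returns every term that starts with the given prefix.
    static List<String> expandPrefix(TreeSet<String> terms, String prefix) {
        List<String> clauses = new ArrayList<>();
        // tailSet starts at the first term >= prefix; because the set is
        // sorted, we can stop at the first term that lacks the prefix.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            // A boolean rewrite would add one clause per matching term.
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "alpha", "amp", "apple", "bar", "stamp", "stamps", "stone"));
        System.out.println(expandPrefix(terms, "a"));     // [alpha, amp, apple]
        System.out.println(expandPrefix(terms, "stamp")); // [stamp, stamps]
    }
}
```

With a one-letter prefix on a large index, the returned list can hold a 
large share of all indexed terms, which is the situation a high 
"maxClauseCount" permits.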

Now there have been some searches for "a b c" or "s t am p s" which 
generated OutOfMemoryErrors, so we now use the ConstantScoreAutoRewrite 
and accept that some terms are not highlighted in the search result.
However, I read in the changelog of Lucene 5.0 that
MultiTermQuery.ConstantScoreAutoRewrite was removed in favour of 
MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE.

My problems:

1) PrefixQueries rewritten with a 
MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE don't work with the default 
Highlighter at all.
2) Passing the original query to the Highlighter directly worked in my 
test cases, but those did not use a very large dataset. I have noticed 
that the WeightedSpanTermExtractor, which is used by the Highlighter, 
uses a MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE, so I fear that if we 
do that, we will get OutOfMemoryErrors again when somebody searches for 
"a b c".
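
As a point of comparison for problem 2, here is a minimal plain-Java 
sketch that marks prefix matches directly in the text, so the work 
depends on the length of the text rather than on the number of index 
terms a prefix expands to. It does not use Lucene's Highlighter, 
Formatter or Fragmenter. The class and method names are hypothetical, 
and the regex tokenization is an assumption that will not match what a 
real Analyzer produces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixHighlightSketch {

    // Assumed tokenization: runs of letters and digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Wraps every token that starts with one of the prefixes in <b>...</b>.
    static String highlight(String text, List<String> prefixes) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy the separator characters between tokens unchanged.
            out.append(text, last, m.start());
            String token = m.group();
            if (matchesAnyPrefix(token, prefixes)) {
                out.append("<b>").append(token).append("</b>");
            } else {
                out.append(token);
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    // Case-insensitive prefix test against each query prefix.
    static boolean matchesAnyPrefix(String token, List<String> prefixes) {
        String lower = token.toLowerCase(Locale.ROOT);
        for (String prefix : prefixes) {
            if (lower.startsWith(prefix.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // prints: <b>Stamps</b> and a <b>stone</b>
        System.out.println(highlight("Stamps and a stone", Arrays.asList("st")));
    }
}
```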

What method do you suggest for highlighting prefix terms? I should also 
mention that we are using a custom formatter and a custom 
text-fragmenter. I have not found any tutorials for the 
FastVectorHighlighter. The PostingsHighlighter might work but I'm not 
sure how to implement custom fragment sizes.

Thanks in advance,

Nils Knappmeier

-- 

Nils Knappmeier | Software Engineer
intelligent views gmbh
Julius-Reiber-Str. 17 |64293 Darmstadt

Tel ++49(0)6151 - 5006-228 | Fax ++49(0)6151 - 5006-138
e-mail: n.knappmeier@i-views.de | www.i-views.de


Managing directors: Jörg Kleinz, Klaus Reichenberger
The company is registered at the Darmstadt district court (Amtsgericht 
Darmstadt, seat of the company), no. HRB 7965

This e-mail may contain confidential and/or privileged information. If you are not the intended
recipient (or have received this e-mail in error) please notify the sender immediately and
delete this e-mail. Any unauthorised copying, disclosure or distribution of the contents in
this e-mail is strictly forbidden.