lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <>
Subject [jira] [Commented] (SOLR-2660) omitPositions improvements
Date Mon, 05 Mar 2012 17:25:59 GMT


Robert Muir commented on SOLR-2660:

I think this could be a good option (in combination with shingles, as mentioned) to accelerate
the phrase queries that Solr query parsers generate in order to boost closer matches.

Again, the idea is to omit positions entirely and instead use ShingleFilter (unigrams and
bigrams), approximating phrase queries with n-gram conjunctions. For the sloppy case, I think
we should use an n-gram disjunction, perhaps interpreting the slop factor as minNrShouldMatch?
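
A minimal sketch of the two approximations described above, written against plain Java
collections rather than Lucene so it runs on its own. Every class and method name here is
hypothetical and none of it is taken from the patch. The '_' separator follows the notation
used in the issue description, and the min-should-match threshold is left as a plain
parameter because how slop should map onto it is still an open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; these names do not exist in Lucene or Solr.
final class ShingleApproximationSketch {

    // Word bigrams of a token list, joined with '_' as in the issue's examples.
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + "_" + tokens.get(i + 1));
        }
        return out;
    }

    // Terms a positionless index would hold for a document: unigrams plus bigrams.
    static Set<String> indexedTerms(String text) {
        List<String> tokens = Arrays.asList(text.trim().toLowerCase().split("\\s+"));
        Set<String> terms = new HashSet<>(tokens);
        terms.addAll(bigrams(tokens));
        return terms;
    }

    // Number of the phrase's bigrams that occur somewhere in the document.
    static int matchingBigrams(Set<String> docTerms, List<String> phraseTokens) {
        int hits = 0;
        for (String shingle : bigrams(phraseTokens)) {
            if (docTerms.contains(shingle)) {
                hits++;
            }
        }
        return hits;
    }

    // Exact-phrase approximation: every bigram of the phrase must be present.
    static boolean matchesConjunction(Set<String> docTerms, List<String> phraseTokens) {
        return matchingBigrams(docTerms, phraseTokens) == bigrams(phraseTokens).size();
    }

    // Sloppy approximation: at least minShouldMatch bigrams must be present.
    static boolean matchesDisjunction(Set<String> docTerms, List<String> phraseTokens,
                                      int minShouldMatch) {
        return matchingBigrams(docTerms, phraseTokens) >= minShouldMatch;
    }
}
```

Because no positions are consulted, a document such as "foo bar qux bar baz" satisfies the
conjunction for the phrase "foo bar baz" even though the phrase never occurs in it. That is
the kind of false positive this approach has to tolerate.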

This basically means you are substituting an n-gram approximation for Levenshtein distance
in both cases.

In general it's a classic indexing/search tradeoff: in my tests on Wikipedia, indexing takes
about twice as long with the shingles, but in exchange, for a lot of these use cases you
don't need to consult the positions file at all.

As a parameter to the fieldtype it's easily pluggable without messing with any queryparsers,
and ordinary queries (term, boolean, etc.) are totally 'pass-thru'. *However*, the thing I
guess I don't like about this patch is that this is really a different 'query intent'. In
other words, I think it's a perfect approach when you just want to boost scores of close
matches (e.g. when generated by the dismax queryparser), but when your 'intent' is to
actually limit matches to a phrase (e.g. when keyed in by a user directly), this
approximation isn't as good a fit.

Either way, I'm open to other opinions before doing anything. If we decide to do it, I think
the next step is to update the patch with the SloppyPhraseQuery approximation.

> omitPositions improvements
> --------------------------
>                 Key: SOLR-2660
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 3.3, 4.0
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: SOLR-2660.patch
> followup to LUCENE-2048:
> Adds factory methods getPhraseQuery/getMultiPhraseQuery to QP, this way you can subclass
> it and customize behavior, particularly
> * by default, Solr throws exception here if the fieldtype omits positions: rather than
>   3.x's silent failure of no results, and even for trunk its nicer to fail during query
>   parsing rather than waiting for lucene's failure during execution.
> * adds phraseAsBoolean, which allows you to downgrade these phrase/multiphrase queries
>   to boolean queries: this is a nice option in conjunction with our word n-gram filters
>   (shingle/commongrams/etc) for a fast "approximation", if your application can tolerate
>   some false positives, e.g. "foo bar" -> termQuery(foo_bar),
>   "foo bar baz" -> BQ(foo_bar AND bar_baz)
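
The two behaviors in the quoted description can be sketched roughly as follows. This is a
standalone illustration, not the patch: the class name, the method signature, the string
rendering of queries, and the use of IllegalStateException as the parse-time failure are all
assumptions made for the example. The real code works on Lucene Query objects inside the
query parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; queries are rendered as strings purely for readability.
final class PhraseAsBooleanSketch {

    static String rewritePhrase(List<String> terms, boolean omitsPositions,
                                boolean phraseAsBoolean) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("phrase must contain at least one term");
        }
        if (!omitsPositions) {
            // Positions are indexed, so an ordinary phrase query is fine.
            return "PhraseQuery(" + String.join(" ", terms) + ")";
        }
        if (!phraseAsBoolean) {
            // Fail while parsing instead of returning no results or failing at execution.
            throw new IllegalStateException(
                "field omits positions; phrase queries are not supported");
        }
        if (terms.size() == 1) {
            return "termQuery(" + terms.get(0) + ")";
        }
        // Downgrade to word bigrams joined with '_', as in the issue's notation.
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < terms.size(); i++) {
            shingles.add(terms.get(i) + "_" + terms.get(i + 1));
        }
        if (shingles.size() == 1) {
            return "termQuery(" + shingles.get(0) + ")";
        }
        return "BQ(" + String.join(" AND ", shingles) + ")";
    }
}
```

Keeping the check in one factory-style method mirrors the intent of the description: a
subclass can override the single place where phrase queries are built.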
