lucene-dev mailing list archives

From "Alan Woodward (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-7284) UnsupportedOperationException wrt SpanNearQuery with Gap (Needed for Synonym Query Expansion)
Date Thu, 19 May 2016 08:53:12 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-7284:
----------------------------------
         Priority: Minor  (was: Blocker)
    Fix Version/s: 6.1

> UnsupportedOperationException wrt SpanNearQuery with Gap (Needed for Synonym Query Expansion)
> ---------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7284
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>            Reporter: Daniel Bigham
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: 6.1
>
>         Attachments: LUCENE-7284.patch
>
>
> I am trying to support synonyms on the query side by doing query expansion.
> For example, the query "open webpage" can be expanded if the following
> are synonyms:
> "open" | "go to"
> With both the stop-word filter and the stemming filter enabled, this becomes:
> {code}
> spanNear(
>          [
>                  spanOr([Title:open, Title:go]),
>                  Title:webpag
>          ],
>          0,
>          true
> )
> {code}
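The expansion described above can be sketched without Lucene as a small tree of span nodes. `SpanNode`, `SynonymExpansion` and `expand` are hypothetical stand-ins for Lucene's SpanTermQuery, SpanOrQuery and SpanNearQuery, not their real API. The synonym map is assumed to already hold analyzed tokens, and `toString()` only mimics the printed form quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical stand-in for Lucene's span query classes (not the real API). */
abstract class SpanNode {
    /** Term leaf, printed as "field:text". */
    static SpanNode term(String field, String text) {
        return new SpanNode() {
            @Override public String toString() { return field + ":" + text; }
        };
    }

    /** Disjunction of clauses, printed like the spanOr(...) output above. */
    static SpanNode or(List<SpanNode> clauses) {
        return new SpanNode() {
            @Override public String toString() { return "spanOr(" + join(clauses) + ")"; }
        };
    }

    /** Proximity of clauses, printed like the spanNear(...) output above. */
    static SpanNode near(List<SpanNode> clauses, int slop, boolean inOrder) {
        return new SpanNode() {
            @Override public String toString() {
                return "spanNear(" + join(clauses) + ", " + slop + ", " + inOrder + ")";
            }
        };
    }

    static String join(List<SpanNode> clauses) {
        return clauses.stream().map(Object::toString)
                .collect(Collectors.joining(", ", "[", "]"));
    }
}

class SynonymExpansion {
    /**
     * Expands already-analyzed query tokens. A token with synonyms becomes an
     * OR over its alternatives, and the whole query becomes an exact ordered
     * sequence (slop 0, inOrder true). The synonym map is assumed to hold
     * analyzed single-token alternatives (e.g. "go to" already reduced to "go").
     */
    static SpanNode expand(String field, List<String> tokens,
                           Map<String, List<String>> synonyms) {
        List<SpanNode> clauses = new ArrayList<>();
        for (String token : tokens) {
            List<String> alternatives = synonyms.get(token);
            if (alternatives == null) {
                clauses.add(SpanNode.term(field, token));
            } else {
                List<SpanNode> ors = new ArrayList<>();
                for (String alt : alternatives) ors.add(SpanNode.term(field, alt));
                clauses.add(SpanNode.or(ors));
            }
        }
        return SpanNode.near(clauses, 0, true);
    }
}
```

For instance, `expand("Title", List.of("open", "webpag"), Map.of("open", List.of("open", "go")))` prints `spanNear([spanOr([Title:open, Title:go]), Title:webpag], 0, true)`. This is the tree quoted above, and it shows the problem: nothing in it records that "go" was followed by a removed word.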
> Notice that "go to" became just "go", because the stop-word filter
> removes "to".
> Interestingly, turning "go to webpage" into a phrase yields "go ?
> webpage", but turning "go to" into a phrase yields just "go", because the
> trailing stop word leaves no token behind in the PhraseQuery. (There is
> currently no way to represent that gap: a PhraseQuery encodes gaps
> implicitly through the positions of its tokens, and with no second token
> there is nothing to mark the gap.)
> The query above therefore fails to match "go to webpage": the indexed
> text tokenizes as "go _ webpage", but the query, having lost its gap,
> tries to match only "go webpage".
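The position gap can be shown with a minimal model. It assumes a stop filter that drops "to" but keeps the original token positions, which is what the "go _ webpage" notation above describes. `GapModel`, `analyze` and `phraseMatches` are hypothetical names. Stemming is omitted, and phrase matching is reduced to exact ordered offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model (not Lucene code) of why a lost gap breaks matching.
 * Stop words are removed, but the position counter still advances past them,
 * so "go to webpage" is indexed as go@0, webpage@2.
 */
class GapModel {
    static final class Token {
        final String text;
        final int position;
        Token(String text, int position) { this.text = text; this.position = position; }
    }

    /** Whitespace-splits, drops stop words, and keeps each survivor's original position. */
    static List<Token> analyze(String text, Set<String> stopWords) {
        List<Token> out = new ArrayList<>();
        String[] words = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            if (!stopWords.contains(words[pos])) out.add(new Token(words[pos], pos));
        }
        return out;
    }

    /**
     * Exact ordered phrase match: query term i must sit at start + offsets[i].
     * A phrase that kept its gap uses offsets {0, 2}; one that lost it uses {0, 1}.
     */
    static boolean phraseMatches(List<Token> doc, String[] terms, int[] offsets) {
        for (Token first : doc) {
            if (!first.text.equals(terms[0])) continue;
            int start = first.position - offsets[0];
            boolean all = true;
            for (int i = 1; i < terms.length && all; i++) {
                all = hasTokenAt(doc, terms[i], start + offsets[i]);
            }
            if (all) return true;
        }
        return false;
    }

    static boolean hasTokenAt(List<Token> doc, String text, int position) {
        for (Token t : doc) {
            if (t.position == position && t.text.equals(text)) return true;
        }
        return false;
    }
}
```

With `analyze("go to webpage", Set.of("to"))`, the gapless offsets `{0, 1}` find no match, while the gap-aware offsets `{0, 2}` do.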
> To work around this, I represent "go to" not as a phrase but as a
> SpanNearQuery ending in a one-position gap:
> {code}
> spanNear(
>          [
>                  spanOr(
>                          [
>                                  Title:open,
>                                  spanNear([Title:go, SpanGap(:1)], 0, true),
>                          ]
>                  ),
>                  Title:webpag
>          ],
>          0,
>          true
> )
> {code}
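A small span-matching model can show why a gap clause should restore the match. `Span`, `term`, `gap`, `or` and `nearExact` are hypothetical, Lucene-free stand-ins. Only ordered, slop-0 matching is modelled, and a gap simply matches any `width` consecutive positions, whether or not anything is indexed there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, Lucene-free model of ordered span matching with a gap clause.
 * A node returns every [start, end) span it matches in a document, given as a
 * map from position to token (positions of removed stop words are absent).
 */
abstract class Span {
    abstract List<int[]> spans(Map<Integer, String> doc, int docLength);

    /** Matches a single token: span [p, p + 1). */
    static Span term(String text) {
        return new Span() {
            List<int[]> spans(Map<Integer, String> doc, int docLength) {
                List<int[]> out = new ArrayList<>();
                for (Map.Entry<Integer, String> e : doc.entrySet()) {
                    if (e.getValue().equals(text)) out.add(new int[]{e.getKey(), e.getKey() + 1});
                }
                return out;
            }
        };
    }

    /** Matches any run of `width` positions, indexed or not. */
    static Span gap(int width) {
        return new Span() {
            List<int[]> spans(Map<Integer, String> doc, int docLength) {
                List<int[]> out = new ArrayList<>();
                for (int p = 0; p + width <= docLength; p++) out.add(new int[]{p, p + width});
                return out;
            }
        };
    }

    /** Union of the clauses' spans. */
    static Span or(Span... clauses) {
        return new Span() {
            List<int[]> spans(Map<Integer, String> doc, int docLength) {
                List<int[]> out = new ArrayList<>();
                for (Span c : clauses) out.addAll(c.spans(doc, docLength));
                return out;
            }
        };
    }

    /** Ordered, slop-0 sequence: each clause must start where the previous one ended. */
    static Span nearExact(Span... clauses) {
        return new Span() {
            List<int[]> spans(Map<Integer, String> doc, int docLength) {
                List<int[]> current = clauses[0].spans(doc, docLength);
                for (int i = 1; i < clauses.length; i++) {
                    List<int[]> next = clauses[i].spans(doc, docLength);
                    List<int[]> joined = new ArrayList<>();
                    for (int[] a : current) {
                        for (int[] b : next) {
                            if (b[0] == a[1]) joined.add(new int[]{a[0], b[1]});
                        }
                    }
                    current = joined;
                }
                return current;
            }
        };
    }
}
```

Consider a document with "go" at position 0 and "webpag" at position 2. Against it, `nearExact(or(term("open"), nearExact(term("go"), gap(1))), term("webpag"))` returns the span [0, 3), while the gapless version of the query returns nothing.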
> However, when I run that query, I get the following:
> {code}
> A Java exception occurred: java.lang.UnsupportedOperationException
>      at org.apache.lucene.search.spans.SpanNearQuery$GapSpans.positionsCost(SpanNearQuery.java:398)
>      at org.apache.lucene.search.spans.ConjunctionSpans.asTwoPhaseIterator(ConjunctionSpans.java:96)
>      at org.apache.lucene.search.spans.NearSpansOrdered.asTwoPhaseIterator(NearSpansOrdered.java:45)
>      at org.apache.lucene.search.spans.ScoringWrapperSpans.asTwoPhaseIterator(ScoringWrapperSpans.java:88)
>      at org.apache.lucene.search.ConjunctionDISI.addSpans(ConjunctionDISI.java:104)
>      at org.apache.lucene.search.ConjunctionDISI.intersectSpans(ConjunctionDISI.java:82)
>      at org.apache.lucene.search.spans.ConjunctionSpans.<init>(ConjunctionSpans.java:41)
>      at org.apache.lucene.search.spans.NearSpansOrdered.<init>(NearSpansOrdered.java:54)
>      at org.apache.lucene.search.spans.SpanNearQuery$SpanNearWeight.getSpans(SpanNearQuery.java:232)
>      at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:134)
>      at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:38)
>      at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
> {code}
> Looking up the GapSpans class in SpanNearQuery.java, I see:
> {code}
> @Override
> public float positionsCost() {
>    throw new UnsupportedOperationException();
> }
> {code}
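The trace shows `ConjunctionSpans.asTwoPhaseIterator()` asking its sub-spans for a cost estimate, and `GapSpans.positionsCost()` refusing to give one. The model below is a hypothetical simplification: the names echo Lucene's, but none of this is Lucene code. It totals the sub-spans' `positionsCost()` values, so a single throwing clause is enough to fail the whole query. The zero-cost gap is an assumed fix, reasoning that a gap reads no positions from the index. It is not necessarily what LUCENE-7284.patch does.

```java
import java.util.List;

/** Hypothetical cost interface echoing Spans.positionsCost() (not Lucene's type). */
interface CostedSpans {
    /** Estimated cost of reading positions for a matching document. */
    float positionsCost();
}

class CostModel {
    /** A term reads real positions from the index; 1.0f is an arbitrary placeholder cost. */
    static CostedSpans term() {
        return () -> 1.0f;
    }

    /** A gap behaving like the GapSpans code quoted above: it refuses to report a cost. */
    static CostedSpans throwingGap() {
        return () -> { throw new UnsupportedOperationException(); };
    }

    /**
     * Assumed fix (hypothetical): a gap reads no positions from the index,
     * so it reports zero cost instead of throwing.
     */
    static CostedSpans zeroCostGap() {
        return () -> 0f;
    }

    /** Simplified conjunction: totals the cost of every sub-spans, so one throw aborts it. */
    static float conjunctionCost(List<CostedSpans> subSpans) {
        float total = 0f;
        for (CostedSpans s : subSpans) total += s.positionsCost();
        return total;
    }
}
```

In this model, `conjunctionCost(List.of(term(), throwingGap()))` throws `UnsupportedOperationException` just as the trace does, while swapping in `zeroCostGap()` returns the term's cost alone.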
> I asked about this on the mailing list on May 14 and was directed to file a bug here.
> This issue is of relatively high priority for us, since this is the most promising
> technique we have for supporting synonyms on top of Lucene (SynonymFilter has
> serious problems with multi-word synonyms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


