lucene-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "ComplexPhraseQueryParser" by iorixxx
Date Thu, 27 Mar 2014 22:44:37 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "ComplexPhraseQueryParser" page has been changed by iorixxx:
https://wiki.apache.org/solr/ComplexPhraseQueryParser

New page:
<!> [[Solr4.8]]

<<TableOfContents>>

== Overview ==
The complex phrase query parser plugin supports wildcards, ORs, and similar constructs inside phrase queries.

From the [[https://lucene.apache.org/core/api/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html|Complex Phrase Query Parser javadocs]]:

{{{
QueryParser which permits complex phrase query syntax e.g. "(john jon jonathan~) peters*"
}}}


After indexing the example documents under example/exampledocs with the post.jar utility ('java -jar post.jar *.xml'), the query string

{{{
q=manu:"a* c*"&defType=complexphrase
}}} 

or 

{{{
q={!complexphrase inOrder=true}manu:"a* c*"
}}} 

will return:

http://localhost:8983/solr/collection1/select?q=manu:%22a*%20c*%22&defType=complexphrase&fl=manu

{{{
<doc>
  <str name="manu">Apple Computer Inc.</str>
</doc>
<doc>
  <str name="manu">ASUS Computer Inc.</str>
</doc>
}}} 
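The request above can also be assembled programmatically. The following is a minimal Python sketch, not part of Solr: the helper name `complexphrase_url` is hypothetical, and the base URL is the one from the example. It relies only on the standard library to percent-encode the query.

```python
from urllib.parse import urlencode


def complexphrase_url(base_url, query, fields=None):
    """Build a /select URL that sends `query` through the complexphrase parser.

    Hypothetical helper for illustration; only the parameter names
    (q, defType, fl) come from the example request above.
    """
    params = {"q": query, "defType": "complexphrase"}
    if fields:
        params["fl"] = fields
    # urlencode percent-encodes the quotes, colon, and asterisks in the query
    # and turns the space into '+'.
    return f"{base_url}/select?{urlencode(params)}"


url = complexphrase_url(
    "http://localhost:8983/solr/collection1", 'manu:"a* c*"', fields="manu"
)
print(url)
```

Letting the library do the encoding avoids hand-escaping the double quotes that the phrase syntax requires.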


The '''inOrder''' parameter can be set in two ways.

1) Its default value is true. To set it to false permanently, register the query parser under a different name in solrconfig.xml:
{{{
 <!-- Un-ordered complex phrase query parser -->
 <queryParser name="unorderedcomplexphrase" class="org.apache.solr.search.ComplexPhraseQParserPlugin">
   <bool name="inOrder">false</bool>
 </queryParser>
}}}

2) At query time, via LocalParams:
{{{
q={!complexphrase inOrder=false df=name}"bla* pla*"
}}} 

To mix ordered and unordered clauses in the same query, use nested queries:
{{{
+_query_:"{!complexphrase inOrder=true}manu:\"a* c*\""  +_query_:"{!complexphrase inOrder=false df=name}\"bla* pla*\""
}}} 
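The nested-query form needs the inner double quotes escaped with backslashes, which is easy to get wrong by hand. A small Python sketch of that escaping follows; the function name `nested_complexphrase` is hypothetical and only mirrors the syntax shown above.

```python
def nested_complexphrase(query, in_order=True, df=None):
    """Wrap `query` in a required _query_ clause using the complexphrase parser.

    Hypothetical helper: it only reproduces the string syntax from the
    example above and does not talk to Solr.
    """
    params = f"inOrder={'true' if in_order else 'false'}"
    if df is not None:
        params += f" df={df}"
    inner = f"{{!complexphrase {params}}}{query}"
    # Escape backslashes first, then the double quotes of the inner phrase.
    escaped = inner.replace("\\", "\\\\").replace('"', '\\"')
    return f'+_query_:"{escaped}"'


ordered = nested_complexphrase('manu:"a* c*"', in_order=True)
unordered = nested_complexphrase('"bla* pla*"', in_order=False, df="name")
mixed_query = f"{ordered}  {unordered}"
print(mixed_query)
```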

== Limitations ==

=== maxBooleanClauses ===

You may need to increase 
{{{
<maxBooleanClauses>1024</maxBooleanClauses>
}}} 
in solrconfig.xml, depending on index size, because each wildcard is expanded to every matching term in the index. For example, 
{{{
"a* c*"
}}}
is expanded into a [[http://lucene.apache.org/core/api/core/org/apache/lucene/search/spans/SpanNearQuery.html|SpanNearQuery]]:

{{{
spanNear([spanOr([manu:a, manu:america, manu:apache, manu:apple, manu:asus, manu:ati]), spanOr([manu:canon, manu:co, manu:computer, manu:corp, manu:corsair])], 0, false)
}}}
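To see why the number of clauses grows with the index, the expansion can be imitated on a plain term list. This Python sketch is a toy model only: the term list is copied from the expansion above, the function names are hypothetical, and the way Lucene actually counts clauses against the limit is not reproduced.

```python
# Terms copied from the spanNear output above; a real index holds many more.
MANU_TERMS = [
    "a", "america", "apache", "apple", "asus", "ati",
    "canon", "co", "computer", "corp", "corsair",
]


def expand_prefix(terms, prefix):
    """Return every term that starts with `prefix`, like a trailing wildcard."""
    return sorted(t for t in terms if t.startswith(prefix))


def count_expanded_terms(terms, phrase):
    """Map each wildcard word in `phrase` to the number of terms it expands to.

    Assumes every word in the phrase ends with a single trailing '*'.
    """
    prefixes = [word.rstrip("*") for word in phrase.split()]
    return {p + "*": len(expand_prefix(terms, p)) for p in prefixes}


counts = count_expanded_terms(MANU_TERMS, "a* c*")
print(counts)
```

Every new term that shares a prefix adds another clause, so a short prefix on a large field can reach the configured limit quickly.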

=== Stopwords ===

Let's say we add '''the''', '''up''', and '''to''' to the collection1/conf/stopwords.txt file and re-index the example docs.
While 
{{{
q=features:"Stores up to 15,000"
}}} 
returns ''"Stores up to 15,000 songs, 25,000 photos, or 150 hours of video"'',
{{{
q=features:"sto* up to 15*"&defType=complexphrase
}}}
does not return that document, because [[http://lucene.apache.org/core/api/core/org/apache/lucene/search/spans/SpanNearQuery.html|SpanNearQuery]] has no good way to handle stopwords in a way analogous to [[http://lucene.apache.org/core/api/core/org/apache/lucene/search/PhraseQuery.html|PhraseQuery]]. It is recommended not to use stopword elimination with this query parser.
