lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Date Tue, 10 Dec 2013 04:55:59 GMT
Hi Salman, 

I have never used the common grams filter, but I remember there are two classes in this
family: CommonGramsFilter and CommonGramsQueryFilter. It seems that CommonGramsQueryFilter
is what you are after.

http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsQueryFilter.html


http://khaidoan.wikidot.com/solr-common-gram-filter
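
For what it's worth, here is a rough, untested sketch of how that might look, based on the
field type from your message below. It keeps CommonGramsFilterFactory at index time and
switches to solr.CommonGramsQueryFilterFactory at query time. Splitting the analyzer into
type="index" and type="query" is my assumption about how you would want to set it up; the
words file and other attributes are copied from your config.

```xml
<fieldtype name="text" class="solr.TextField">
  <!-- Index time: emit single tokens plus common-word bigrams (only, only_be, be) -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
  <!-- Query time: same chain, but the query variant drops single tokens covered by a bigram -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldtype>
```

According to the javadoc, CommonGramsQueryFilter only returns single words when they are
not a member of a bigram, so "only be" should parse to Contents:only_be, as it did for you
in 1.4.1. The index-time chain is unchanged in this sketch, so I would not expect a
reindex to be necessary, but please verify on your side.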





On Tuesday, December 10, 2013 6:43 AM, Salman Akram <salman.akram@northbaysolutions.net>
wrote:
We used that syntax in 1.4.1, when Surround was not part of SOLR and we had
to register it ourselves. I didn't know that it is now part of SOLR. Anyway,
this is a red herring, since I have removed Surround entirely and the issue
remains.

Below is the debug info when I run a simple phrase query containing common
words with the default query parser. What I don't understand is why it
includes the single tokens as well. I have also included the relevant part
of the config below.

"rawquerystring": "Contents:\"only be\"",
"querystring": "Contents:\"only be\"",
"parsedquery": "MultiPhraseQuery(Contents:\"(only only_be) be\")",
"parsedquery_toString": "Contents:\"(only only_be) be\"",

"QParser": "LuceneQParser",

=====

<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldtype>



On Mon, Dec 9, 2013 at 7:46 PM, Erik Hatcher <erik.hatcher@gmail.com> wrote:

> But again, as Ahmet mentioned… it doesn't look like the surround query
> parser is actually being used.   The debug output also mentioned the query
> parser used, but that part wasn't provided below.  One thing to note here,
> the surround query parser is not available in 1.4.1.   It also looks like
> you're surrounding your query with angle brackets, as it says query string
> is {!surround}<Contents:"only be">, which is not correct syntax.  And one
> of the most important things to note here is that the surround query parser
> does NOT use the analysis chain of the field, see <
> http://wiki.apache.org/solr/SurroundQueryParser#Limitations>.  In short,
> you're going to have to do some work to get common grams factored into a
> surround query (such as maybe calling the analysis request handler to
> "parse" the query before sending it to the surround query parser).
>
>         Erik
>
>
> On Dec 9, 2013, at 9:36 AM, Salman Akram <
> salman.akram@northbaysolutions.net> wrote:
>
> > Yup, on debugging I found that it's coming from the Analyzer. We are
> > using the Standard Analyzer. It seems to be a SOLR 4 issue with Common
> > Grams. Not sure if it's a bug or I am missing some config.
> >
> >
> > On Mon, Dec 9, 2013 at 2:03 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
> >
> >> Hi Salman,
> >> I am confused, because with surround no analysis is applied at query
> >> time. I suspect that the surround query parser is not kicking in. You
> >> should see SrndQuery or something like it in the parsed query section.
> >>
> >>
> >>
> >> On Monday, December 9, 2013 6:24 AM, Salman Akram <
> >> salman.akram@northbaysolutions.net> wrote:
> >>
> >> All,
> >>
> >> I posted this sub-issue together with another issue a few days back, but
> >> maybe it was not obvious, so I am posting it in a separate thread.
> >>
> >> We recently migrated to SOLR 4.6. We use Common Grams, but queries with
> >> words in the CG list have slowed down. On debugging we found that for CG
> >> words the parser also adds the individual tokens of those words to the
> >> query, which ends up slowing it down. Below is an example:
> >>
> >> Query = "only be"
> >>
> >> Here is what debug shows. I have highlighted the red part which is
> >> different in both versions i.e. SOLR 4.6 is making it a multiphrasequery
> >> and adding individual tokens too. Can someone help?
> >>
> >> SOLR 4.6 (takes 20 secs)
> >> <str name="rawquerystring">{!surround}<Contents:"only be"></str>
> >> <str name="querystring">{!surround}<Contents:"only be"></str>
> >> <str name="parsedquery">MultiPhraseQuery(Contents:"(only only_be) be")</str>
> >> <str name="parsedquery_toString">Contents:"(only only_be) be"</str>
> >>
> >> SOLR 1.4.1 (takes 1 sec)
> >> <str name="rawquerystring">{!surround}<Contents:"only be"></str>
> >> <str name="querystring">{!surround}<Contents:"only be"></str>
> >> <str name="parsedquery">Contents:only_be</str>
> >> <str name="parsedquery_toString">Contents:only_be</str>
> >>
> >> --
> >> Regards,
> >>
> >> Salman Akram
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > Salman Akram
>
>


-- 
Regards,

Salman Akram
