lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Date Tue, 10 Dec 2013 14:01:35 GMT
Hi Salman,

I personally do not perform stopword removal. So are you saying CommonGramsFilter is not useful
without CommonGramsQueryFilter? If yes, would you like to add a comment to the Confluence page
explaining this?

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-CommonGramsFilter





On Tuesday, December 10, 2013 1:17 PM, Salman Akram <salman.akram@northbaysolutions.net>
wrote:
Thanks!! Using CommonGramsQueryFilter resolved the issue.

This filter was not there in 1.4.1, and for some reason it was also not
mentioned in the SOLR 4 Release Notes that we studied before upgrading.
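
For readers following the fix, here is a minimal sketch of a field type that applies it, based on the "text" field type and commonwords.txt quoted further down the thread. The split into separate index and query analyzers is an assumption; the final working config was not posted. At index time CommonGramsFilterFactory keeps the single terms and adds bigrams for common words, while at query time CommonGramsQueryFilterFactory drops a single term when a bigram covers it, so a phrase such as "only be" should reduce to the bigram term only_be instead of a MultiPhraseQuery.

```xml
<!-- Sketch only: the index/query analyzer split is an assumption,
     not the poster's actual final config. -->
<fieldtype name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index time: emit single terms plus bigrams for common words -->
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt"
            ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query time: keep a single term only when no bigram covers it -->
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldtype>
```

Since the index-time chain in this sketch is the same as in the original config, only the query side changes. The result can be checked in the parsedquery section of the debug output.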


On Tue, Dec 10, 2013 at 9:55 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:

> Hi Salman,
>
> I have never used the common grams filter, but I remember there are two
> classes in this family: CommonGramsFilter and CommonGramsQueryFilter. It
> seems that CommonGramsQueryFilter is what you are after.
>
>
> http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsQueryFilter.html
>
>
> http://khaidoan.wikidot.com/solr-common-gram-filter
>
>
>
>
>
> On Tuesday, December 10, 2013 6:43 AM, Salman Akram <
> salman.akram@northbaysolutions.net> wrote:
> We used that syntax in 1.4.1 when Surround was not part of SOLR and we had
> to register it. Didn't know that it is now part of SOLR. Anyway, this is a
> red herring since I have totally removed Surround and the issue remains
> there.
>
> Below is the debug info when I give a simple phrase query containing common
> words with the default Query Parser. What I don't understand is why it is
> including single tokens as well. I have also included the relevant config
> part below.
>
> "rawquerystring": "Contents:\"only be\"",
> "querystring": "Contents:\"only be\"",
> "parsedquery": "MultiPhraseQuery(Contents:\"(only only_be) be\")",
> "parsedquery_toString": "Contents:\"(only only_be) be\"",
>
> "QParser": "LuceneQParser",
>
> =====
>
> <fieldtype name="text" class="solr.TextField">
> <analyzer>
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.StandardFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt"
> ignoreCase="true"/>
> </analyzer>
> </fieldtype>
>
>
>
> On Mon, Dec 9, 2013 at 7:46 PM, Erik Hatcher <erik.hatcher@gmail.com>
> wrote:
>
> > But again, as Ahmet mentioned… it doesn't look like the surround query
> > parser is actually being used. The debug output also mentions the query
> > parser used, but that part wasn't provided below. One thing to note here,
> > the surround query parser is not available in 1.4.1. It also looks like
> > you're surrounding your query with angle brackets, as it says query string
> > is {!surround}<Contents:"only be">, which is not correct syntax. And one
> > of the most important things to note here is that the surround query parser
> > does NOT use the analysis chain of the field, see <
> > http://wiki.apache.org/solr/SurroundQueryParser#Limitations>. In short,
> > you're going to have to do some work to get common grams factored into a
> > surround query (such as maybe calling the analysis request handler to
> > "parse" the query before sending it to the surround query parser).
> >
> >         Erik
> >
> >
> > On Dec 9, 2013, at 9:36 AM, Salman Akram <
> > salman.akram@northbaysolutions.net> wrote:
> >
> > > Yup, on debugging I found that it's coming from the Analyzer. We are
> > > using Standard Analyzer. It seems to be a SOLR 4 issue with Common Grams.
> > > Not sure if it's a bug or I am missing some config.
> > >
> > >
> > > On Mon, Dec 9, 2013 at 2:03 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
> > >
> > >> Hi Salman,
> > >> I am confused because with surround no analysis is applied at query time.
> > >> I suspect that the surround query parser is not kicking in. You should see
> > >> SrndQuery or something like it in the parsed query section.
> > >>
> > >>
> > >>
> > >> On Monday, December 9, 2013 6:24 AM, Salman Akram <
> > >> salman.akram@northbaysolutions.net> wrote:
> > >>
> > >> All,
> > >>
> > >> I posted this sub-issue with another issue a few days back but maybe it
> > >> was not obvious, so I am posting it in a separate thread.
> > >>
> > >> We recently migrated to SOLR 4.6. We use Common Grams but queries with
> > >> words in the CG list have slowed down. On debugging we found that for CG
> > >> words the parser is adding individual tokens of those words to the query
> > >> too, which ends up slowing it down. Below is an example:
> > >>
> > >> Query = "only be"
> > >>
> > >> Here is what debug shows. The parsedquery is the part that differs
> > >> between the two versions, i.e. SOLR 4.6 is making it a MultiPhraseQuery
> > >> and adding individual tokens too. Can someone help?
> > >>
> > >> SOLR 4.6 (takes 20 secs)
> > >> <str name="rawquerystring">{!surround}<Contents:"only be"></str>
> > >> <str name="querystring">{!surround}<Contents:"only be"></str>
> > >> <str name="parsedquery">MultiPhraseQuery(Contents:"(only only_be) be")</str>
> > >> <str name="parsedquery_toString">Contents:"(only only_be) be"</str>
> > >>
> > >> SOLR 1.4.1 (takes 1 sec)
> > >> <str name="rawquerystring">{!surround}<Contents:"only be"></str>
> > >> <str name="querystring">{!surround}<Contents:"only be"></str>
> > >> <str name="parsedquery">Contents:only_be</str>
> > >> <str name="parsedquery_toString">Contents:only_be</str>
> > >>
> > >> --
> > >> Regards,
> > >>
> > >> Salman Akram
> > >>
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
> >
> >
>
>
> --
> Regards,
>
> Salman Akram
>



-- 
Regards,

Salman Akram
