lucene-solr-user mailing list archives

From yu shen <shenyu...@gmail.com>
Subject Re: Doing url search in solr is slow
Date Tue, 10 Jan 2012 01:20:41 GMT
Hi Erick,

I only added debugQuery=on to the URL and did not configure anything
for DebugComponent. It seems the 'string' type should be replaced
with the 'text' type.

I will paste the results here after I have run some experiments.

Spark

2012/1/9 Erick Erickson <erickerickson@gmail.com>

> Do you by chance have the debugQuery on by default?
> Because if you look down in the "timing" section,
> you can see the times the various components took to do
> their work, there are two sections "prepare" and "process".
>
> The cumulative time is 17.156 seconds. Of which 17.156
> seconds is reported to be in the DebugComponent.....
>
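The timing section described above can also be read programmatically. The following is a minimal Python sketch, assuming the response was requested as XML with debugQuery=on. The function name `component_times` and the trimmed `SAMPLE` string are illustrative, not part of Solr; the sample values mirror the output quoted later in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed sample of a debugQuery "timing" section (values taken from this thread).
SAMPLE = """
<lst name="timing">
  <double name="time">17156.0</double>
  <lst name="prepare">
    <double name="time">0.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
  </lst>
  <lst name="process">
    <double name="time">17156.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">17156.0</double></lst>
  </lst>
</lst>
"""

def component_times(timing_xml):
    """Return {(phase, component): milliseconds} for a timing element."""
    root = ET.fromstring(timing_xml.strip())
    times = {}
    for phase in root.findall("lst"):        # the "prepare" and "process" phases
        for comp in phase.findall("lst"):    # one <lst> per search component
            short_name = comp.get("name").rsplit(".", 1)[-1]
            times[(phase.get("name"), short_name)] = float(comp.find("double").text)
    return times

times = component_times(SAMPLE)
slowest = max(times, key=times.get)
print(slowest, times[slowest])
```

Sorting the resulting dictionary by value gives a quick view of which component dominates a slow request.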
> So what happens if you just turn that component off? Because
> I don't see anything in your output that really looks like it is
> taking any time. Of course if you've changed your code from
> *url* to url*, that will account for time too, since the infix case
> requires that every term in the fields in question be examined.
>
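Why the infix case costs more can be illustrated with a toy sorted term list. This is a simplified Python sketch of the general idea, not Lucene's actual term dictionary implementation; the terms and the function names `prefix_matches` and `infix_matches` are made up for illustration.

```python
import bisect

def prefix_matches(sorted_terms, prefix):
    """Binary-search to the first candidate, then read until the prefix stops matching."""
    i = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches

def infix_matches(sorted_terms, fragment):
    """Sort order does not help here: every term has to be examined."""
    return [term for term in sorted_terms if fragment in term]

terms = sorted([
    "http://www.another.org/index.html",
    "http://www.someurl.com/sch/a.html",
    "http://www.someurl.com/sch/i.html",
    "https://shop.someurl.com/cart",
])

print(prefix_matches(terms, "http://www.someurl.com/sch/"))
print(infix_matches(terms, "someurl.com"))
```

The prefix lookup touches only the matching terms plus one extra, while the infix lookup visits the whole list regardless of how few terms match.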
> About WordDelimiterFilterFactory: that is irrelevant for a "string"
> type. It's an open question whether a string type is what you
> want, but that is determined by your problem space. You might
> spend some time with admin/analysis to see the effects of
> various analysis chains. "string" is used when you want no
> tokenization, no case transformations etc.
>
> Best
> Erick
>
> On Mon, Jan 9, 2012 at 10:04 AM, yu shen <shenyu.sh@gmail.com> wrote:
> > Hi Erick,
> >
> > Thanks for your reply. Actually I did the following search:
> > survey_url:http\://www.someurl.com/sch/i.html* referal_url:http\://
> > www.someurl.com/sch/i.html* page_url:http\://www.someurl.com/sch/i.html*
> >
> > I did not prepend any asterisk to the field value, but only appended one.
> >
> > I analyzed the url field on the Solr admin page and it gave me this, meaning
> > the url is not tokenized. I notice you mentioned a WordDelimiterFilterFactory.
> > Do I need to configure it in schema.xml or some place else?
> > term position: 1
> > term text: http://www.someurl.com/sch/i.html*
> > term type: word
> > source start,end: 0,31
> > I added debugQuery=on to the query URL and got this (sorry to paste such
> > long cryptic output here, it is really mysterious to me):
> > <lst name="debug">
> >    <str name="rawquerystring">survey_url:http\://www.someurl.com/sch/i.html*
> > referal_url:http\://www.someurl.com/sch/i.html*
> > page_url:http\://www.someurl.com/sch/i.html*</str>
> >    <str name="querystring">survey_url:http\://www.someurl.com/sch/i.html*
> > referal_url:http\://www.someurl.com/sch/i.html*
> > page_url:http\://www.someurl.com/sch/i.html*</str>
> >    <str name="parsedquery">survey_url:http://www.someurl.com/sch/i.html*
> > referal_url:http://www.someurl.com/sch/i.html*
> > page_url:http://www.someurl.com/sch/i.html*</str>
> >    <str name="parsedquery_toString">survey_url:http://www.someurl.com/sch/i.html*
> > referal_url:http://www.someurl.com/sch/i.html*
> > page_url:http://www.someurl.com/sch/i.html*</str>
> >    <lst name="explain">
> >        <str name="5007688343">
> > 0.76980036 = (MATCH) product of:
> >  1.1547005 = (MATCH) sum of:
> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >  0.6666667 = coord(2/3)
> >        </str>
> >        <str name="5007648909">
> > 0.76980036 = (MATCH) product of:
> >  1.1547005 = (MATCH) sum of:
> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >  0.6666667 = coord(2/3)
> >        </str>
> >        <str name="5007653989">
> > 0.76980036 = (MATCH) product of:
> >  1.1547005 = (MATCH) sum of:
> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >  0.6666667 = coord(2/3)
> >        </str>
> >        <str name="5007709065">
> > 0.76980036 = (MATCH) product of:
> >  1.1547005 = (MATCH) sum of:
> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >  0.6666667 = coord(2/3)
> >        </str>
> >        <str name="5007710379">
> > 0.76980036 = (MATCH) product of:
> >  1.1547005 = (MATCH) sum of:
> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >  0.6666667 = coord(2/3)
> > </str><str name="5007739634">
> > 0.76980036 = (MATCH) product of:
> >  1.1547005 = (MATCH) sum of:
> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >  0.6666667 = coord(2/3)
> > </str><str name="5007753066">
> > 0.76980036 = (MATCH) product of:
> >  1.1547005 = (MATCH) sum of:
> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >  0.6666667 = coord(2/3)
> > </str><str name="5007756045">
> > 0.76980036 = (MATCH) product of:
> >  1.1547005 = (MATCH) sum of:
> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >  0.6666667 = coord(2/3)
> > </str><str name="5007832978">
> > 0.76980036 = (MATCH) product of:
> >  1.1547005 = (MATCH) sum of:
> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >  0.6666667 = coord(2/3)
> > </str><str name="5007849124">
> > 0.76980036 = (MATCH) product of:
> >  1.1547005 = (MATCH) sum of:
> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> > http://www.someurl.com/sch/i.html*), product of:
> >      1.0 = boost
> >      0.57735026 = queryNorm
> >  0.6666667 = coord(2/3)
> > </str></lst><str name="QParser">LuceneQParser</str>
> > <lst name="timing">
> >   <double name="time">17156.0</double>
> >   <lst name="prepare">
> >     <double name="time">0.0</double>
> >     <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
> >     <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
> >     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
> >     <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
> >     <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
> >     <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
> >   </lst>
> >   <lst name="process">
> >     <double name="time">17156.0</double>
> >     <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
> >     <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
> >     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
> >     <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
> >     <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
> >     <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">17156.0</double></lst>
> >   </lst>
> > </lst>
> > </lst>
> >
> >
> >
> > 2012/1/9 Erick Erickson <erickerickson@gmail.com>
> >
> >> Yu Shen & Arian:
> >>
> >> We can't help much without more information. In particular, how are
> >> the fields in question analyzed? What is the result of looking
> >> at the admin/analysis page? What do you get when you
> >> attach &debugQuery=on to the query?
> >>
> >> You might review:
> >> http://wiki.apache.org/solr/UsingMailingLists
> >>
> >> But at a wild guess, you have something like WordDelimiterFilterFactory
> >> in your analysis chain, and it's splitting up your input into
> >> "www" "someurl" "com" as separate tokens, and www matches
> >> all documents so Solr is having to score all documents in your corpus, but
> >> that's just a guess. See the admin/schema browser page and find the most
> >> frequent terms for the field in question, that should indicate whether
> >> you have some tokens that appear in all docs. Try searching on
> >> plain "someurl". Is that slow? Or "someurl.anotherpart" or whatever.
> >>
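The guess above, that a delimiter-splitting analysis chain could produce tokens present in every document, can be illustrated with a rough Python sketch. The regular expression here is only a stand-in for splitting on punctuation and is not WordDelimiterFilterFactory's actual rule set; the sample URLs and the name `split_on_delimiters` are hypothetical.

```python
import re
from collections import Counter

def split_on_delimiters(text):
    """Rough stand-in for a delimiter-splitting analyzer: keep alphanumeric runs."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())

docs = [
    "http://www.someurl.com/sch/i.html",
    "http://www.another.org/index.html",
    "http://www.example.net/a",
]

# Document frequency: in how many documents does each token appear?
doc_freq = Counter()
for url in docs:
    doc_freq.update(set(split_on_delimiters(url)))

for token, count in doc_freq.most_common(3):
    print(token, count)
```

Tokens such as "www" and "http" appear in every sample document, which is the situation the schema browser's most-frequent-terms list would reveal on a real index.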
> >> Best
> >> Erick
> >>
> >> 2012/1/9 François Schiettecatte <fschiettecatte@gmail.com>:
> >> > About the search 'referal_url:*www.someurl.com*', having a wildcard at
> >> > the start will cause a dictionary scan for every term you search on unless
> >> > you use ReversedWildcardFilterFactory. That could be the cause of your
> >> > slowdown if you are I/O bound, and even if you are CPU bound for that
> >> > matter.
> >> >
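The general idea behind indexing reversed terms can be sketched in Python: if a reversed copy of each term is kept in sorted order, a leading-wildcard query such as `*someurl.com` becomes a prefix lookup on the reversed form. This is only an illustration of the principle, not how ReversedWildcardFilterFactory is implemented, and the names below are hypothetical. Note that it covers a leading wildcard only; a pattern with wildcards on both ends, as in the query quoted above, is not reduced to a plain prefix by this sketch.

```python
import bisect

def build_reversed_index(terms):
    """Sorted list of reversed terms, built once at index time."""
    return sorted(term[::-1] for term in terms)

def leading_wildcard_matches(reversed_index, suffix):
    """Answer '*suffix' as a prefix lookup on the reversed terms."""
    prefix = suffix[::-1]
    i = bisect.bisect_left(reversed_index, prefix)
    matches = []
    while i < len(reversed_index) and reversed_index[i].startswith(prefix):
        matches.append(reversed_index[i][::-1])  # restore the original term
        i += 1
    return matches

terms = ["www.someurl.com", "shop.someurl.com", "www.another.org"]
index = build_reversed_index(terms)
print(leading_wildcard_matches(index, "someurl.com"))
```

The trade-off is a larger index, since each term is stored in an additional reversed form.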
> >> > François
> >> >
> >> >
> >> > On Jan 8, 2012, at 8:44 PM, yu shen wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> My solr document has up to 20 fields, containing data from product name,
> >> >> date, url etc.
> >> >>
> >> >> The volume of documents is around 1.5m.
> >> >>
> >> >> My symptom is that a url search like [ url:*www.someurl.com*
> >> >> referal_url:*www.someurl.com* page_url:*www.someurl.com*] gets an
> >> >> extraordinarily long response time, while searches against all other
> >> >> fields have a normal response time.
> >> >>
> >> >> Can anyone share any insights on this?
> >> >>
> >> >> Spark
> >> >
> >>
>
