lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Shingle and Query Performance
Date Tue, 30 Aug 2011 14:38:31 GMT
Can we see the output if you specify both
&debugQuery=on&debug=true?

The debug=true output will show the time taken by the various
components, which is sometimes surprising...
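
As a rough illustration of reading that output, here is a minimal Python sketch that ranks
components by the time they report in the "process" phase. The sample XML is abridged from
the timing section quoted further down in this thread, with closing tags added so it parses.
The function name component_times is made up for this example and is not part of Solr.

```python
import xml.etree.ElementTree as ET

# Abridged from the timing output quoted later in this thread.
# Closing tags were added here so the fragment is well-formed.
SAMPLE_TIMING = """
<lst name="timing">
  <double name="time">8654.0</double>
  <lst name="process">
    <double name="time">8638.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent">
      <double name="time">4473.0</double>
    </lst>
    <lst name="org.apache.solr.handler.component.HighlightComponent">
      <double name="time">42.0</double>
    </lst>
    <lst name="org.apache.solr.handler.component.SpellCheckComponent">
      <double name="time">1.0</double>
    </lst>
    <lst name="org.apache.solr.handler.component.DebugComponent">
      <double name="time">4122.0</double>
    </lst>
  </lst>
</lst>
"""


def component_times(timing_xml, phase="process"):
    """Return (component, milliseconds) pairs for one phase, slowest first."""
    root = ET.fromstring(timing_xml)
    phase_node = root.find(f"./lst[@name='{phase}']")
    if phase_node is None:
        return []
    times = []
    for comp in phase_node.findall("lst"):
        value = comp.find("double[@name='time']")
        if value is not None:
            # Keep only the class name, dropping the package prefix.
            short_name = comp.get("name").rsplit(".", 1)[-1]
            times.append((short_name, float(value.text)))
    return sorted(times, key=lambda item: item[1], reverse=True)


ranked = component_times(SAMPLE_TIMING)
for name, millis in ranked:
    print(f"{name:25s} {millis:8.1f} ms")
```

With the numbers posted in this thread, QueryComponent (4473 ms) and DebugComponent (4122 ms)
together account for nearly all of the 8638 ms process time, so the debug output itself is a
large part of that particular measurement.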

Second, we never asked the most basic question: what are
you measuring? Is this the QTime of the returned response
(which is the time actually spent searching), or the time until
the response gets back to the client, which may involve a lot besides
searching?
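
One way to tell the two measurements apart is to record both for the same request. The
following is a minimal Python sketch, assuming the response is requested as JSON (wt=json;
the query in this thread used wt=javabin) so that QTime can be read from responseHeader.
The names compare_qtime and fake_fetch are hypothetical. fake_fetch stands in for a real
HTTP call so the sketch runs without a Solr server.

```python
import json
import time


def compare_qtime(fetch):
    """Call fetch(), which must return a Solr JSON response body as text,
    and compare Solr's reported QTime with the time seen by the caller."""
    started = time.perf_counter()
    body_text = fetch()
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    qtime_ms = json.loads(body_text)["responseHeader"]["QTime"]
    return {
        "qtime_ms": qtime_ms,          # reported by Solr
        "elapsed_ms": elapsed_ms,      # measured at the client
        "outside_qtime_ms": elapsed_ms - qtime_ms,
    }


def fake_fetch():
    """Hypothetical stand-in for an HTTP request to Solr.
    Sleeps to imitate transfer and response-writing time."""
    time.sleep(0.05)
    return (
        '{"responseHeader": {"status": 0, "QTime": 12},'
        ' "response": {"numFound": 0, "start": 0, "docs": []}}'
    )


result = compare_qtime(fake_fetch)
print(result)
```

If outside_qtime_ms is large compared with qtime_ms, most of the wait is happening outside
the search itself, for example in fetching stored fields or moving the response over the wire.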

Best
Erick

On Tue, Aug 30, 2011 at 7:59 AM, Lord Khan Han <khanuniverse1@gmail.com> wrote:
> Hi Erick,
>
> Fields are lazy loaded, content is stored in Solr, and the machine has 32 GB
> of RAM. Solr has a 20 GB heap. There is no swapping.
>
> As you can see, we have many phrases in the same query. I couldn't find a way to
> drop QTime to sub-second. Surprisingly, the non-shingled test has a better QTime!
>
>
> On Mon, Aug 29, 2011 at 3:10 PM, Erick Erickson <erickerickson@gmail.com>wrote:
>
>> Oh, one other thing: have you profiled your machine
>> to see if you're swapping? How much memory are
>> you giving your JVM? What is the underlying
>> hardware setup?
>>
>> Best
>> Erick
>>
>> On Mon, Aug 29, 2011 at 8:09 AM, Erick Erickson <erickerickson@gmail.com>
>> wrote:
>> > 200K docs and 36G index? It sounds like you're storing
>> > your documents in the Solr index. In and of itself, that
>> > shouldn't hurt your query times, *unless* you have
>> > lazy field loading turned off, have you checked that
>> > lazy field loading is enabled?
>> >
>> >
>> >
>> > Best
>> > Erick
>> >
>> > On Sun, Aug 28, 2011 at 5:30 AM, Lord Khan Han <khanuniverse1@gmail.com> wrote:
>> >> Another interesting thing: all queries of one or more words,
>> >> including phrase queries such as "barack obama", are slower in the shingle
>> >> configuration. What am I doing wrong? Without shingles, "barack obama" has a
>> >> query time of 300 ms; with shingles, 780 ms.
>> >>
>> >>
>> >> On Sat, Aug 27, 2011 at 7:58 PM, Lord Khan Han <khanuniverse1@gmail.com> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> What is the difference between Solr 3.3 and the trunk?
>> >>> I will try 3.3 and let you know the results.
>> >>>
>> >>>
>> >>> Here is the search handler:
>> >>>
>> >>> <requestHandler name="search" class="solr.SearchHandler" default="true">
>> >>>   <lst name="defaults">
>> >>>     <str name="echoParams">explicit</str>
>> >>>     <int name="rows">10</int>
>> >>>     <!--<str name="fq">category:vv</str>-->
>> >>>     <str name="fq">mrank:[0 TO 100]</str>
>> >>>     <str name="echoParams">explicit</str>
>> >>>     <int name="rows">10</int>
>> >>>     <str name="defType">edismax</str>
>> >>>     <!--<str name="qf">title^0.05 url^1.2 content^1.7 m_title^10.0</str>-->
>> >>>     <str name="qf">title^1.05 url^1.2 content^1.7 m_title^10.0</str>
>> >>>     <!-- <str name="bf">recip(ee_score,-0.85,1,0.2)</str> -->
>> >>>     <str name="pf">content^18.0 m_title^5.0</str>
>> >>>     <int name="ps">1</int>
>> >>>     <int name="qs">0</int>
>> >>>     <str name="mm">2&lt;-25%</str>
>> >>>     <str name="spellcheck">true</str>
>> >>>     <!--<str name="spellcheck.collate">true</str>   -->
>> >>>     <str name="spellcheck.count">5</str>
>> >>>     <str name="spellcheck.dictionary">subobjective</str>
>> >>>     <str name="spellcheck.onlyMorePopular">false</str>
>> >>>     <str name="hl.tag.pre">&lt;b&gt;</str>
>> >>>     <str name="hl.tag.post">&lt;/b&gt;</str>
>> >>>     <str name="hl.useFastVectorHighlighter">true</str>
>> >>>   </lst>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Sat, Aug 27, 2011 at 5:31 PM, Erik Hatcher <erik.hatcher@gmail.com> wrote:
>> >>>
>> >>>> I'm not sure what the issue could be at this point.   I see you've got
>> >>>> qt=search - what's the definition of that request handler?
>> >>>>
>> >>>> What is the parsed query (from the debugQuery response)?
>> >>>>
>> >>>> Have you tried this with Solr 3.3 to see if there's any appreciable
>> >>>> difference?
>> >>>>
>> >>>>        Erik
>> >>>>
>> >>>> On Aug 27, 2011, at 09:34 , Lord Khan Han wrote:
>> >>>>
>> >>>> > With grouping off, the query time drops from 3567 ms to 1912 ms. Grouping
>> >>>> > increases the query time and makes the cache useless. But the same config
>> >>>> > is still faster without shingles.
>> >>>> >
>> >>>> > We have a head-to-head test this Wednesday against this commercial search
>> >>>> > engine, so I am looking for all suggestions.
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > On Sat, Aug 27, 2011 at 3:37 PM, Erik Hatcher <erik.hatcher@gmail.com> wrote:
>> >>>> >
>> >>>> >> Please confirm whether this is caused by grouping.  Turn grouping off;
>> >>>> >> what's the query time like?
>> >>>> >>
>> >>>> >>
>> >>>> >> On Aug 27, 2011, at 07:27 , Lord Khan Han wrote:
>> >>>> >>
>> >>>> >>> On the other hand, we couldn't use the cache for the types of queries below. I
>> >>>> >>> think this is caused by grouping. Anyway, we need to be sub-second without
>> >>>> >>> the cache.
>> >>>> >>>
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> On Sat, Aug 27, 2011 at 2:18 PM, Lord Khan Han <khanuniverse1@gmail.com> wrote:
>> >>>> >>>
>> >>>> >>>> Hi,
>> >>>> >>>>
>> >>>> >>>> Thanks for the reply.
>> >>>> >>>>
>> >>>> >>>> Here is the Solr log capture:
>> >>>> >>>>
>> >>>> >>>> ******
>> hl.fragsize=100&spellcheck=true&spellcheck.q=XXXXX&group.limit=5&hl.simple.pre=<b>&hl.fl=content&spellcheck.collate=true&wt=javabin&hl=true&rows=20&version=2&fl=score,approved,domain,host,id,lang,mimetype,title,tstamp,url,category&hl.snippets=3&start=0&q=%2BXXXX+-"XXXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXXX"+-"XXXXXX"+-XXXX+-"XXXXXX"+-XXX+-"XXXXX"+-XXXX+-XXXX+-"XXXXX"+-"XXXXX"+-"XXXXX"+-XXXX+-"XXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXXX"+-XXXX+-"XXXXX"+-"XXXXXX"+-XXXX+-"XXXXX"+-"XXXXX"+-XXXXX+-"XXXXX"+-"XXXXX"+-"XXXXX"+-"XXXXX"+-XXXXX+-"XXXXXX"+-"XXXXXX"+-XXXXXX+-XXXXX+-"XXXXX"+"XXXXX"+"XXXXX"+"XXXXXX"++&group.field=host&hl.simple.post=</b>&group=true&qt=search&fq=mrank:[0+TO+100]&fq=word_count:[70+TO+*]
>> >>>> >>>> ******
>> >>>> >>>>
>> >>>> >>>> XXXX stands for the words. Each phrase "xxxxx" has two words inside.
>> >>>> >>>>
>> >>>> >>>> The timing from the DebugQuery:
>> >>>> >>>>
>> >>>> >>>> <lst name="timing">
>> >>>> >>>> <double name="time">8654.0</double>
>> >>>> >>>> <lst name="prepare">
>> >>>> >>>> <double name="time">16.0</double>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.QueryComponent">
>> >>>> >>>> <double name="time">16.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.FacetComponent">
>> >>>> >>>> <double name="time">0.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>> >>>> >>>> <double name="time">0.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.HighlightComponent">
>> >>>> >>>> <double name="time">0.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.StatsComponent">
>> >>>> >>>> <double name="time">0.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.SpellCheckComponent">
>> >>>> >>>> <double name="time">0.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.DebugComponent">
>> >>>> >>>> <double name="time">0.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="process">
>> >>>> >>>> <double name="time">8638.0</double>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.QueryComponent">
>> >>>> >>>> <double name="time">4473.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.FacetComponent">
>> >>>> >>>> <double name="time">0.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>> >>>> >>>> <double name="time">0.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.HighlightComponent">
>> >>>> >>>> <double name="time">42.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.StatsComponent">
>> >>>> >>>> <double name="time">0.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.SpellCheckComponent">
>> >>>> >>>> <double name="time">1.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>> <lst name="org.apache.solr.handler.component.DebugComponent">
>> >>>> >>>> <double name="time">4122.0</double>
>> >>>> >>>> </lst>
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>> The funny thing is that if I remove the ShingleFilter from the "sh_text"
>> >>>> >>>> field below and index normally, the query time is half that of the current
>> >>>> >>>> shingled one! Shouldn't a shingled index be better for such heavy two-word
>> >>>> >>>> phrase searches? I am confused.
>> >>>> >>>>
>> >>>> >>>> On the other hand, one of the big FAT companies' off-the-shelf search
>> >>>> >>>> engines runs the same query on the same machine in 0.7 / 0.8 secs
>> >>>> >>>> without cache. I am confident we can do better in Solr, but I couldn't
>> >>>> >>>> find the way at the moment.
>> >>>> >>>>
>> >>>> >>>> thanks for helping..
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>> On Sat, Aug 27, 2011 at 2:46 AM, Erik Hatcher <erik.hatcher@gmail.com> wrote:
>> >>>> >>>>
>> >>>> >>>>>
>> >>>> >>>>> On Aug 26, 2011, at 17:49 , Lord Khan Han wrote:
>> >>>> >>>>>> We are indexing news documents from various sites. Currently we have
>> >>>> >>>>>> 200K docs indexed. Total index size is 36 GB. There are also
>> >>>> >>>>>> attachments to the news (PDFs, docs, etc.), so document size
>> >>>> >>>>>> could be high (e.g. 10 MB).
>> >>>> >>>>>>
>> >>>> >>>>>> We are using some complex queries which include around 30 - 40 terms
>> >>>> >>>>>> per query. 70% of these terms are two-word phrases. We are using them
>> >>>> >>>>>> with + and - to pinpoint the exact result.
>> >>>> >>>>>> There is also grouping, dismax and boosting, and term vector highlighting.
>> >>>> >>>>>
>> >>>> >>>>> You're using a lot of componentry there, and have complex queries.  We
>> >>>> >>>>> need more details.
>> >>>> >>>>>
>> >>>> >>>>> Turn on debugQuery=true... what do the timings say for each component?
>> >>>> >>>>>
>> >>>> >>>>>> Our problem is query times. Currently they are around 6-7 secs. I know our
>> >>>> >>>>>> query is a little bit heavy, but we want to improve query performance. I
>> >>>> >>>>>> believe we can make it sub-second, but no success at the moment.
>> >>>> >>>>>
>> >>>> >>>>> Please provide an example query or two (perhaps a full line logged from
>> >>>> >>>>> Solr itself), and then let's see what debugQuery says about your query
>> >>>> >>>>> being parsed.
>> >>>> >>>>>
>> >>>> >>>>>> We tried using shingles (2-word tokens), but that decreases the query
>> >>>> >>>>>> performance!! We assumed it would help speed up phrase searches.
>> >>>> >>>>>
>> >>>> >>>>> Again, we'd need to see a parsed query to understand this deeper.
>> >>>> >>>>>
>> >>>> >>>>> Lots of synonym expansion?  A parsed query will tell us.
>> >>>> >>>>>
>> >>>> >>>>>
>> >>>> >>>>>
>> >>>> >>>>>> (We are using the latest Solr trunk, and the hardware is pretty good:
>> >>>> >>>>>> 32 cores with 32 GB of RAM.)
>> >>>> >>>>>>
>> >>>> >>>>>> Here is the field def:
>> >>>> >>>>>>
>> >>>> >>>>>> <fieldType name="sh_text" class="solr.TextField" positionIncrementGap="100"
>> >>>> >>>>>>            autoGeneratePhraseQueries="true">
>> >>>> >>>>>>    <analyzer type="index">
>> >>>> >>>>>>      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>>> >>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >>>> >>>>>>              words="stopwords.txt" enablePositionIncrements="true"/>
>> >>>> >>>>>>      <filter class="solr.WordDelimiterFilterFactory"
>> >>>> >>>>>>              generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> >>>> >>>>>>              catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> >>>> >>>>>>      <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>> >>>> >>>>>>      <filter class="solr.KeywordMarkerFilterFactory"
>> >>>> >>>>>>              protected="protwords.txt"/>
>> >>>> >>>>>>      <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>> >>>> >>>>>>              outputUnigrams="true"/>
>> >>>> >>>>>>    </analyzer>
>> >>>> >>>>>>    <analyzer type="query">
>> >>>> >>>>>>      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>>> >>>>>>      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >>>> >>>>>>              ignoreCase="true" expand="true"/>
>> >>>> >>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >>>> >>>>>>              words="stopwords.txt" enablePositionIncrements="true"/>
>> >>>> >>>>>>      <filter class="solr.WordDelimiterFilterFactory"
>> >>>> >>>>>>              generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> >>>> >>>>>>              catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> >>>> >>>>>>      <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>> >>>> >>>>>>      <filter class="solr.KeywordMarkerFilterFactory"
>> >>>> >>>>>>              protected="protwords.txt"/>
>> >>>> >>>>>>      <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>> >>>> >>>>>>              outputUnigrams="true"/>
>> >>>> >>>>>>    </analyzer>
>> >>>> >>>>>>  </fieldType>
>> >>>> >>>>>>
>> >>>> >>>>>> and
>> >>>> >>>>>>
>> >>>> >>>>>> <field name="content" type="sh_text" stored="true" indexed="true"
>> >>>> >>>>>>        termVectors="true" termPositions="true" termOffsets="true"/>