lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Performance optimization of Proximity/Wildcard searches
Date Tue, 08 Feb 2011 03:45:45 GMT
Hi,


Yes, assuming you didn't change the index files, say by optimizing the index, 
the hot portions of the index should remain in the OS cache unless something 
else kicked them out.

Re the other thread: I don't think I have those messages any more.

Otis
---
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Salman Akram <salman.akram@northbaysolutions.net>
> To: solr-user@lucene.apache.org
> Sent: Mon, February 7, 2011 2:49:44 AM
> Subject: Re: Performance optimization of Proximity/Wildcard searches
> 
> Only a couple of thousand documents are added daily, so the old OS cache
> should still be useful since the old documents remain the same, right?
>
> Also, can you please comment on my other thread related to Term Vectors?
> Thanks!
> 
> On Sat, Feb 5, 2011 at 8:40 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>
> > Yes, the OS cache mostly remains (obviously, index files that are no longer
> > around are going to remain in the OS cache for a while, but will be useless
> > and gradually replaced by new index files).
> > How long warmup takes is not relevant here; what matters is which queries
> > you use to warm up the index and how much you auto-warm the caches.
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
> >
> > ----- Original Message ----
> > > From: Salman Akram <salman.akram@northbaysolutions.net>
> > > To: solr-user@lucene.apache.org
> > > Sent: Sat, February 5, 2011 4:06:54 AM
> > > Subject: Re: Performance optimization of Proximity/Wildcard searches
> > >
> > > Correct me if I am wrong.
> > >
> > > A commit on the index flushes the Solr cache, but of course the OS cache
> > > would still be useful? If an index is updated every hour, then a warm-up
> > > that takes less than 5 mins should be more than enough, right?
> > >
> > > On Sat, Feb 5, 2011 at 7:42 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > >
> > > > Salman,
> > > >
> > > > Warming up may be useful if your caches are getting decent hit ratios.
> > > > Plus, you are warming up the OS cache when you warm up.
> > > >
> > > > Otis
> > > > ----
> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > Lucene ecosystem search :: http://search-lucene.com/
> > > >
> > > >
> > > >
> > > > ----- Original Message ----
> > > > > From: Salman Akram <salman.akram@northbaysolutions.net>
> > > > > To: solr-user@lucene.apache.org
> > > > > Sent: Fri, February 4, 2011 3:33:41 PM
> > > > > Subject: Re: Performance optimization of Proximity/Wildcard searches
> > > > >
> > > > > I know, so we are not really using it for regular warm-ups (in any
> > > > > case the index is updated on an hourly basis). I just tried it a few
> > > > > times to compare results. The issue is that I am not even sure
> > > > > whether warming up is useful with such regular updates.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 4, 2011 at 5:16 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > > > >
> > > > > > Salman,
> > > > > >
> > > > > > I only skimmed your email, but wanted to say that this part sounds a
> > > > > > little suspicious:
> > > > > >
> > > > > > > Our warm up script currently executes all distinct queries in our
> > > > > > > logs having count > 5. It was run yesterday (with all the indexing
> > > > > > > update every
> > > > > >
> > > > > > It sounds like this will make warmup take a looooong time, assuming
> > > > > > you have more than a handful of distinct queries in your logs.
> > > > > >
> > > > > > Otis
> > > > > > ----
> > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > > > Lucene ecosystem search :: http://search-lucene.com/
> > > > > >
> > > > > >
> > > > > >
> > > > > > ----- Original Message ----
> > > > > > > From: Salman Akram <salman.akram@northbaysolutions.net>
> > > > > > > To: solr-user@lucene.apache.org; te@statsbiblioteket.dk
> > > > > > > Sent: Tue, January 25, 2011 6:32:48 AM
> > > > > > > Subject: Re: Performance optimization of Proximity/Wildcard searches
> > > > > > >
> > > > > > > By warmed index, do you mean warming only the Solr cache, or the
> > > > > > > OS cache? As I said, our index is updated every hour, so I am not
> > > > > > > sure how much the Solr cache would help, but the OS cache should
> > > > > > > still be helpful, right?
> > > > > > >
> > > > > > > I haven't compared the results with a proper script, but here are
> > > > > > > some observations from manual testing.
> > > > > > >
> > > > > > > 'Recent' queries which are in cache of course return immediately
> > > > > > > (only if they are exactly the same, even if they took 3-4 mins
> > > > > > > the first time). I will need to test how many recent queries stay
> > > > > > > in cache, but this would still work only for very common queries.
> > > > > > > Users can run different queries, and I want those to be at least
> > > > > > > at an 'acceptable' level (5-10 secs) even if not very fast.
> > > > > > >
> > > > > > > Our warm-up script currently executes all distinct queries in our
> > > > > > > logs having count > 5. It was run yesterday (with all the hourly
> > > > > > > index updates after that), and today when I executed some of the
> > > > > > > same queries again their time seemed a little less (around
> > > > > > > 15-20%). I am not sure if this means anything. However, their
> > > > > > > time is still not acceptable.
> > > > > > >
> > > > > > > What do you think is the best way to compare results? First run
> > > > > > > all the warm-up queries and then execute the same ones randomly
> > > > > > > and compare?
> > > > > > >
> > > > > > > We are using Windows Server; would it make a big difference if we
> > > > > > > moved to Linux? Our load is not high, but some queries are really
> > > > > > > complex.
> > > > > > >
> > > > > > > Also, I was hoping to move to SSD last, after trying out all
> > > > > > > software options. Is it an agreed fact that on large indexes
> > > > > > > (which don't fit in RAM) proximity/wildcard/phrase queries (on
> > > > > > > common words) will be slow, and that this can only be improved
> > > > > > > by cache warm-up and better hardware? Otherwise, with an index
> > > > > > > of around 150GB, such queries will take more than a min?
> > > > > > >
> > > > > > > If that's the case, I know this question is very subjective, but
> > > > > > > if a single query takes 2 min on SAS 10K RPM, what would its
> > > > > > > approx time be on a good SSD (everything else the same)?
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 25, 2011 at 3:44 PM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> > > > > > >
> > > > > > > > On Tue, 2011-01-25 at 10:20 +0100, Salman Akram wrote:
> > > > > > > > > Cache warming is a good option too, but the index gets
> > > > > > > > > updated every hour, so I am not sure how much that would help.
> > > > > > > >
> > > > > > > > What is the time difference between queries with a warmed index
> > > > > > > > and a cold one? If the warmed index performs satisfactorily,
> > > > > > > > then one answer is to upgrade your underlying storage. As always
> > > > > > > > for IO-caused performance problems in Lucene/Solr-land, SSD is
> > > > > > > > the answer.
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > >
> > > > > > > Salman Akram
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > >
> > > > > Salman Akram
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
> > >
> >
>
>
>
> -- 
> Regards,
> 
> Salman Akram
> 
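
For reference, the warm-up selection described in the quoted thread (all distinct logged queries with count > 5) could be sketched roughly as below. This is only an illustration in Python, not the script used in the thread: the function name select_warmup_queries, the limit cap, and the sample queries are all assumptions. The cap is included because of the concern raised above that replaying every frequent query can make warm-up take a very long time.

```python
from collections import Counter


def select_warmup_queries(logged_queries, more_than=5, limit=None):
    """Pick distinct logged queries seen more than `more_than` times.

    Hypothetical helper, not part of Solr. Results are ordered most
    frequent first; `limit` (an assumed extra knob) caps how many are kept.
    """
    # Count each distinct query string, ignoring blank lines.
    counts = Counter(q.strip() for q in logged_queries if q.strip())
    frequent = [q for q, n in counts.most_common() if n > more_than]
    return frequent if limit is None else frequent[:limit]


# Made-up sample log: one query string per entry.
sample_log = ['title:sol*'] * 7 + ['"open source"~10'] * 6 + ['rare term'] * 5
print(select_warmup_queries(sample_log))
print(select_warmup_queries(sample_log, limit=1))
```

The selected strings would then be sent to Solr as ordinary search requests; how many to keep is a trade-off between warm-up time and how much of the cache gets populated.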
