lucene-solr-user mailing list archives

From Salman Akram <salman.ak...@northbaysolutions.net>
Subject Re: Performance optimization of Proximity/Wildcard searches
Date Fri, 04 Feb 2011 20:38:55 GMT
Well, I assume many people out there have indexes larger than 100GB, and I
don't think you would normally have more than 32GB or 64GB of RAM!

As I mentioned, the queries are mostly phrase, proximity, and wildcard
queries, and combinations of these.

What exactly do you mean by distribution of documents? On this index our
documents are no more than a few hundred KB each on average (file system
size), and there are around 14 million documents. About 80% of the index size
is taken up by the position file. I am not sure if this is what you asked?
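
As a rough sketch of the arithmetic behind these figures: the 150GB index size and 32GB of RAM come from the original post quoted below, and the 80% share and 14 million documents from the paragraph above. The function names are hypothetical, and treating all RAM as available for caching is an optimistic assumption, since the JVM heap and the OS need some of it.

```python
# Back-of-envelope estimate using the figures quoted in this thread.
INDEX_SIZE_GB = 150        # total index size (original post)
POSITION_FRACTION = 0.80   # share of the index taken by the position file
RAM_GB = 32                # server RAM (original post)
DOC_COUNT = 14_000_000     # approximate number of documents


def position_data_gb(index_gb: float, fraction: float) -> float:
    """Size of the position data, in GB."""
    return index_gb * fraction


def cacheable_fraction(ram_gb: float, data_gb: float) -> float:
    """Upper bound on the share of data that fits in RAM.

    Assumes (optimistically) that all RAM is free for file caching.
    """
    return min(1.0, ram_gb / data_gb)


pos_gb = position_data_gb(INDEX_SIZE_GB, POSITION_FRACTION)
share = cacheable_fraction(RAM_GB, pos_gb)
per_doc_kb = pos_gb * 1e6 / DOC_COUNT  # 1 GB taken as 1e6 KB

print(f"position data: ~{pos_gb:.0f} GB")
print(f"at most {share:.0%} of it fits in {RAM_GB} GB of RAM")
print(f"roughly {per_doc_kb:.1f} KB of position data per document")
```

Under these assumptions, phrase and proximity queries, which read position data, would often have to go to disk rather than memory.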

On Fri, Feb 4, 2011 at 5:19 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> Hi,
>
>
> > Sharding is an option too but that too comes with limitations so want to
> > keep that as a last resort but I think there must be other things coz 150GB
> > is not too big for one drive/server with 32GB Ram.
>
> Hmm.... what makes you think 32 GB is enough for your 150 GB index?
> It depends on queries and distribution of matching documents, for example.
> What's yours like?
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> ----- Original Message ----
> > From: Salman Akram <salman.akram@northbaysolutions.net>
> > To: solr-user@lucene.apache.org
> > Sent: Tue, January 25, 2011 4:20:34 AM
> > Subject: Performance optimization of Proximity/Wildcard searches
> >
> > Hi,
> >
> > I am facing performance issues in three types of queries (and their
> > combination). Some of the queries take more than 2-3 mins. Index size is
> > around 150GB.
> >
> >
> >    - Wildcard
> >    - Proximity
> >    - Phrases (with common words)
> >
> > I know CommonGrams and Stop words are a good way to resolve such issues but
> > they don't fulfill our functional requirements (Common Grams seem to have
> > issues with phrase proximity, stop words have issues with exact match etc).
> >
> > Sharding is an option too but that too comes with limitations so want to
> > keep that as a last resort but I think there must be other things coz 150GB
> > is not too big for one drive/server with 32GB Ram.
> >
> > Cache warming is a good option too but the index get updated every hour so
> > not sure how much would that help.
> >
> > What are the other main tips that can help in performance optimization of
> > the above queries?
> >
> > Thanks
> >
> > --
> > Regards,
> >
> > Salman Akram
> >
>



-- 
Regards,

Salman Akram
