lucene-solr-user mailing list archives

From Dmitry Kan <solrexp...@gmail.com>
Subject Re: Profiling Solr Lucene for query
Date Mon, 09 Sep 2013 20:29:57 GMT
Hi Manuel,

By a frontend Solr I mean an instance that has no index of its own and only
merges the results from the shards. Is that your setup? If so, are all 36
shards always queried?
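
To illustrate the merge cost in question: in a distributed request each shard
returns up to `rows` candidate hits, so with rows=1000 and 36 shards the
merging node may handle up to 36 x 1000 = 36,000 entries per query. Below is a
minimal sketch of that merge step as a k-way heap merge, assuming each shard's
list is already sorted by descending score. The `Hit` record and `mergeTopN`
method are hypothetical names for illustration, not Solr's actual classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {
    // Hypothetical hit type: a document id with its score (not a Solr class).
    record Hit(String id, float score) {}

    // Cursor pointing at the next unmerged hit of one shard's result list.
    private record Cursor(List<Hit> hits, int pos) {
        Hit current() { return hits.get(pos); }
    }

    // Merge per-shard lists (each sorted by descending score) into a global top-n.
    static List<Hit> mergeTopN(List<List<Hit>> shardResults, int n) {
        // Max-heap on the score of each shard's current head.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Cursor c) -> c.current().score()).reversed());
        for (List<Hit> hits : shardResults) {
            if (!hits.isEmpty()) heap.add(new Cursor(hits, 0));
        }
        List<Hit> merged = new ArrayList<>(n);
        while (merged.size() < n && !heap.isEmpty()) {
            Cursor top = heap.poll();
            merged.add(top.current());
            // Advance this shard's cursor if it still has hits left.
            if (top.pos() + 1 < top.hits().size()) {
                heap.add(new Cursor(top.hits(), top.pos() + 1));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = List.of(
            List.of(new Hit("a1", 9.0f), new Hit("a2", 3.0f)),
            List.of(new Hit("b1", 7.5f), new Hit("b2", 5.0f)));
        // prints [Hit[id=a1, score=9.0], Hit[id=b1, score=7.5], Hit[id=b2, score=5.0]]
        System.out.println(mergeTopN(shards, 3));
    }
}
```

The heap keeps the merge itself at O(n log k) for n returned hits over k
shards (records need Java 16+); the network transfer and the per-shard work
of producing 1000 hits each are not modeled here.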

Dmitry


On Mon, Sep 9, 2013 at 10:11 PM, Manuel Le Normand <
manuel.lenormand@gmail.com> wrote:

> Hi Dmitry,
>
> I have Solr 4.3, and every query is distributed and merged back for
> ranking purposes.
>
> What do you mean by frontend Solr?
>
>
> On Mon, Sep 9, 2013 at 2:12 PM, Dmitry Kan <solrexpert@gmail.com> wrote:
>
> > Are you querying your shards via a frontend Solr? We have noticed that
> > querying becomes much faster if results merging can be avoided.
> >
> > Dmitry
> >
> >
> > On Sun, Sep 8, 2013 at 6:56 PM, Manuel Le Normand <
> > manuel.lenormand@gmail.com> wrote:
> >
> > > Hello all,
> > > Looking at the 10% slowest queries, I see very poor performance (~60 sec
> > > per query).
> > > These queries have many conditions on my main field (more than a
> > > hundred), including phrase queries, and use rows=1000. I return only ids,
> > > though.
> > > I can say fairly confidently that this poor performance is due to a slow
> > > storage issue (which is beyond my control for now). Despite this, I want
> > > to improve my performance.
> > >
> > > As taught in school, I started profiling these queries; the data from a
> > > ~1 minute profile is here:
> > > http://picpaste.com/pics/IMG_20130908_132441-ZyrfXeTY.1378637843.jpg
> > >
> > > Main observation: most of the time is spent waiting in readVInt, whose
> > > stack trace (2 out of 2 thread dumps) is:
> > >
> > > catalina-exec-3870 - Thread t@6615
> > >  java.lang.Thread.State: RUNNABLE
> > >  at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
> > >  at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnumFrame.loadBlock(BlockTreeTermsReader.java:2357)
> > >  at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BlockTreeTermsReader.java:1745)
> > >  at org.apache.lucene.index.TermContext.build(TermContext.java:95)
> > >  at org.apache.lucene.search.PhraseQuery$PhraseWeight.<init>(PhraseQuery.java:221)
> > >  at org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:326)
> > >  at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
> > >  at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
> > >  at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
> > >  at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
> > >  at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
> > >  at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
> > >  at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:675)
> > >  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
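
The hot frame, DataInput.readVInt, decodes Lucene's variable-length integers:
each byte carries 7 payload bits, low-order group first, and a set high bit
means another byte follows. The decoding is only a mask, a shift and a branch
per byte, so if the index is memory-mapped (an assumption about this setup),
a page fault on the byte being read would likely be charged to this frame
while the thread still shows as RUNNABLE. A minimal sketch of the encoding;
`VIntSketch`, `writeVInt` and `readVInt` here are hypothetical names, not
Lucene's classes.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class VIntSketch {
    // Encode an int: 7 payload bits per byte, low-order group first;
    // the high bit (0x80) is set on every byte except the last.
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decode a VInt starting at buf[offset]. In a real index each buf[i++]
    // may touch a page that is not yet cached; here it is a plain array read.
    static int readVInt(byte[] buf, int offset) {
        int value = 0;
        int shift = 0;
        int i = offset;
        byte b;
        do {
            b = buf[i++];
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    public static void main(String[] args) {
        byte[] encoded = writeVInt(300);
        // prints [-84, 2] -> 300
        System.out.println(Arrays.toString(encoded) + " -> " + readVInt(encoded, 0));
    }
}
```
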
> > >
> > >
> > > So I am indeed waiting on IO, as expected, but I may be page faulting
> > > too often while looking up the term blocks (.tim file), i.e. while
> > > locating the term.
> > > As I am reindexing now, would it help to lower the termInterval
> > > (default 128)? Since the FST (.tip files) is small (a few tens of MB up
> > > to ~100 MB), there is no memory contention, so could I lower this
> > > parameter to 8, for example? The benefit of a lower term interval would
> > > be to force the FST into memory (in the JVM, thanks to the
> > > NRTCachingDirectory), since I do not control caching of the term
> > > dictionary file (the OS cache holds on average about 6% of it).
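
The trade-off in this question can be modeled crudely: a small in-memory
index points to the on-disk block that may hold a term, and a lookup then
scans that block. Below is a minimal sketch under that simplification, with a
`TreeMap` standing in for the FST and a hypothetical `blockSize` knob; it is
not how Lucene's BlockTreeTermsReader is implemented. Note also that, as far
as I know, Lucene 4.x's default block tree codec sizes its blocks with
min/max items-in-block settings (defaults 25 and 48) rather than the older
fixed termIndexInterval, so it is worth confirming which setting actually
applies to the codec in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BlockLookupSketch {
    // In-memory "index": first term of each block -> block number.
    private final TreeMap<String, Integer> index = new TreeMap<>();
    // "On-disk" blocks of sorted terms (kept in memory here for illustration).
    private final List<List<String>> blocks = new ArrayList<>();

    // sortedTerms must be sorted; blockSize is a hypothetical tuning knob.
    BlockLookupSketch(List<String> sortedTerms, int blockSize) {
        for (int start = 0; start < sortedTerms.size(); start += blockSize) {
            List<String> block =
                sortedTerms.subList(start, Math.min(start + blockSize, sortedTerms.size()));
            index.put(block.get(0), blocks.size());
            blocks.add(block);
        }
    }

    int indexEntries() { return index.size(); }

    // Number of block entries read to find term, or -1 if it is absent.
    int termsScanned(String term) {
        Map.Entry<String, Integer> e = index.floorEntry(term);
        if (e == null) return -1;
        List<String> block = blocks.get(e.getValue());
        for (int i = 0; i < block.size(); i++) {
            int cmp = block.get(i).compareTo(term);
            if (cmp == 0) return i + 1;
            if (cmp > 0) break;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 1000; i++) terms.add(String.format("t%04d", i));
        for (int blockSize : new int[] {128, 8}) {
            BlockLookupSketch s = new BlockLookupSketch(terms, blockSize);
            System.out.println("blockSize=" + blockSize + " indexEntries=" + s.indexEntries()
                + " scannedFor(t0127)=" + s.termsScanned("t0127"));
        }
    }
}
```

In this toy model, shrinking blocks from 128 to 8 terms cuts the entries
scanned for "t0127" from 128 to 8, while the in-memory index grows from 8 to
125 entries: the memory-versus-IO trade the question is weighing.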
> > >
> > >
> > > General configuration:
> > > Solr 4.3
> > > 36 shards, each with a few million docs
> > > These 36 servers (each server has 2 replicas) run as virtual machines
> > > with 16GB of memory each (4GB for the JVM, 12GB left for OS caching),
> > > consuming 260GB of disk mounted for the index files.
> > >
> >
>
