lucene-dev mailing list archives

From "yueyu lin" <popeye...@gmail.com>
Subject Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.
Date Wed, 10 May 2006 01:45:47 GMT
Oh, please believe me: I did force the JVM to print a thread dump.
The threads were indeed waiting in that method.
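
For anyone who wants to repeat the check: besides sending the JVM a signal, blocked threads can also be listed from inside the process. The following is a minimal sketch, assuming Java 8 or later and the standard `java.lang.management` API; the class name `BlockedThreadReport` is made up for illustration and is not part of Lucene.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreadReport {

    /** Returns one line per thread that is currently BLOCKED waiting for a monitor. */
    public static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false, false: do not collect locked-monitor or locked-synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                StackTraceElement[] stack = info.getStackTrace();
                String top = stack.length > 0 ? stack[0].toString() : "(no stack)";
                report.add(info.getThreadName()
                        + " waiting for " + info.getLockName()
                        + " held by " + info.getLockOwnerName()
                        + " at " + top);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Print whatever is blocked at this instant (possibly nothing).
        blockedThreads().forEach(System.out::println);
    }
}
```

If many search threads show up as blocked on the same monitor with the same top frame, that frame is the likely contention point.
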
I'll try to post the patch to JIRA.
I don't want to modify this code on my own because that would break the
Lucene code. So I hope you can do me the favor of reviewing the change and
making it available in the next release.
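
To make the discussion concrete, here is a minimal sketch of the lazy-loading pattern in question. It is not Lucene's code and not necessarily what the patch does: `LazyIndex`, its `String[]` field, and the `load()` placeholder are hypothetical stand-ins for TermInfosReader and its `indexTerms`. It contrasts a fully synchronized check with a double-checked variant, which is only safe when the field is `volatile` under the Java 5 and later memory model.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyIndexDemo {

    /** Hypothetical stand-in for a lazily loaded term index; not Lucene code. */
    static final class LazyIndex {
        private volatile String[] indexTerms;   // volatile makes the unlocked read safe
        final AtomicInteger loads = new AtomicInteger();

        /** Fully synchronized variant: every caller takes the monitor, even after loading. */
        synchronized void ensureIndexIsReadSynchronized() {
            if (indexTerms != null) return;     // already loaded, exit quickly
            indexTerms = load();
        }

        /** Double-checked variant: the monitor is taken only until the first load is done. */
        void ensureIndexIsReadDoubleChecked() {
            if (indexTerms != null) return;     // fast path without locking
            synchronized (this) {
                if (indexTerms == null) {       // re-check while holding the lock
                    indexTerms = load();
                }
            }
        }

        private String[] load() {
            loads.incrementAndGet();            // count loads so callers can verify "once only"
            return new String[] {"apple", "banana"};  // placeholder for reading the index file
        }
    }

    /** Calls the double-checked variant from several threads and returns the load count. */
    static int loadsAfterConcurrentCalls(int threads) throws InterruptedException {
        LazyIndex index = new LazyIndex();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(index::ensureIndexIsReadDoubleChecked);
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return index.loads.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("loads = " + loadsAfterConcurrentCalls(8));
    }
}
```

In both variants the index is loaded a single time; the difference is that the synchronized variant still acquires the monitor on every call, which is where many threads sharing one reader can queue up.
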
On 5/9/06, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>
> Yueyu Lin,
>
> From what I can tell from a quick look at the method, that method needs to
> remain synchronized, so multiple threads don't accidentally re-read that
> 'indexTerms' (Term[] type).  Even though the method is synchronized, it
> looks like only the first invocation would enter that try/catch/finally
> block where term reading happens.  Subsequent calls to this method should
> exit quickly, because indexTerms != null.
>
> Are you sure this is causing the bottleneck for you?
> I believe the proper way to figure that out is to kill the JVM with a
> SIGnal that causes the JVM to dump thread information.  That would tell you
> where the code is blocking.
>
> Also, if you have concrete suggestions for code changes, please post them
> to JIRA as diffs/patches.
>
> Otis
>
>
> ----- Original Message ----
> From: yueyu lin <popeyelin@gmail.com>
> To: java-dev@lucene.apache.org
> Sent: Tuesday, May 9, 2006 3:53:55 AM
> Subject: Re: Multiple threads searching in Lucene and the synchronized
> issue. -- solution attached.
>
> Please trace the code into Lucene when searching.
> Here is a table showing how the invocations are chained.
> The trace log (Step / ClassName / Function / Description):
>
>   1.  org.apache.lucene.search.Searcher
>       public final Hits search(Query query)
>       It calls another search function.
>
>   2.  org.apache.lucene.search.Searcher
>       public Hits search(Query query, Filter filter)
>       Only one line of code; it creates a new Hits:
>       return new Hits(this, query, filter);
>
>   3.  org.apache.lucene.search.Hits
>       Hits(Searcher s, Query q, Filter f)
>       Next, we trace into the constructor to see what it does.
>
>   4.  org.apache.lucene.search.Hits
>       Hits(Searcher s, Query q, Filter f)
>       line 41: weight = q.weight(s)
>       This call rewrites the Query if necessary. Let us see what happens
>       next.
>
>   5.  org.apache.lucene.search.Query
>       public Weight weight(Searcher searcher)
>       line 92: Query query = searcher.rewrite(this);
>       This call begins rewriting the Query.
>
>   6.  *org.apache.lucene.search.IndexSearcher*
>       public Query rewrite(Query original)
>       NOTE: we have only one IndexSearcher, which has one IndexReader. If
>       any function on this path is synchronized, the queries will queue up.
>
>   7.  org.apache.lucene.search.BooleanQuery
>       public Query rewrite(IndexReader reader)
>       line 396: Query query = c.getQuery().rewrite(reader);
>       Here, BooleanQuery gets its subqueries and calls their rewrite
>       function. That function takes an *IndexReader* parameter, of which we
>       have only one instance. From the code we can see that *TermQuery* is
>       not rewritten and *PrefixQuery* is rewritten into several
>       *TermQuery*s. So we ignore *TermQuery* and look into *PrefixQuery*.
>
>   8.  org.apache.lucene.search.PrefixQuery
>       public Query rewrite(IndexReader reader)
>       line 41: TermEnum enumerator = reader.terms(prefix);
>       Let's see what happens next.
>
>   9.  org.apache.lucene.index.SegmentReader
>       public TermEnum terms(Term t)
>       line 277: return tis.terms(t);
>       SegmentReader is in fact an implementation of IndexReader.
>
>   10. org.apache.lucene.index.TermInfosReader
>       public SegmentTermEnum terms(Term term)
>       line 211: get(term);
>
>   11. org.apache.lucene.index.TermInfosReader
>       TermInfo get(Term term)
>       line 136: ensureIndexIsRead();
>       We finally found it!
>
>   12. org.apache.lucene.index.TermInfosReader
>       private synchronized void ensureIndexIsRead()
>       Let's analyze the function to see why it's synchronized and how to
>       improve it.
>
> On 5/9/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> >
> >
> > :   We found that with 2 IndexSearchers we got a 10% performance
> > : benefit.
> > :   But when we increased the number of IndexSearchers beyond 2, the
> > : improvement became slight, or performance even got worse.
> >
> > Why use more than 2 IndexSearchers?
> >
> > Typically 1 is all you need, except for when you want to open and "warm
> > up" a new Searcher because you know your index has changed on disk and
> > you're ready for those changes to be visible.
> >
> > (I'm not arguing against your change -- concurrency isn't my forte so i
> > have no opinion on whether your suggestion is good or not, i'm just
> > questioning the goal)
> >
> > Actually .. i don't know a lot about the internals of IndexSearcher and
> > TermInfosReader, but according to your description of the problem...
> >
> > :   The class org.apache.lucene.index.TermInfosReader , as you know, every
> > : IndexSearcher will have one TermInfosReader. Every query, one method in
> > : the class must be called:
> > : private synchronized void ensureIndexIsRead() throws IOException .
> > : Notice
> >
> > If the method isn't static, then how can two different instances of
> > IndexSearcher, each with their own TermInfosReader, block one another?
> >
> >
> >
> >
> > -Hoss
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> >
> >
>
>
> --
> --
> Yueyu Lin
>
>
>
>
>
>


--
--
Yueyu Lin
