lucene-dev mailing list archives

From "yueyu lin" <popeye...@gmail.com>
Subject Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.
Date Wed, 10 May 2006 02:01:55 GMT
o, I think I didn't express it clearly.
First, I have only one IndexSearcher, and multiple threads share it.
Then I found that the performance was not as good as I expected on a
dual-CPU machine.
So I forced the JVM to print a thread dump, and I found the threads were
waiting here.
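
As a rough illustration of that kind of check (not the exact procedure used
here), the blocked threads can also be listed from inside the JVM with the
standard java.lang.management API. The class name BlockedThreadReport and
the helper blockedThreads() are made up for this sketch; an external thread
dump triggered by a signal shows the same thread states.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists threads that are
// currently BLOCKED waiting to enter a synchronized block or method.
class BlockedThreadReport {

    /** Returns one line per thread that is BLOCKED on a monitor. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false, false: skip locked-monitor and locked-synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                StackTraceElement[] stack = info.getStackTrace();
                String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
                report.add(info.getThreadName() + " blocked on "
                        + info.getLockName() + " at " + top);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // With no contention in the JVM, this prints nothing.
        blockedThreads().forEach(System.out::println);
    }
}
```

If many search threads show up in such a report with the same top frame,
that frame is a likely point of contention.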

After that, I traced Lucene at runtime using my IDE's debug mode, and I
found the code.

To resolve the problem, I first tried modifying the code and rebuilding
the Lucene jar.
That was a bad idea: I didn't want to maintain my own custom Lucene package.

So I tried using 2 IndexSearchers, expecting that to reduce the chance of
waiting. In this test
I found that the waiting almost disappeared.

Two lines of code seem small, but they are still a problem in a busy
system. What I want to emphasize is that
they are indeed a problem in our system, even though the cost is only 10%
of performance or so.

That is to say, if a query takes 11 ms with the original Lucene jar, the
modified
Lucene jar will finish that query in 10 ms. But that ratio does not hold in
all conditions. If a query costs 100 ms, the new version may still cost
99 ms.
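
To make the contention concrete, here is a minimal sketch of the locking
pattern discussed in this thread. It is not the Lucene source and not the
attached patch: LazyIndexSketch, its String[] field, and load() are
hypothetical stand-ins for TermInfosReader, indexTerms, and the index read.
The first method takes the monitor on every call, as a synchronized
ensureIndexIsRead() does; the second checks a volatile field first and only
locks while the index is still unloaded (double-checked locking, which
relies on the Java 5 and later memory model).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a reader with a lazily loaded term index.
class LazyIndexSketch {
    // volatile so the lock-free null check below sees a fully built array
    private volatile String[] indexTerms;
    final AtomicInteger loads = new AtomicInteger();

    /** Pattern from the thread: every caller takes the monitor. */
    synchronized void ensureIndexIsReadSynchronized() {
        if (indexTerms != null) {
            return; // already loaded, but the lock was still acquired
        }
        indexTerms = load();
    }

    /** Alternative sketch: lock only while the index is not loaded yet. */
    void ensureIndexIsReadDoubleChecked() {
        if (indexTerms != null) {
            return; // fast path: volatile read, no monitor
        }
        synchronized (this) {
            if (indexTerms == null) { // re-check under the lock
                indexTerms = load();
            }
        }
    }

    // Pretend to read the term index; counts how often it happens.
    private String[] load() {
        loads.incrementAndGet();
        return new String[] {"a", "b"};
    }

    public static void main(String[] args) throws InterruptedException {
        LazyIndexSketch index = new LazyIndexSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 100_000; n++) {
                    index.ensureIndexIsReadDoubleChecked();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("loads = " + index.loads.get());
    }
}
```

In both variants the index is loaded once; the difference is only whether
later calls still have to queue for the monitor.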

On 5/10/06, Robert Engels <rengels@ix.netcom.com> wrote:
>
> I think your basic problem is that you are using multiple IndexSearchers?
> And creating new instances during runtime? If so, you will be reading the
> index information far too often. This is not a good configuration.
>
> -----Original Message-----
> From: yueyu lin [mailto:popeyelin@gmail.com]
> Sent: Tuesday, May 09, 2006 8:46 PM
> To: java-dev@lucene.apache.org; Otis Gospodnetic
> Subject: Re: Multiple threads searching in Lucene and the synchronized
> issue. -- solution attached.
>
>
> Oh, please believe me: I did force the JVM to print the thread dump.
> It was indeed waiting here.
> I'll try to post the patch to JIRA.
> I don't want to modify this code myself because that would break the
> Lucene code. So I hope you can do me the favor of checking this code and
> making the fix available in the next release.
> On 5/9/06, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> >
> > Yueyu Lin,
> >
> > From what I can tell from a quick look at the method, that method
> > needs to remain synchronized, so multiple threads don't accidentally
> > re-read that 'indexTerms' (Term[] type).  Even though the method is
> > synchronized, it looks like only the first invocation would enter that
> > try/catch/finally block where term reading happens.  Subsequent calls
> > to this method should exit quickly, because indexTerms != null.
> >
> > Are you sure this is causing the bottleneck for you?
> > I believe the proper way to figure that out is to kill the JVM with a
> > SIGnal that causes the JVM to dump thread information.  That would
> > tell you where the code is blocking.
> >
> > Also, if you have concrete suggestions for code changes, please post
> > them to JIRA as diffs/patches.
> >
> > Otis
> >
> >
> > ----- Original Message ----
> > From: yueyu lin <popeyelin@gmail.com>
> > To: java-dev@lucene.apache.org
> > Sent: Tuesday, May 9, 2006 3:53:55 AM
> > Subject: Re: Multiple threads searching in Lucene and the synchronized
> > issue. -- solution attached.
> >
> > Please trace the code into Lucene when searching.
> > Here is a table of how the invocations are called.
> > The trace log (*Step*, *ClassName*, *Function*, *Description*):
> >
> >   1.  org.apache.lucene.search.Searcher
> >       public final Hits search(Query query)
> >       It will call another search function.
> >
> >   2.  org.apache.lucene.search.Searcher
> >       public Hits search(Query query, Filter filter)
> >       Only one line of code. It creates a new Hits:
> >       return new Hits(this, query, filter);
> >
> >   3.  org.apache.lucene.search.Hits
> >       Hits(Searcher s, Query q, Filter f)
> >       Next, we trace into the constructor to see what is done.
> >
> >   4.  org.apache.lucene.search.Hits
> >       Hits(Searcher s, Query q, Filter f)
> >       line 41: weight = q.weight(s)
> >       This call will rewrite the Query if necessary. Let us see what
> >       happens then.
> >
> >   5.  org.apache.lucene.search.Query
> >       public Weight weight(Searcher searcher)
> >       line 92: Query query = searcher.rewrite(this);
> >       This call begins to rewrite the Query.
> >
> >   6.  *org.apache.lucene.search.IndexSearcher*
> >       public Query rewrite(Query original)
> >       NOTE: we only have one IndexSearcher, which has one IndexReader.
> >       If any functions are synchronized, the query process will be
> >       queued.
> >
> >   7.  org.apache.lucene.search.BooleanQuery
> >       public Query rewrite(IndexReader reader)
> >       line 396: Query query = c.getQuery().rewrite(reader);
> >       Here, BooleanQuery gets its subqueries and calls their rewrite
> >       functions. The function requires a parameter, *IndexReader*, of
> >       which we have only one instance. From the code we notice that
> >       *TermQuery* is not rewritten and *PrefixQuery* is rewritten into
> >       several *TermQuery*s. So we ignore *TermQuery* and look into
> >       *PrefixQuery*.
> >
> >   8.  org.apache.lucene.search.PrefixQuery
> >       public Query rewrite(IndexReader reader)
> >       line 41: TermEnum enumerator = reader.terms(prefix);
> >       Let's see what happens then.
> >
> >   9.  org.apache.lucene.index.SegmentReader
> >       public TermEnum terms(Term t)
> >       line 277: return tis.terms(t);
> >       SegmentReader is in fact an implementation of IndexReader.
> >
> >   10. org.apache.lucene.index.TermInfosReader
> >       public SegmentTermEnum terms(Term term)
> >       line 211: get(term);
> >
> >   11. org.apache.lucene.index.TermInfosReader
> >       TermInfo get(Term term)
> >       line 136: ensureIndexIsRead();
> >       We finally find it!
> >
> >   12. org.apache.lucene.index.TermInfosReader
> >       private synchronized void ensureIndexIsRead()
> >       Let's analyze the function to see why it's synchronized and how
> >       to improve it.
> >
> > On 5/9/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> > >
> > >
> > > :   We found if we were using 2 IndexSearcher, we would get 10%
> > > : performance benefit.
> > > :   But if we increased the number of IndexSearcher from 2, the
> > > : performance improvement became slight even worse.
> > >
> > > Why use more than 2 IndexSearchers?
> > >
> > > Typically 1 is all you need, except for when you want to open and
> > > "warm up" a new Searcher because you know your index has changed on
> > > disk and you're ready for those changes to be visible.
> > >
> > > (I'm not arguing against your change -- concurrency isn't my forte,
> > > so I have no opinion on whether your suggestion is good or not; I'm
> > > just questioning the goal)
> > >
> > > Actually .. I don't know a lot about the internals of IndexSearcher
> > > and TermInfosReader, but according to your description of the
> > > problem...
> > >
> > > :   The class org.apache.lucene.index.TermInfosReader , as you know,
> > > : every IndexSearcher will have one TermInfosReader. Every query, one
> > > : method in the class must be called:
> > > : private synchronized void ensureIndexIsRead() throws IOException .
> > > : Notice
> > >
> > > If the method isn't static, then how can two different instances of
> > > IndexSearcher, each with their own TermInfosReader, block one another?
> > >
> > >
> > >
> > >
> > > -Hoss
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-dev-help@lucene.apache.org
> > >
> > >
> >
> >
> > --
> > --
> > Yueyu Lin
> >
> >
> >
> >
>
>
> --
> --
> Yueyu Lin
>
>


--
--
Yueyu Lin
