lucene-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.
Date Wed, 10 May 2006 04:19:36 GMT
Yueyu Lin,

Sorry, I don't follow this part:
"To resolve the problem, first I try to modify the codes and rebuild another
Lucene jar.
That's a bad idea, I didn't want to maintain my custom Lucene package."

Are you saying you _did_ make the code changes and _did_ run your application with a modified Jar?
If so, did you measure the performance difference?  What was the difference?

If you have not modified the code and have not run your application with the modified Jar,
then you should try that first and see if your change makes any substantial difference.

We're all very interested in improving Lucene's performance, so please let us know what you find.

Thanks,
Otis

----- Original Message ----
From: yueyu lin <popeyelin@gmail.com>
To: java-dev@lucene.apache.org; rengels@ix.netcom.com
Sent: Tuesday, May 9, 2006 10:01:55 PM
Subject: Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

Oh, I don't think I expressed it clearly.
First, I have only one IndexSearcher, and multiple threads share it.
Then I found the performance was not as good as I expected on a dual-CPU
machine.
So I forced the JVM to print a thread dump, and I found the threads waiting
here.

After that, I traced Lucene at runtime using my IDE's debug mode and found
the code.

To resolve the problem, first I tried to modify the code and rebuild another
Lucene jar.
That's a bad idea; I didn't want to maintain my own custom Lucene package.

So I tried using 2 IndexSearchers, expecting to reduce the chance of
waiting. In this test
I found the waiting almost disappeared.

Two lines of code seem small, but they still matter in a busy system. What
I want to emphasize is
that they are indeed a problem in our system, even though the cost is only
about 10% of performance.

That is to say, if a query takes 11 ms with the original Lucene jar, it
will take 10 ms with the modified
Lucene jar. But that doesn't hold under all conditions: if a query costs
100 ms, the new version
may only cost 99 ms.
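Reading those two figures as a roughly constant saving of about 1 ms of lock waiting per query (an interpretation of the numbers above, not a separate measurement), the relative gain shrinks as the query itself gets more expensive:

```latex
\text{gain} = \frac{T_{\text{old}} - T_{\text{new}}}{T_{\text{new}}}, \qquad
\frac{11 - 10}{10} = 10\%, \qquad
\frac{100 - 99}{99} \approx 1.0\%
```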

On 5/10/06, Robert Engels <rengels@ix.netcom.com> wrote:
>
> I think your basic problem is that you are using multiple IndexSearchers?
> And creating new instances during runtime? If so, you will be reading the
> index information far too often. This is not a good configuration.
>
> -----Original Message-----
> From: yueyu lin [mailto:popeyelin@gmail.com]
> Sent: Tuesday, May 09, 2006 8:46 PM
> To: java-dev@lucene.apache.org; Otis Gospodnetic
> Subject: Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.
>
>
> Oh, please believe me: I did force the JVM to print a thread dump, and
> the threads were indeed waiting here.
> I'll try to post the patch to JIRA.
> I don't want to modify this code myself, because then I would be
> maintaining a modified copy of Lucene. So I hope you can do me the favor
> of checking this code and making the change available in the next release.
> On 5/9/06, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> >
> > Yueyu Lin,
> >
> > From what I can tell from a quick look at the method, that method needs
> > to remain synchronized, so that multiple threads don't accidentally
> > re-read 'indexTerms' (Term[] type). Even though the method is
> > synchronized, it looks like only the first invocation would enter that
> > try/catch/finally block where term reading happens. Subsequent calls to
> > this method should exit quickly, because indexTerms != null.
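The lazy-loading pattern Otis describes can be sketched as follows. This is a hypothetical stand-in, not TermInfosReader's actual source: `SyncLazyIndex`, `loadTerms()` and the `String[]` field (standing in for `Term[] indexTerms`) are illustrative names. It shows both halves of the argument: the expensive read runs only once, but every call, including the fast path where `indexTerms != null`, still has to acquire the object's monitor, which is where threads sharing a single reader can briefly queue.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the lazy loading described for TermInfosReader.
// Names are illustrative; String[] stands in for Term[] indexTerms.
class SyncLazyIndex {
    private String[] indexTerms;                          // guarded by this object's monitor
    private final AtomicInteger loads = new AtomicInteger();

    // Every caller acquires the monitor, even once indexTerms is already set.
    synchronized void ensureIndexIsRead() {
        if (indexTerms != null) {
            return;                                       // fast path, but still inside the lock
        }
        loads.incrementAndGet();
        indexTerms = loadTerms();                         // expensive read happens only once
    }

    String[] terms() {
        ensureIndexIsRead();                              // lock acquire/release makes the field visible
        return indexTerms;
    }

    int loadCount() {
        return loads.get();
    }

    // Placeholder for reading the term index from disk.
    private String[] loadTerms() {
        return new String[] {"apple", "banana", "cherry"};
    }

    // Starts n threads that all call terms() and returns how many loads happened.
    static int concurrentLoads(int n) {
        SyncLazyIndex idx = new SyncLazyIndex();
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(idx::terms);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return idx.loadCount();
    }
}
```

`SyncLazyIndex.concurrentLoads(16)` should report a single load no matter how many threads race on the first call; the cost that remains is the monitor acquisition on every later call.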
> >
> > Are you sure this is causing the bottleneck for you?
> > I believe the proper way to figure that out is to kill the JVM with a
> > signal that causes the JVM to dump thread information. That would tell
> > you where the code is blocking.
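On Unix-like systems, `kill -3 <pid>` (SIGQUIT) makes a Sun/HotSpot JVM print a thread dump to its standard output without terminating it. As an in-process alternative, here is a minimal sketch using `Thread.getAllStackTraces()` (available since Java 5) that lists only the threads that are BLOCKED on a monitor, the symptom discussed in this thread. The class name `BlockedThreadReport` is hypothetical.

```java
import java.util.Map;

// Hypothetical helper: an in-process "thread dump" limited to threads BLOCKED on a monitor.
class BlockedThreadReport {
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t.getState() != Thread.State.BLOCKED) {
                continue;                                 // skip threads not waiting for a lock
            }
            sb.append('"').append(t.getName()).append("\" BLOCKED\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Calling `BlockedThreadReport.report()` periodically during a load test would show whether search threads pile up on the same synchronized frame. The state and the stack are captured by separate calls, so a thread can occasionally change state between them.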
> >
> > Also, if you have concrete suggestions for code changes, please post them
> > to JIRA as diffs/patches.
> >
> > Otis
> >
> >
> > ----- Original Message ----
> > From: yueyu lin <popeyelin@gmail.com>
> > To: java-dev@lucene.apache.org
> > Sent: Tuesday, May 9, 2006 3:53:55 AM
> > Subject: Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.
> >
> > Please trace the code into Lucene when searching.
> > Here is the sequence of calls (step, class, method, description):
> >
> >  1. org.apache.lucene.search.Searcher
> >     public final Hits search(Query query)
> >     It calls another search method.
> >  2. org.apache.lucene.search.Searcher
> >     public Hits search(Query query, Filter filter)
> >     Only one line of code; it creates a new Hits:
> >     return new Hits(this, query, filter);
> >  3. org.apache.lucene.search.Hits
> >     Hits(Searcher s, Query q, Filter f)
> >     Next, we trace into the constructor to see what it does.
> >  4. org.apache.lucene.search.Hits
> >     Hits(Searcher s, Query q, Filter f), line 41: weight = q.weight(s)
> >     This call rewrites the Query if necessary; let's see what happens
> >     then.
> >  5. org.apache.lucene.search.Query
> >     public Weight weight(Searcher searcher),
> >     line 92: Query query = searcher.rewrite(this);
> >     This call begins rewriting the Query.
> >  6. org.apache.lucene.search.IndexSearcher
> >     public Query rewrite(Query original)
> >     NOTE: we only have one IndexSearcher, which has one IndexReader. If
> >     any of the methods called from here are synchronized, concurrent
> >     queries will be queued.
> >  7. org.apache.lucene.search.BooleanQuery
> >     public Query rewrite(IndexReader reader),
> >     line 396: Query query = c.getQuery().rewrite(reader);
> >     Here BooleanQuery gets its subqueries and calls their rewrite
> >     methods, passing the IndexReader, of which we have only one
> >     instance. From the code we can see that a TermQuery is not
> >     rewritten, while a PrefixQuery is rewritten into several
> >     TermQuerys. So we ignore TermQuery and look into PrefixQuery.
> >  8. org.apache.lucene.search.PrefixQuery
> >     public Query rewrite(IndexReader reader),
> >     line 41: TermEnum enumerator = reader.terms(prefix);
> >     Let's see what happens then.
> >  9. org.apache.lucene.index.SegmentReader
> >     public TermEnum terms(Term t), line 277: return tis.terms(t);
> >     SegmentReader is in fact an IndexReader implementation.
> > 10. org.apache.lucene.index.TermInfosReader
> >     public SegmentTermEnum terms(Term term), line 211: get(term);
> > 11. org.apache.lucene.index.TermInfosReader
> >     TermInfo get(Term term), line 136: ensureIndexIsRead();
> >     We finally find it!
> > 12. org.apache.lucene.index.TermInfosReader
> >     private synchronized void ensureIndexIsRead()
> >     Let's analyze this method to see why it's synchronized and how to
> >     improve it.
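One common way to take the lock off the fast path is double-checked locking on a `volatile` field, sketched below. This is not necessarily the patch attached to this thread: `DclLazyIndex` and its members are hypothetical, and the idiom is only correct under the Java 5 (JSR-133) memory model, where a volatile write safely publishes the array built before it. After the first load, callers perform a single volatile read and never touch the monitor.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical double-checked-locking variant of ensureIndexIsRead().
// Correct only under the Java 5+ (JSR-133) memory model; names are illustrative.
class DclLazyIndex {
    private volatile String[] indexTerms;                 // String[] stands in for Term[]
    private final AtomicInteger loads = new AtomicInteger();

    String[] terms() {
        String[] local = indexTerms;                      // fast path: one volatile read, no lock
        if (local == null) {
            synchronized (this) {
                local = indexTerms;                       // re-check: another thread may have loaded it
                if (local == null) {
                    loads.incrementAndGet();
                    local = loadTerms();
                    indexTerms = local;                   // volatile write publishes the finished array
                }
            }
        }
        return local;
    }

    int loadCount() {
        return loads.get();
    }

    // Placeholder for reading the term index from disk.
    private String[] loadTerms() {
        return new String[] {"apple", "banana", "cherry"};
    }

    // Starts n threads that all call terms() and returns how many loads happened.
    static int concurrentLoads(int n) {
        DclLazyIndex idx = new DclLazyIndex();
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(idx::terms);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return idx.loadCount();
    }
}
```

A simpler alternative is to read the index eagerly when the reader is constructed, which removes the lock entirely at the cost of paying that I/O up front even if no term lookup ever happens.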
> >
> > On 5/9/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> > >
> > >
> > > :   We found that if we used 2 IndexSearchers, we got a 10%
> > > : performance benefit.
> > > :   But if we increased the number of IndexSearchers beyond 2, the
> > > : performance improvement became slight, or even worse.
> > >
> > > Why use more than 2 IndexSearchers?
> > >
> > > Typically 1 is all you need, except when you want to open and "warm
> > > up" a new Searcher because you know your index has changed on disk and
> > > you're ready for those changes to be visible.
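A minimal sketch of the setup Hoss describes: every query thread uses one shared searcher, and a replacement is warmed before it is published. `SharedSearcher` is a hypothetical helper, not a Lucene class, and the type parameter `S` stands in for `IndexSearcher`. The warm-up step (for example, running a few typical queries) is passed in as a callback, and closing the old searcher once in-flight queries finish is left to the caller.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical holder: all query threads share one searcher; S stands in for IndexSearcher.
class SharedSearcher<S> {
    private final AtomicReference<S> current;

    SharedSearcher(S initial) {
        current = new AtomicReference<>(initial);
    }

    // Query threads call this; they all get the same instance.
    S get() {
        return current.get();
    }

    // Warm the replacement first, then publish it atomically.
    // Returns the previous searcher so the caller can retire it later.
    S swap(S fresh, Consumer<S> warmer) {
        warmer.accept(fresh);
        return current.getAndSet(fresh);
    }
}
```

Because queries never see a cold searcher and only one instance is live at a time, the index data is read once per index version rather than once per searcher, which is also the concern Robert raises about creating searchers at runtime.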
> > >
> > > (I'm not arguing against your change -- concurrency isn't my forte, so I
> > > have no opinion on whether your suggestion is good or not; I'm just
> > > questioning the goal)
> > >
> > > Actually... I don't know a lot about the internals of IndexSearcher and
> > > TermInfosReader, but according to your description of the problem...
> > >
> > > :   As you know, every IndexSearcher has one
> > > : org.apache.lucene.index.TermInfosReader. For every query, one method
> > > : in that class must be called:
> > > : private synchronized void ensureIndexIsRead() throws IOException.
> > > : Notice
> > >
> > > If the method isn't static, then how can two different instances of
> > > IndexSearcher, each with its own TermInfosReader, block one another?
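Hoss's point can be checked directly: a non-static `synchronized` method locks `this`, so two instances each have their own monitor and cannot block each other (a `static synchronized` method would instead lock the single `Class` object shared by all instances). `MonitorDemo` is a hypothetical stand-in for two `TermInfosReader` instances: one thread holds instance `a`'s monitor while the calling thread enters a synchronized method on instance `b` without waiting.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for two TermInfosReader instances.
// A non-static synchronized method locks 'this', so each instance has its own monitor.
class MonitorDemo {
    synchronized void holdLock(CountDownLatch entered, CountDownLatch release) throws InterruptedException {
        entered.countDown();                              // we now own this instance's monitor
        release.await();                                  // keep owning it until released
    }

    synchronized String touch() {
        return "ok";
    }

    // One thread holds a's monitor while this thread calls a synchronized method on b.
    // If a and b shared one lock, touch() would block here instead of returning.
    static boolean otherInstanceIsFree() {
        MonitorDemo a = new MonitorDemo();
        MonitorDemo b = new MonitorDemo();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            try {
                a.holdLock(entered, release);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        holder.start();
        try {
            entered.await();                              // wait until a's monitor is taken
            return "ok".equals(b.touch());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();                          // let the holder thread finish
        }
    }
}
```

This is consistent with the reply above: contention on `ensureIndexIsRead()` can only arise when threads share one IndexSearcher, and therefore one TermInfosReader per segment.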
> > >
> > >
> > >
> > >
> > > -Hoss
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-dev-help@lucene.apache.org
> > >
> > >
> >
> >
> > --
> > --
> > Yueyu Lin
> >
> >
> >
> >
> >
> >
>
>
> --
> --
> Yueyu Lin
>
>
>
>


--
--
Yueyu Lin





