lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Glen Newton" <glen.new...@gmail.com>
Subject Re: Binding lucene instance/threads to a particular processor(or core)
Date Wed, 23 Apr 2008 19:17:22 GMT
Hi Anshum,

2008/4/23 Anshum <anshumg@gmail.com>:
> Hi Glen,
>
>  As far as stats for index/search are concerned, here they are:
>  * Yes, it is a web based application
>  * I am currently facing issues when the number of concurrent searches goes
>  high. The search is not able to handle over 2.5 searches per second.
What is the OS, hardware etc you are using?

I am assuming you are only opening one instance of IndexSearcher for
all of your threads/queries (I would do it in the servlet init()
method) to use, as indicated in
http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/IndexSearcher.html

>  * JVM command line parameters: -server mode; Max heap size 2GB (As it is a
>  32 bit machine/OS and that is the limit in the case)
What JVM/version?

Have you tried running a profiler to see what is happening or even
jconsole (add  -Dcom.sun.management.jmxremote to your command line
parameters and (in Linux at least) run jconsole after the JVM has
started.

A general suggestion:
 -XX:+AggressiveOpts -XX:+ScavengeBeforeFullGC   -Xms2000m
-XX:+UseConcMarkSweepGC

Although the garbage collection depends on your configuration.

>  * Performance expectations :
>   * Mean search time :1-2 seconds (though that is a lot considering that I
>  have tried other engines, though with smaller indexes and they seem to have
>  sub 0.01s searching time)

Yes, I am also surprised by this poor performance.

>  * No, I do not store any fields.
>  * Yes, I am using multiple machines ( more than a couple ).serving the same
>  index; a little more than just a vanilla load balancing mechanism; sending
>  similar requests to the same server to better OS level caching
>  * We do not index and search on the same machine, so in that case, I would
>  not think that changing the priority of these threads would matter. In other
>  words, there is nothing but search that happens on the search server. If
>  that is what you meant when you wrote about increasing the thead priority.

Your servlet environment has many threads that have to do a lot of
things before they get around to the part of your code that does the
searching. Once they get to the search part, you want them to have
higher priority than the threads that are handling requests much
further back in the servlet pipeline. So set the thread priority
higher when it gets to your code. Of course you would need to test to
see the impact on performance in your particular case.

>
>  I had already started trying the TPE and  guess it would provide a lot of
>  improvement but 'm looking for other ways as well ! :)

-glen


>  --
>  Anshum
>
>
>
>  On Tue, Apr 22, 2008 at 11:19 PM, Glen Newton <glen.newton@gmail.com> wrote:
>
>  > So even if you only have one index, this is the way to go to manage
>  > this kind of problem.
>  >
>  > Looking at the implementation and having used ThreadPoolExecutor (TPE)
>  > a lot, I would make the following suggestions for this class so as to
>  > better support this particular use case:
>  > Better access to the configuration of the TPE is needed: the ability
>  > to choose the number of threads (pools size and max pool size), the
>  > type and nature of the queue, etc. Also, the default behaviour of TPE
>  > is to throw an exception when a job is submitted and the queue is
>  > full: throwing an exception is expensive, especially when dozens or
>  > hundreds of searches are being rejected and many exceptions are
>  > occurring in a high load situation. Instead, being able to set the
>  > RejectedExecutionHandler on the TPE would allow for a graceful
>  > handling of rejected queries (feedback to user application, etc).
>  > Also, TPE allows for a custom ThreadFactory, which I use to produce
>  > threads with the highest priority to do the searching.
>  >
>  > Right now the implementation sets the #threads and queue size as a
>  > function of the number of Searchables, which is reasonable for this
>  > use case. But it would generalize better if these were a function of
>  > the number of cores on the machine, or some combination.
>  >
>  > That said, I would suggest having adding the ability to set the
>  > ThreadPoolExecutor completely, with a getter/setter. This would allow
>  > this class to be useful beyond the use case of multiple indexes,
>  > becoming more generalizable to a number of use cases including
>  > allowing it to support the use case of one (or more indexes) and a
>  > high work load of queries that need to be managed. It could use the
>  > same defaults if the TPE is not set externally (is null).
>  >
>  > -Glen
>  >
>  > 2008/4/22 Renaud Waldura <renaud.waldura@library.ucsf.edu>:
>  > > > one solution is to set-up a ThreadPoolExecutor[2] with a fixed
>  > >  > number of threads and a limited queue size (use a bound
>  > BlockingQueue[3])
>  > >
>  > >  Yes, this is precisely how the ConcurrentMultiSearcher works.
>  > >  https://issues.apache.org/jira/browse/LUCENE-423
>  > >
>  > >
>  > >
>  > >
>  > >  -----Original Message-----
>  > >  From: Glen Newton [mailto:glen.newton@gmail.com]
>  > >  Sent: Tuesday, April 22, 2008 5:40 AM
>  > >  To: java-user@lucene.apache.org
>  > >  Subject: Re: Binding lucene instance/threads to a particular
>  > processor(or
>  > >  core)
>  > >
>  > >
>  > >
>  > > Anshun,
>  > >
>  > >  I think I am dealing with an index of similar scale: 6.4 million
>  > records, 83
>  > >  GB index (see [1] for more info)
>  > >
>  > >  I mistakenly thought from your original posting that you were
>  > interested in
>  > >  binding threads to processors for indexing, but it is sounding like you
>  > want
>  > >  to do this for searching. I am not sure if this would work well for
>  > >  searching, as there is a great deal more ephemeral state. But I am not
>  > sure.
>  > >
>  > >  With respect to handling concurrency for search, could you describe
>  > your
>  > >  environment better?
>  > >  - Is it an web-base application?
>  > >  - What sort of problems do you have now?
>  > >  - What are your java command line parameters (heap, etc.)
>  > >  - What are the performance expectations, i.e. average search < N1
>  > seconds;
>  > >  median search time < N2 seconds
>  > >  - Are you storing any fields and if so, do you need to store so many?
>  > >  - Do you have the choice of serving searches from multiple machines?
>  > >  i.e. load balance across >1 machine; We've found this to be one of the
>  > best
>  > >  solutions for scaling;
>  > >  - One thing that I do for both indexing and searching is that the
>  > threads
>  > >  that are doing these tasks I always shift their priority to MAX, so
>  > that
>  > >  they are run in preference to threads doing other things, like
>  > preparing
>  > >  Documents, etc.
>  > >
>  > >  If it is a web-type environment, one solution is to set-up a
>  > >  ThreadPoolExecutor[2] with a fixed number of threads and a limited
>  > queue
>  > >  size (use a bound BlockingQueue[3]) . You would have to experiment with
>  > the
>  > >  numbers to get the sweet-spot for your situation.
>  > >  I would suggest starting with 2 times the number processors (cores) and
>  > a
>  > >  queue of say 20. Requests are queued, but if the request cannot be
>  > queued,
>  > >  at least the application then knows that it is too busy and you can
>  > give the
>  > >  user a message "The system is too busy at this moment, please try again
>  > in a
>  > >  few seconds..." kind of thing. The advantage is also that when things
>  > are
>  > >  very busy, you have the accepted requests handled in a reasonable
>  > amount of
>  > >  time, and some users being told things are too busy, as opposed to all
>  > the
>  > >  requests going in and the system thrashing (and perhaps running-out of
>  > >  memory at the same time) and everyone's queries being very slow.
>  > >
>  > >  But I would suggest setting-up a test harness to emulate your
>  > production
>  > >  conditions to try these things out...
>  > >
>  > >  -Glen
>  > >
>  > >  [1]
>  > http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks
>  > >  .html
>  > >  [2]
>  > http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolEx
>  > >  ecutor.html
>  > >  [3]
>  > http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueu
>  > >  e.html
>  > >
>  > >
>  > >  2008/4/22 Anshum <anshumg@gmail.com>:
>  > >  > The paper seems pretty good but I am still wondering if there was a
>  > >  > way to  achieve this through the command line parameters. I'm just
>  > >  > trying this to  optimize the code, if this works, would let all know
>  > >  > else would keep  everyone informed :)  Any other suggestions for
>  > >  > handling a concurrency of over 7 search requests  per second for an
>  > >  > index size of over 15Gigs containing over 13 million  records?
>  > >  >  Also, could someone help me with obtaining a 'index size' -
>  > >  > 'concurrency' -  'processor power' - 'memory' relationship formula
>  > (or
>  > >  something similar)?
>  > >  >
>  > >  >  --
>  > >  >  Anshum
>  > >  >
>  > >  >
>  > >  >
>  > >  >
>  > >  >  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <adb@teamware.com>
>  > >  wrote:
>  > >  >
>  > >  >  > That paper from 1997 is pretty old, but mirrors our experiences
in
>  > >  > those  > days. Then, we used Solaris processor sets to really improve
>  > >  > performance by  > binding one of our processes to a particular CPU
>  > >  > while leaving the other  > CPUs to manage the thread intensive work.
>  > >  >  >
>  > >  >  > You can bind processes/LWPs to a CPU on Solaris with psrset.
>  > >  >  >
>  > >  >  > The Solaris thread model in the late '90s was also a significant
>  > >  > factor in  > performance of multi-threaded programs.  The default
>  > >  > thread library in  > Solaris 8 implemented a MxN unbound thread model
>  > >  > (threads/LWPS).  In those  > days we found that it did not perform
>  > >  > well, so used the bound thread model  > (i.e. 1:1) where a Solaris
>  > >  > thread was bound permanently to an LWP.  That  > improved performance
>  > >  > a lot.  In Solaris 8, Sun had what they called the  > 'alternate'
>  > >  > thread library (T2) around 2000, which became the default  > library
>  > >  > in Solaris 9, and implemented a 1:1 model of Solaris threads to  >
>  > LWPs.
>  > >  That new library had dramatic performance improvements over the old.
>  > >  >  >
>  > >  >  > Some background info for Java and threading  >  >
>  > >  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  > >  >  >
>  > >  >  > Antony
>  > >  >  >
>  > >  >  >
>  > >  >  >
>  > >  >  > Glen Newton wrote:
>  > >  >  >
>  > >  >  > > I realised that not everyone on this list might be able to
>  > access
>  > >  > the  > > IEEE paper I pointed-out, so I will include the abstract
and
>  > >  > some  > > paragraphs from the paper which I have included below.
>  > >  >  > >
>  > >  >  > > Also of interest (and should be available to all): Fedorova
et
>  > al.
>  > >  >  > > 2005. Performance of Multithreaded Chip Multiprocessors And
 > >
>  > >  > Implications For Operating System Design. Usenix 2005.
>  > >  >  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
>  > >  >  > > "Abstract: We investigated how operating system design should
be
>  > >  > > > adapted for multithreaded chip multiprocessors (CMT) - a new
 > >
>  > >  > generation of processors that exploit thread-level parallelism to
>  > mask
>  > >  > > > the memory latency in modern workloads. We  > > determined
that
>  > >  > the L2 cache is a critical shared resource on CMT and  > > that
an
>  > >  > insufficient amount of L2 cache can undermine the ability to  > >
>  > hide
>  > >  > memory latency on these processors. To use the L2 cache as  > >
>  > >  > efficiently as possible, we propose an L2-conscious scheduling  >
>
>  > >  > algorithm and quantify its performance potential. Using this
>  > algorithm
>  > >  > > > it is possible to reduce miss ratios in the L2 cache by 25-37%
>  > and
>  > >  > > > improve processor throughput by 27-45%."
>  > >  >  > >
>  > >  >  > >
>  > >  >  > > From Lundberg, L. 1997:
>  > >  >  > > Abstract: "The default scheduling algorithm in Solaris and
other
>  > >  > > > operating systems may result in frequent relocation of threads
at
>  > >  > > > run-time. Excessive thread relocation cause  > > poor
memory
>  > >  > performance. This can be avoided by binding threads to  > >
>  > >  > processors. However, binding threads to processors may result in an
>  >  >
>  > >  > > unbalanced load. By considering a previously obtained theoretical
>  >  >
>  > >  > > result and by evaluating a set of multithreaded Solaris  > >
>  > >  > programs using a multiprocessor with 8 processors, we are able to  >
>  > >
>  > >  > bound the maximum performance loss due to binding threads, The  >
>
>  > >  > theoretical result is also recapitulated. By evaluating another set
>  > of
>  > >  > > > multithreaded programs, we show that the gain of binding threads
>  > >  > to  > > processors may be substantial, particularly for programs
with
>  > >  > fine  > > grained parallelism."
>  > >  >  > >
>  > >  >  > > First paragraph: "The thread concept in Solaris [3][5] and
other
>  > >  > > > operating systems makes it possible to write multithreaded
>  > >  > programs,  > > which can be executed in parallel on a multiprocessor.
>  > >  > Previous  > > experience from real world programs [4] show that,
>  > using
>  > >  > the default  > > scheduling algorithm in Solaris, threads are
>  > >  > frequently relocated from  > > one processor  > > to another
at
>  > >  > run-time. After each such relocation, the code and data  > >
>  > >  > associated with the relocated thread is moved from the cache memory
>  > of
>  > >  > > > the 0113 processor to the cache of the new processor. This
>  > reduces
>  > >  > the  > > performance. Run-time relocation of threads to processors
>  > can
>  > >  > also  > > result in unpredictable response times. This is a problem
>  > in
>  > >  > systems  > > which operate in a real-time environment. In order
to
>  > >  > avoid poor  > > memory performance and unpredictable real-time
>  > >  > behaviour due to  > > frequent thread relocation, threads can
be
>  > bound
>  > >  > to processors using  > > the processor-bind directive [3] [5].
The
>  > >  > major problem with binding  > > threads is that one can end up
with
>  > an
>  > >  > unbalanced load, i.e. some  > > processors may be extremely busy
>  > >  > during some time periods while other  > > processors are idle."
>  > >  >  > >
>  > >  >  > > -Glen
>  > >  >  > >
>  > >  >  > > On 21/04/2008, Glen Newton <glen.newton@gmail.com>
wrote:
>  > >  >  > >
>  > >  >  > > > And this discussion on bound threads may also shed light
on
>  > things:
>  > >  >  > > >
>  > >  >  > > >
>  > >  >
>  > http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/200
>  > >  > 7-11/msg02801.html
>  > >  >  > > >
>  > >  >  > > >
>  > >  >  > > >  -Glen
>  > >  >  > > >
>  > >  >  > > >
>  > >  >  > > >  On 21/04/2008, Glen Newton <glen.newton@gmail.com>
wrote:
>  > >  >  > > >  > BInding threads to processors - in many situations
-
>  > >  > improves  > > >  >  throughput by reducing memory overhead.
When a
>  > >  > thread is running  > > > on a  > > >  >  core,
its state is local; if
>  > >  > it is timeshared-out and either 1)  > > >  >  swapped back
in on the
>  > >  > same core, it is likely that there will be  > > >  the  >
> >  >
>  > >  > core's L1 cache; or 2) onto another core, there will not be a  >
> >
>  > >  > cache  > > >  >  hit and the state will have to be fetched
from L2 or
>  > >  > main memory,  > > >  >  incurring a performance hit, esp
in the
>  > >  > latter. See Lundberg, L.
>  > >  >  > > > 1997.
>  > >  >  > > >  >  Evaluating the Performance Implications of Binding
Threads
>  > >  > to  > > >  >  Processors. 393.
>  > >  >  > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>  > >  >  > > >  >  for more info.
>  > >  >  > > >  >
>  > >  >  > > >  >  If you are using JVM on Solaris on SPARC, you
should take
>  > a
>  > >  > look  > > > at  > > >  >  the following links for
tuning (the Sun JVM
>  > >  > on Solaris SPARC has  > > > many  > > >  >  more
performance tuning
>  > >  > parameters available), including  > > > threading:
>  > >  >  > > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
>  > >  >  > > >  >  -
>  > >  >  > > >
>  > >  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  > >  >  > > >  >  -
>  > >  >  > > >
>  > >  >
>  > http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg
>  > >  > 21107291  > > >  >  -
>  > >  > http://java.sun.com/javase/technologies/performance.jsp
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >  -Glen
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >  On 21/04/2008, Ulf Dittmer <udittmer@yahoo.com>
wrote:
>  > >  >  > > >  >  > This sounds odd. Why would restricting it
to a single  >
>  > >  > > >  >  >  core improve performance? The point of using
multiple  > >
>  > >  > >  >  >  cores (and multiple threads) is to improve performance
 > >
>  > >
>  > >  > >  >  isn't it? I'd leave thread scheduling decisions to the 
> > >
>  >  >
>  > >  > >  JVM. Plus, I don't think there is anything in Java to  > >
>  >  >
>  > >  > facilitate this (short of using JNI).
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  Are you talking about indexing or searching?
You may  >
>  > >  > > >  >  >  be able to use multiple parallel threads to improve
 > > >
>  > >  > >  >  indexing performance. I don't think Lucene uses  > >
>  >  >
>  > >  > multi-threading for searching; not unless you have  > > > 
>  >
>  > >  > multiple indices, anyway.
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  Ulf
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  --- Anshum <anshumg@gmail.com> wrote:
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  > Hi,
>  > >  >  > > >  >  >  >
>  > >  >  > > >  >  >  > I have been trying to bind my lucene
instance (JVM -
>  > >  > > > >  >  >  > Sun Hotspot*) to a  > > >
 >  >  > particular core so
>  > >  > as to improve the performance. Is  > > >  >  >  >
there a way to do
>  > so
>  > >  > or  > > >  >  >  > is there support in lucene to explicitly
control
>  > >  > the  > > >  >  >  > thread - processor  > >
>  >  >  > linkup?
>  > >  >  > > >  >  >  >
>  > >  >  > > >  >  >  > --
>  > >  >  > > >  >  >  > --
>  > >  >  > > >  >  >  > The facts expressed here belong to
everybody, the  >
>  > >
>  > >  > >  >  >  > opinions to me.
>  > >  >  > > >  >  >  > The distinction is yours to draw............
>  > >  >  > > >  >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >
>  > >  >
>  > ______________________________________________________________________
>  > >  > ______________  > > >  >  >  Be a better friend, newshound,
and  > >
>  > >
>  > >  > >  >  know-it-all with Yahoo! Mobile.  Try it now.
>  > >  >  > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >
>  > >  > ---------------------------------------------------------------------
>  > >  >  > > >  >  >  To unsubscribe, e-mail:
>  > >  > java-user-unsubscribe@lucene.apache.org
>  > >  >  > > >  >  >  For additional commands, e-mail:
>  > >  >  > > > java-user-help@lucene.apache.org  > > >  >
 >  > > >  >  >  >
>  > >
>  > >  > >  >  > > >  >  > > >  >  > > >
 > --  > > >  >  > > >  >  -  > > >
>  >  >
>  > >  > > > >  > > >  > > >  > > > --  >
> >  > > >  -  > > >  > > >  > >  >
>  > >
>  > >  > >  >
>  > >  > ---------------------------------------------------------------------
>  > >  >  > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>  > >  >  > For additional commands, e-mail: java-user-help@lucene.apache.org
>  > >  > >  >
>  > >  >
>  > >  >
>  > >  >  --
>  > >  >
>  > >  >
>  > >  > --
>  > >  >  The facts expressed here belong to everybody, the opinions to me.
>  > >  >  The distinction is yours to draw............
>  > >  >
>  > >
>  > >
>  > >
>  > >  --
>  > >
>  > >  -
>  > >
>  > >
>  > >
>  > >  ---------------------------------------------------------------------
>  > >  To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>  > >  For additional commands, e-mail: java-user-help@lucene.apache.org
>  > >
>  > >
>  >
>  >
>  >
>  > --
>  >
>  > -
>  >
>  > ---------------------------------------------------------------------
>  > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>  > For additional commands, e-mail: java-user-help@lucene.apache.org
>  >
>  >
>
>
>  --
>  --
>  The facts expressed here belong to everybody, the opinions to me.
>  The distinction is yours to draw............
>



-- 

-

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message