lucene-pylucene-dev mailing list archives

From "h.g. g.h." <highigh...@gmail.com>
Subject Re: PyLucene, multiprocessing, high thread count
Date Mon, 18 Jul 2011 19:11:25 GMT
On Sat, Jul 16, 2011 at 5:41 AM, Andi Vajda <vajda@apache.org> wrote:

>
> On Jul 16, 2011, at 2:53, "h.g. g.h." <highigh.22@gmail.com> wrote:
>
> > Thanks a lot for your replies, Andi and Christian. This is exactly what I
> > saw: about 11 threads per Python process that was making Lucene queries.
> >
> > I have a follow-up question now (maybe even a naive one):
> > Andi, about your suggestion of using multithreading, and that PyLucene
> > will release the GIL when it calls into the JVM: when it does that, another
> > Python thread will acquire the GIL and then call into the JVM, and then
> > another thread, and so on. Won't this again lead to too many Java threads
> > running in parallel? Did I misunderstand what you were suggesting?
>
> It depends how long the queries are running for. It's true that only one
> Python thread can start queries at a time because it holds the GIL. If these
> queries are difficult enough and run for a long time, decent concurrency can
> still be achieved.
>
> > The staff here who run parallel/multithreaded Java code use the command
> > "java -XX:ParallelGCThreads=16 JavaApp". I tried to pass the same argument
> > as initVM(vmargs="-XX:ParallelGCThreads=4"), but the JVM doesn't seem to
> > obey it. Am I missing something here, or misusing it, maybe?
>
> What are your assumptions here?
>

My assumption here is that -XX:ParallelGCThreads is a JVM option that the
vmargs keyword argument of initVM() can pass on when the JVM is initialized...
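
For reference, a minimal sketch of how such an option might be handed to initVM(). It assumes that vmargs accepts a comma-separated string of JVM options, in the same form as the call quoted above; the helper names gc_vmargs and start_jvm are made up for illustration and are not part of PyLucene. Note also that -XX:ParallelGCThreads only bounds the garbage collector's worker threads. The JVM's other service threads (JIT compiler, finalizer, signal dispatcher and so on) would still show up in the process's thread count, which may be why the flag appears not to be obeyed.

```python
def gc_vmargs(gc_threads):
    """Build a comma-separated JVM option string (hypothetical helper).

    Only the GC worker thread count is set here; further options could be
    appended to the list before joining.
    """
    if gc_threads < 1:
        raise ValueError("gc_threads must be at least 1")
    options = ["-XX:ParallelGCThreads=%d" % gc_threads]
    return ",".join(options)


def start_jvm(gc_threads=4):
    """Start the embedded JVM with a bounded number of GC threads.

    Returns None when PyLucene is not installed, so the sketch can still
    be imported and inspected on a machine without it.
    """
    try:
        import lucene
    except ImportError:
        return None
    # Assumption: vmargs takes a comma-separated string of JVM options.
    return lucene.initVM(vmargs=gc_vmargs(gc_threads))


if __name__ == "__main__":
    print(gc_vmargs(4))
```

Comparing the process's thread count before and after start_jvm(), for a few different values of gc_threads, would show whether the option has any effect on a given JVM.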



> Maybe Python isn't adapted to the constraints you must run in? Why not use
> Java directly?
>
>

We're not using Java because the rest of the application is already written
in Python. We decided to try Lucene only very recently and thought of using
PyLucene as the interface.
The only real constraint is that the application should run within the
resources allocated to it, rather than assuming that all the resources of
the shared machine are available to it.
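
One way to stay within such a budget, sketched under assumptions: keep a single Python process (and therefore a single JVM) and cap the number of Python threads that issue queries with a fixed-size pool, along the lines of the multithreading suggestion earlier in the thread. MAX_WORKERS, attach_to_jvm, run_query and search_all are hypothetical names, and run_query is only a stand-in for a real Lucene search. The sketch assumes Python 3.7 or later and that each worker thread has to be attached to the JVM through the environment object returned by lucene.getVMEnv().

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # assumed thread budget for this application


def attach_to_jvm():
    """Attach the current worker thread to the JVM, if one is running."""
    try:
        import lucene
    except ImportError:
        return  # PyLucene not installed: nothing to attach to
    env = lucene.getVMEnv()
    if env is not None:
        env.attachCurrentThread()


def run_query(text):
    """Stand-in for a real search; replace with an IndexSearcher call."""
    return text.upper()


def search_all(queries, max_workers=MAX_WORKERS):
    """Run the queries on at most max_workers threads, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers,
                            initializer=attach_to_jvm) as pool:
        return list(pool.map(run_query, queries))


if __name__ == "__main__":
    print(search_all(["alpha", "beta", "gamma"]))
```

This bounds only the Python threads that call into the JVM; the JVM's own service threads are started once per process rather than once per subprocess.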



> Andi..
>
> >
> > Himanshu
> >
> >
> > On Fri, Jul 15, 2011 at 9:47 AM, Christian Heimes <lists@cheimes.de> wrote:
> >
> >> On 15.07.2011 10:10, Andi Vajda wrote:
> >>> PyLucene embeds a Java VM. Thus, with each subprocess, a new JVM is
> >>> created with all its threads. This can get insane pretty quickly.
> >>
> >> The Java VM starts a lot of threads. On my Linux box, eleven
> >> additional threads are running after initVM() has been called.
> >>
> >> >>> import lucene, os, psutil
> >> >>> psutil.Process(os.getpid()).get_num_threads()
> >> 1
> >> >>> lucene.initVM()
> >> <jcc.JCCEnv object at 0x7f23a66f31e0>
> >> >>> psutil.Process(os.getpid()).get_num_threads()
> >> 12
> >>
> >> Christian
> >>
>
