lucene-pylucene-dev mailing list archives

From Andi Vajda <va...@apache.org>
Subject Re: PyLucene, multiprocessing, high thread count
Date Fri, 15 Jul 2011 08:10:23 GMT

Hi,

On Jul 15, 2011, at 2:05, "h.g. g.h." <highigh.22@gmail.com> wrote:

>    I am trying to run parallel Lucene queries using PyLucene and Python's
> multiprocessing package. I run it on a large server machine with thousands
> of cores. The problem I am running into is that it is a shared machine,
> so I have to request a fixed number of CPUs per job (< 100). Apparently,
> each process that runs Lucene queries generates a very large number of
> threads, so even with 50 processes (if I request 50 CPUs), the number of
> threads exceeds 500. I need to limit the thread count so it doesn't
> overshoot the allotted CPU resources.

PyLucene embeds a Java VM, so each subprocess creates a new JVM with all of its threads.
This can get insane pretty quickly.
Instead, you should run one process and use Java threads to parallelize your queries. Unlike
Python threads, Java threads run fully concurrently, and PyLucene releases the Python GIL whenever
it calls into the JVM. For an example of how to use Java threads from PyLucene, please
refer to the threading test in the test directory: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_2/test/test_PyLuceneThread.py
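A minimal sketch of the single-process approach, assuming Python 3.7+ (for the `initializer` argument of `concurrent.futures.ThreadPoolExecutor`). The helper `run_queries` and the `search` callable are hypothetical names for illustration, not part of PyLucene. The pool size caps how many worker threads issue queries. `thread_init` runs once in each worker thread before it handles any query, which is where a PyLucene worker would attach itself to the JVM with `attachCurrentThread()`. Because PyLucene releases the GIL while inside the JVM, the searches themselves can overlap. The cap covers only these worker threads; the one JVM still runs its own internal threads. The linked test file remains the authoritative example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_queries(queries, search, max_threads, thread_init=None):
    """Run search(query) for every query using at most max_threads worker threads.

    thread_init (optional) is called once in each worker thread before it
    processes any query. With PyLucene, this is where the thread would be
    attached to the already-running JVM (assumed wiring, shown below).
    Results are returned in the same order as the input queries.
    """
    with ThreadPoolExecutor(max_workers=max_threads, initializer=thread_init) as pool:
        # pool.map() yields results in input order, regardless of completion order.
        return list(pool.map(search, queries))


# Hypothetical PyLucene wiring (not executed here; my_search is your own function):
#   import lucene
#   lucene.initVM()                  # one JVM for the whole process
#   env = lucene.getVMEnv()
#   results = run_queries(queries, my_search, max_threads=8,
#                         thread_init=env.attachCurrentThread)
```

Running one process with a fixed-size pool means a job that requests N CPUs can keep the query threads at roughly N instead of multiplying a whole JVM's threads by the number of subprocesses.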

>    Is there a way to limit the number of threads that Lucene/PyLucene is
> allowed to create?

Not that I know of, no.

Andi..

>    Thanks for your help and time.
> 
> Best Regards,
> hg
