lucene-java-user mailing list archives

From Jake Mannix <>
Subject Re: file open handles?
Date Wed, 27 Jan 2010 06:42:24 GMT
Hi Jamie,

  How fast are you indexing (number of documents per second)?  We also ran
into this when trying to perf-test heavy query throughput while doing rapid
indexing under exactly these conditions: calling getReader() every time a
search is executed (so that it's "really real time").

  The answer is that calling getReader() on every search request is not
currently a supported operation under heavy indexing load.  You are advised
to cache the reader you get from this call and refresh it only once per some
interval of time, determined by your needs: if results must be fresh to
within 5 seconds, refresh every 5s; if they must be fresh to within a second,
refresh every second, and so on.
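One way to follow this advice is a small cache that hands out the same reader until the refresh interval has elapsed. The sketch below is generic Java (8+) so it runs without Lucene; `IntervalRefreshCache`, `opener` and `retirer` are hypothetical names, not Lucene classes. In a Lucene 2.9 setup you would presumably wire `opener` to a wrapper around `IndexWriter.getReader()` (it throws `IOException`, so it needs wrapping to fit `Supplier`) and `retirer` to a release call on the old reader; if searches may still be running against the old reader, they should hold a reference via `incRef()`/`decRef()` rather than having it closed underneath them.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper: caches an expensive-to-open value (for example a
 * near-real-time reader) and replaces it at most once per refresh interval.
 */
final class IntervalRefreshCache<T> {
    private final Supplier<T> opener;      // opens a fresh value (assumed: wraps writer.getReader())
    private final Consumer<T> retirer;     // releases a replaced value (assumed: reader.decRef())
    private final long intervalNanos;
    private final LongSupplier clockNanos; // injectable clock; System::nanoTime in production
    private T current;
    private long openedAt;

    IntervalRefreshCache(Supplier<T> opener, Consumer<T> retirer,
                         long intervalMillis, LongSupplier clockNanos) {
        this.opener = opener;
        this.retirer = retirer;
        this.intervalNanos = intervalMillis * 1_000_000L;
        this.clockNanos = clockNanos;
    }

    /** Returns the cached value, reopening only when the interval has elapsed. */
    synchronized T get() {
        long now = clockNanos.getAsLong();
        if (current == null || now - openedAt >= intervalNanos) {
            T stale = current;
            current = opener.get();
            openedAt = now;
            if (stale != null) {
                retirer.accept(stale); // release the old value only after the new one is installed
            }
        }
        return current;
    }

    public static void main(String[] args) {
        AtomicLong fakeClock = new AtomicLong(0);
        AtomicInteger opens = new AtomicInteger();
        IntervalRefreshCache<Integer> cache = new IntervalRefreshCache<>(
                opens::incrementAndGet, v -> System.out.println("retired " + v),
                5_000, fakeClock::get);
        cache.get();                          // first call opens value 1
        fakeClock.addAndGet(1_000_000_000L);  // 1 s later: still served from the cache
        cache.get();
        fakeClock.addAndGet(5_000_000_000L);  // 6 s total: interval elapsed, value 2 opens
        Integer latest = cache.get();         // prints "retired 1"
        System.out.println("opens = " + opens.get() + ", current = " + latest);
    }
}
```

With a 5-second interval, a burst of searches shares one reader, so at most one new reader (and its set of open segment files) is created per interval instead of one per query.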

  If you really want the ability to get a completely fresh view of the index
on each search, look at the Zoie project ( ), an Apache-licensed realtime
search system built on top of Lucene which we open-sourced out of LinkedIn a
couple of years ago.  Zoie allows for immediate reopen on every request and
handles enormously heavy indexing and query load.  I recently demoed Zoie to
the guys at Twitter at a tech talk, and showed it indexing about 1500 tweets
per second (it could have done more, but it was on virtualized disk on EC2)
while hitting it with full-throttle query throughput, all while reopening for
each request, and it didn't fall over.  If you want to try it out, drop me a
line and I can help if you need it.


On Tue, Jan 26, 2010 at 10:11 PM, Jamie <> wrote:

> Hi Jason
> I am calling it each time the search takes place. It is not only these
> files; there are more. In fact, the number of files increases quite
> frequently. I am seriously worried that we will run out of file handles
> after a period of time.
> I am calling getReader every time a search takes place. The writer stays
> open all the time.
> I am reluctant to think it's a reader issue, as this happens even if I do
> not execute any searches.
> We are using Lucene 2.9.1.
> Are these files not left over from a merge process? Is Lucene closing its
> file handles before deleting the files? Any further ideas?
> Jamie
> On 2010/01/27 02:32 AM, Jason Rutherglen wrote:
>> Jamie,
>> How often are you calling getReader?  Is it only these files?
>> Jason
>> On Tue, Jan 26, 2010 at 12:58 PM, Jamie<>  wrote:
>>> Ok. I spoke too soon. The problem is not solved. I am still seeing these
>>> file handles lying around. Is this something I should be worried about?
>>> We are now closing the IndexReader, but the IndexWriter remains open
>>> throughout the running of the program.
>>> s# lsof | grep index |  awk '{n++}; END {print n+0}'
>>> 730
>>> java      17558   root  898r      REG                8,1   1690991  246658 /var/index/vol201001/_5q1.cfs
>>> java      17558   root  899r      REG                8,1     76354  246657 /var/index/vol201001/_5q1.nrm (deleted)
>>> java      17558   root  900r      REG                8,1      4886  246661 /var/index/vol201001/_5q2.cfs (deleted)
>>> java      17558   root  901r      REG                8,1     19859  246660 /var/index/vol201001/_5q3.cfs (deleted)
>>> java      17558   root  902r      REG                8,1      3213  246662 /var/index/vol201001/_5q4.cfs (deleted)
>>> java      17558   root  903r      REG                8,1      1294  246663 /var/index/vol201001/_5q5.cfs (deleted)
>>> On 2010/01/26 10:09 PM, Jamie wrote:

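The `lsof` check quoted above can also be run from inside the JVM. On Unix-like systems a file deleted while a process still holds it open stays on disk until the last handle closes, and on Linux the `/proc/self/fd/N` symlink for such a descriptor ends in ` (deleted)`; one likely holder in this thread is a reader that was opened but not yet closed. The following is a minimal, Linux-only sketch under those assumptions; `DeletedHandleCounter` is a hypothetical helper, and the `/var/index/` default prefix is taken from the listing above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical, Linux-only: counts this JVM's open descriptors whose file was deleted. */
final class DeletedHandleCounter {
    private static final Path FD_DIR = Paths.get("/proc/self/fd");

    /** Counts descriptors whose target starts with pathPrefix and is marked " (deleted)". */
    static int countDeleted(String pathPrefix) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(FD_DIR)) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // descriptor closed between listing and reading; skip it
                }
                if (target.startsWith(pathPrefix) && target.endsWith(" (deleted)")) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String prefix = args.length > 0 ? args[0] : "/var/index/";
        System.out.println(countDeleted(prefix));
    }
}
```

Logging this number periodically alongside reader opens and closes can show whether it tracks reader lifetimes or keeps growing.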