lucene-dev mailing list archives

From Wolfgang Hoschek <whosc...@lbl.gov>
Subject [Performance] Streaming main memory indexing of single strings
Date Wed, 13 Apr 2005 20:11:12 GMT
Hi,

I'm wondering if anyone could let me know how to improve Lucene 
performance for "streaming main memory indexing of single strings". 
This would help to effectively integrate Lucene with the Nux XQuery 
engine.

Below is a small microbenchmark simulating STREAMING XQuery fulltext 
search as typical for XML network routers, message queuing systems, P2P 
networks, etc. In this on-the-fly main memory indexing scenario, each 
individual string is immediately matched as soon as it becomes 
available without any persistence involved. This usage scenario and 
its performance profile are quite different from those of 
fulltext search over persistent (read-mostly) indexes.

The benchmark runs at some 3000 Lucene queries/sec (lucene-1.4.3), which 
is unfortunate news considering the XQuery engine can easily walk 
hundreds of thousands of XML nodes per second. Ideally I'd like to run 
at some 100000 queries/sec. Running this through the JDK 1.5 profiler, 
it seems that most time is spent in and below the following calls:

writer = new IndexWriter(dir, analyzer, true);
writer.addDocument(...);
writer.close();

I tried quite a few variants of the benchmark with various options, 
unfortunately with little or no effect.
Lucene just does not seem to be designed to do this sort of "transient 
single string index" thing. All code paths related to opening, closing, 
reading, writing, querying and object creation seem to be designed for 
large persistent indexes.
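
To make the "transient single string index" idea concrete, here is a 
minimal sketch that uses only the JDK, not Lucene. It is an illustration 
under stated assumptions, not the attached benchmark and not a Lucene 
API: the class name TransientStringIndex and its methods are made up, 
the tokenization (lowercase, split on non-alphanumeric characters) is an 
assumption rather than the behaviour of any Lucene analyzer, and it is 
written in modern Java (8+) rather than JDK 1.5. It supports only 
single-term matching.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical throwaway index over ONE string; not part of Lucene. */
public class TransientStringIndex {
    // term -> number of occurrences in the single indexed string
    private final Map<String, Integer> termFreqs = new HashMap<>();

    public TransientStringIndex(String text) {
        // Assumed tokenization: lowercase, split on runs of non-alphanumerics.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) {
                termFreqs.merge(token, 1, Integer::sum);
            }
        }
    }

    /** True if the indexed string contains the given term (case-insensitive). */
    public boolean containsTerm(String term) {
        return termFreqs.containsKey(term.toLowerCase(Locale.ROOT));
    }

    /** Number of times the term occurs in the indexed string. */
    public int frequency(String term) {
        return termFreqs.getOrDefault(term.toLowerCase(Locale.ROOT), 0);
    }

    public static void main(String[] args) {
        // Each incoming string gets its own short-lived index, is matched
        // once, and is then discarded; nothing is persisted.
        String[] stream = { "James is out in the woods", "Nothing relevant here" };
        for (String s : stream) {
            TransientStringIndex index = new TransientStringIndex(s);
            System.out.println(index.containsTerm("james") + " <- " + s);
        }
    }
}
```

The point of the sketch is the lifecycle: build, match once, discard, 
with no directory, writer, or reader to open and close for each string.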

Any advice on what I'm missing or what could be done about it would be 
greatly appreciated.

Wolfgang.

P.S. the benchmark code is attached as a file below:

