lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aviran" <amo...@infosciences.com>
Subject RE: Lucene Search has poor cpu utilization on a 4-CPU machine
Date Mon, 12 Jul 2004 17:47:36 GMT
I use Lucene 1.4 final

Here is the thread dump for one blocked thread (If you want a full thread
dump for all threads I can do that too)

"Thread-32" daemon prio=1 tid=0x082334c0 nid=0xa66 waiting for monitor entry
[4f385000..4f38687c]
        at java.util.Vector.elementAt(Vector.java:430)
        - waiting to lock <0x452b93a8> (a java.util.Vector)
        at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
        at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:151)
        at
org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:149)
        at
org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
        at
org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:143)
        at
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:137)
        at
org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
        at
org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
        at
org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:59)
        at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java
:165)
        at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java
:165)
        at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:154)
        at
gov.gsa.search.SearcherByPageAndSortedField.search(SearcherByPageAndSortedFi
eld.java:317)
        at
gov.gsa.search.SearcherByPageAndSortedField.search(SearcherByPageAndSortedFi
eld.java:203)
        at
gov.gsa.search.grants.SearchGrants.searchByPageAndSortedField(SearchGrants.j
ava:308)
        at
gov.gsa.search.grants.SearchServlet.searchByIndex(SearchServlet.java:1541)
        at
gov.gsa.search.grants.SearchServlet.getResults(SearchServlet.java:1325)
        at gov.gsa.search.grants.SearchServlet.doGet(SearchServlet.java:500)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
        at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
FilterChain.java:247)
        at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
ain.java:193)
        at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
va:256)
        at
org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invok
eNext(StandardPipeline.java:643)
        at
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
        at
org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
        at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
va:191)
        at
org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invok
eNext(StandardPipeline.java:643)
        at
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
        at
org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
        at
org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2415)
        at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180
)
        at
org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invok
eNext(StandardPipeline.java:643)
        at
org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.
java:171)
        at
org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invok
eNext(StandardPipeline.java:641)
        at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172
)
        at
org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invok
eNext(StandardPipeline.java:641)
        at
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
        at
org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
        at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
:174)
        at
org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invok
eNext(StandardPipeline.java:643)
        at
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
        at
org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
        at
org.apache.coyote.tomcat4.CoyoteAdapter.service(CoyoteAdapter.java:223)
        at
org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:261)
        at
org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:360)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:604)
        at
org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:562)
        at
org.apache.jk.common.SocketConnection.runIt(ChannelSocket.java:679)
        at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.jav
a:619)
        at java.lang.Thread.run(Thread.java:534)


And how do I submit a patch to the developer mailing list? Just attach a
changed java file to an email?

Aviran

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Monday, July 12, 2004 12:42 PM
To: Lucene Users List
Subject: Re: Lucene Search has poor cpu utilization on a 4-CPU machine


Aviran wrote:
> First let me explain what I found out. I'm running Lucene on a 4 CPU 
> server. While doing some stress tests I've noticed (by doing full 
> thread dump) that searching threads are blocked on the method: public 
> FieldInfo fieldInfo(int
> fieldNumber) This causes for a significant cpu idle time. 

What version of Lucene are you running?  Also, can you please send the 
stack traces of the blocked threads, or at least a description of them? 
  I'd be interested to see what context this happens in.  In particular, 
which IndexReader and Searcher/Scorer/Weight methods does it happen under?

> I noticed that the class org.apache.lucene.index.FieldInfos uses 
> private class members Vector byNumber and Hashtable byName, both of 
> which are synchronized objects. By changing the Vector byNumber to 
> ArrayList byNumber I was able to get 110% improvement in performance 
> (number of searches per second).

That's impressive!  Good job finding a bottleneck!

> My question is: do the fields byNumber and byName have to be 
> synchronized and what can happen if I'll change them to be ArrayList 
> and HashMap which are not synchronized ? Can this corrupt the index or 
> the integrity of the results?

I think that is a safe change.  FieldInfos is only modifed by 
DocumentWriter and SegmentMerger, and there is no possibility of other 
threads accessing those instances.  Please submit a patch to the 
developer mailing list.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message