lucene-dev mailing list archives

From Bernhard Messer <Bernhard.Mes...@intrafind.de>
Subject Re: TermVectorsReader performance
Date Thu, 12 Aug 2004 07:14:55 GMT
Grant,

My idea was to put the TermVectorsReader member of SegmentReader in a
ThreadLocal object and remove the synchronization from the
TermVectorsReader get methods. But for some reason, storing a
TermVectorsReader object in a ThreadLocal doesn't work: the ThreadLocal
get method always returns null, which I do not understand at the moment :-(
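
One possible explanation, which is only a guess since the message does not show the code: ThreadLocal.get() returns null in any thread that has not called set() itself, unless an initial value is supplied. If the reader is stored once (say, by the thread that opens the SegmentReader) and then read from worker threads, every worker would see null. The sketch below illustrates this with a hypothetical stand-in class, VectorReader, rather than the real Lucene TermVectorsReader; all class and method names in it are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalReaderSketch {

    /** Hypothetical stand-in for a non-thread-safe reader; not the Lucene class. */
    static final class VectorReader {
        private static final AtomicInteger CREATED = new AtomicInteger();
        final int id = CREATED.incrementAndGet();
    }

    // No initial value: get() returns null in every thread that never called set().
    static final ThreadLocal<VectorReader> PLAIN = new ThreadLocal<>();

    // With a supplier: each thread lazily creates its own instance on first get().
    static final ThreadLocal<VectorReader> PER_THREAD =
            ThreadLocal.withInitial(VectorReader::new);

    /** Calls local.get() in a fresh thread and returns what that thread saw. */
    static VectorReader readInNewThread(ThreadLocal<VectorReader> local)
            throws InterruptedException {
        VectorReader[] seen = new VectorReader[1];
        Thread t = new Thread(() -> seen[0] = local.get());
        t.start();
        t.join(); // join() makes the worker's write to seen[0] visible here
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        PLAIN.set(new VectorReader()); // value is bound to the main thread only

        // The main thread sees the value it set.
        System.out.println(PLAIN.get() != null);

        // A worker thread sees null, because it never called set() itself.
        System.out.println(readInNewThread(PLAIN) == null);

        // With an initial value, each thread gets its own distinct instance.
        VectorReader mine = PER_THREAD.get();
        VectorReader theirs = readInNewThread(PER_THREAD);
        System.out.println(theirs != null && theirs != mine);
    }
}
```

If this is what is happening, each thread would need to create (or clone) its own reader on first access instead of sharing one stored at construction time. On older JDKs the same effect can be had by subclassing ThreadLocal and overriding initialValue().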

thx
bernhard

Grant Ingersoll wrote:

>I am in the process (almost done, actually) of adding (optional) support
>for both Term position and offset information to Term Vector.  I haven't
>looked closely at the synch code, but am familiar w/ the code on the
>whole and would be happy to help where I can.
>  
>
What does that mean? Have you already changed or redesigned the current
term vector implementation? If so, it would be great if you could post
your code to the list so that we could have a look at it.

>>>>Bernhard.Messer@intrafind.de 8/9/2004 6:03:19 PM >>>
>Hi all,
>
>I just made a test case to measure the TermVectorsReader performance
>when running one IndexReader in several threads. To do this, I'm adding
>1000 documents, each with one field and a different term, to a
>RAMDirectory. I then start 1 to 10 threads sharing the same IndexReader
>instance and call getTermVectors(docId) 100 times within each thread.
>
>As the CPU time profile shows, most of the time (81%) is spent
>waiting for methods to finish. Looking at the TermVectorsReader
>implementation, nearly all methods are synchronized.
>
>Before I rack my brain and dig into that nightmare of
>synchronization: has anybody out there started on some cleanups of
>TermVectorsReader, or at least thought about it?
>
>My first idea was to make the termVector object ThreadLocal in
>SegmentReader, but I think this wouldn't work because IndexReaders are
>using their own thread (which is quite clever and should not be a
>candidate for a change).
>
>CPU TIME (ms) BEGIN (total = 100) Mon Aug  9 23:43:03 2004
>rank   self   accum    count trace method
>   1 32.00%  32.00%     1553    58 java.lang.Thread.sleep
>   2 25.00%  57.00%       75    60 java.lang.Object.wait
>   3 24.00%  81.00%       77    59 java.lang.Object.wait
>   4  4.00%  85.00%  1229000    56 org.apache.lucene.store.InputStream.readVInt
>   5  4.00%  89.00%   100000    51 org.apache.lucene.index.TermVectorsReader.readTermVector
>   6  3.00%  92.00%  1745300    54 org.apache.lucene.store.InputStream.readByte
>   7  2.00%  94.00%   343000    49 org.apache.lucene.store.InputStream.readChars
>   8  2.00%  96.00%  1229000    53 org.apache.lucene.store.InputStream.readByte
>   9  1.00%  97.00%   200000    55 org.apache.lucene.store.InputStream.readInt
>  10  1.00%  98.00%   800000    52 org.apache.lucene.store.InputStream.readByte
>  11  1.00%  99.00%   343000    57 java.lang.String.<init>
>  12  1.00% 100.00%   100000    50 org.apache.lucene.index.TermVectorsReader.get
>CPU TIME (ms) END
>
>thx
>Bernhard
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org 
>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org 

