lucene-java-user mailing list archives

From Mark Kristensson <mark.kristens...@smartsheet.com>
Subject Re: IndexWriter.close() performance issue
Date Wed, 17 Nov 2010 21:38:59 GMT
After a week away, I'm back and still working to get to the bottom of this issue. We run Lucene
from the binaries, so making changes to the source code is not something we are really set up
to do right now.

I have, however, created a trivial Java app that just opens an IndexReader for our problematic
index and then closes it:

	try {
		IndexReader indexReader = IndexReader.open(getIndexDirectory(indexPath));
		System.out.println("Successfully opened index at " + indexPath);
		indexReader.close();
		System.out.println("Successfully closed index at " + indexPath);
	} catch (Exception ex) {
		System.out.println("Exception while opening index: " + ex.getMessage());
	}

I've run this simple app with the hprof options suggested below, and about 80% of the CPU
samples land in java.lang.String.intern. Below is the summary from the end of
the java.hprof.txt. I'm happy to attach the whole file, but I wasn't sure whether that was
appropriate for this mailing list.
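
To check whether String.intern on its own is slow for this many distinct strings, independent of Lucene, something like the following could be timed on the same JVM. This is only a minimal sketch: the class name and the "field_" prefix are made up for illustration, and it exercises only the JVM's intern table, not Lucene's own interning code. The count 296,713 is the unique field count that CheckIndex reported for the large segment.

```java
import java.util.ArrayList;
import java.util.List;

class InternTimingSketch {

    /** Builds n distinct strings; "field_" is a made-up stand-in for real field names. */
    static List<String> uniqueNames(int n) {
        List<String> names = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            names.add("field_" + i); // runtime concatenation, so not interned yet
        }
        return names;
    }

    /** Interns every name once and returns the elapsed time in nanoseconds. */
    static long timeIntern(List<String> names) {
        long checksum = 0;
        long start = System.nanoTime();
        for (String name : names) {
            checksum += name.intern().length(); // use the result so the call stays live
        }
        long elapsed = System.nanoTime() - start;
        if (checksum < 0) {
            throw new IllegalStateException("unreachable: lengths are non-negative");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 296_713; // unique field count reported by CheckIndex in this thread
        List<String> names = uniqueNames(n);
        long nanos = timeIntern(names);
        System.out.printf("interned %d distinct strings in %d ms%n", n, nanos / 1_000_000);
    }
}
```

If this runs in a few milliseconds while opening the index takes seconds, the cost is more likely in how often intern is called than in the size of the table; if it is also slow, the JVM's intern table itself would be the suspect.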

Thanks,
Mark



CPU SAMPLES BEGIN (total = 5295) Wed Nov 17 11:54:15 2010
rank   self  accum   count trace method
   1 80.40% 80.40%    4257 300165 java.lang.String.intern
   2  1.83% 82.23%      97 300189 sun.nio.ch.FileDispatcher.pread0
   3  0.83% 83.06%      44 300232 java.util.HashMap.transfer
   4  0.72% 83.78%      38 300201 sun.nio.ch.FileDispatcher.pread0
   5  0.70% 84.48%      37 300252 org.apache.lucene.util.SimpleStringInterner.intern
   6  0.60% 85.08%      32 300191 java.lang.StringCoding$StringDecoder.decode
   7  0.59% 85.67%      31 300202 java.lang.System.arraycopy
   8  0.38% 86.04%      20 300098 java.util.zip.ZipFile.read
   9  0.36% 86.40%      19 300203 java.util.Arrays.copyOfRange
  10  0.36% 86.76%      19 300224 sun.nio.ch.FileDispatcher.pread0
  11  0.32% 87.08%      17 300089 java.lang.Class.forName0
  12  0.32% 87.40%      17 300237 java.lang.Thread.currentThread
  13  0.28% 87.69%      15 300049 java.lang.ClassLoader.findBootstrapClass
  14  0.28% 87.97%      15 300102 java.util.zip.ZipFile.read
  15  0.26% 88.23%      14 300180 java.util.zip.ZipFile.read
  16  0.26% 88.50%      14 300255 java.lang.Thread.currentThread
  17  0.26% 88.76%      14 300335 sun.nio.ch.FileDispatcher.pread0
  18  0.25% 89.01%      13 300164 java.lang.System.arraycopy
  19  0.25% 89.25%      13 300286 sun.nio.ch.NativeThread.current
  20  0.23% 89.48%      12 300240 sun.nio.ch.FileDispatcher.pread0
  21  0.23% 89.71%      12 300242 java.lang.System.arraycopy
  22  0.21% 89.92%      11 300207 java.lang.Thread.currentThread
  23  0.21% 90.12%      11 300231 java.lang.System.getSecurityManager
  24  0.19% 90.31%      10 300155 java.util.zip.ZipFile.read
  25  0.19% 90.50%      10 300216 java.lang.ClassLoader.findBootstrapClass
  26  0.19% 90.69%      10 300239 java.nio.Bits.copyToByteArray
  27  0.19% 90.88%      10 300350 java.util.HashMap.values
  28  0.17% 91.05%       9 300034 sun.net.www.protocol.file.Handler.createFileURLConnection
  29  0.17% 91.22%       9 300283 sun.nio.ch.FileDispatcher.pread0
  30  0.15% 91.37%       8 300006 java.util.jar.JarFile.getBytes
  31  0.15% 91.52%       8 300008 java.util.zip.ZipFile.getInputStream
  32  0.15% 91.67%       8 300166 java.util.zip.ZipFile.read
  33  0.15% 91.82%       8 300179 java.lang.ClassLoader.findBootstrapClass
  34  0.15% 91.97%       8 300209 sun.nio.ch.FileDispatcher.pread0
  35  0.13% 92.11%       7 300123 java.lang.ClassLoader$NativeLibrary.load
  36  0.13% 92.24%       7 300140 sun.nio.ch.FileDispatcher.pread0
  37  0.13% 92.37%       7 300225 sun.nio.ch.FileDispatcher.pread0
  38  0.13% 92.50%       7 300246 java.nio.Bits.copyToByteArray
  39  0.11% 92.62%       6 300031 java.util.zip.ZipFile.read
  40  0.11% 92.73%       6 300059 java.io.FileInputStream.readBytes
  41  0.11% 92.84%       6 300101 java.lang.ClassLoader.findBootstrapClass
  42  0.11% 92.96%       6 300138 java.lang.ClassLoader.findBootstrapClass
  43  0.11% 93.07%       6 300241 sun.nio.ch.FileDispatcher.pread0
  44  0.11% 93.18%       6 300282 java.lang.Thread.currentThread
  45  0.11% 93.30%       6 300290 org.apache.lucene.index.TermInfosReader.<init>
  46  0.11% 93.41%       6 300311 org.apache.lucene.util.UnicodeUtil.UTF8toUTF16
  47  0.09% 93.50%       5 300047 java.util.zip.ZipFile.read
  48  0.09% 93.60%       5 300057 java.io.UnixFileSystem.getBooleanAttributes0
  49  0.09% 93.69%       5 300064 sun.security.jca.Providers.<clinit>
  50  0.09% 93.79%       5 300254 sun.nio.ch.NativeThread.current
  51  0.09% 93.88%       5 300324 org.apache.lucene.index.SegmentTermEnum.next
  52  0.09% 93.98%       5 300340 java.util.HashMap.put
  53  0.08% 94.05%       4 300007 java.util.zip.ZipFile.getInputStream
  54  0.08% 94.13%       4 300009 java.util.zip.ZipFile.getInflater
  55  0.08% 94.20%       4 300010 java.util.jar.JarFile.getManifestFromReference
  56  0.08% 94.28%       4 300051 java.lang.ClassLoader.findBootstrapClass
  57  0.08% 94.35%       4 300054 java.lang.ClassLoader.findBootstrapClass
  58  0.08% 94.43%       4 300083 java.util.HashMap.entrySet0
  59  0.08% 94.50%       4 300108 java.util.zip.ZipFile.read
  60  0.08% 94.58%       4 300135 java.util.zip.ZipFile.read
  61  0.08% 94.66%       4 300142 java.util.zip.ZipFile.read
  62  0.08% 94.73%       4 300238 java.lang.Thread.currentThread
  63  0.08% 94.81%       4 300247 sun.nio.ch.FileDispatcher.pread0
  64  0.08% 94.88%       4 300253 java.lang.Thread.currentThread
  65  0.08% 94.96%       4 300257 java.util.HashMap.resize
  66  0.08% 95.03%       4 300275 sun.nio.ch.FileDispatcher.pread0
  67  0.08% 95.11%       4 300295 org.apache.lucene.index.TermBuffer.read
  68  0.08% 95.18%       4 300299 org.apache.lucene.index.SegmentTermEnum.next
  69  0.06% 95.24%       3 300004 java.util.zip.ZipFile.getEntry
  70  0.06% 95.30%       3 300021 sun.misc.URLClassPath$3.run
  71  0.06% 95.35%       3 300050 java.util.zip.ZipFile.read
  72  0.06% 95.41%       3 300055 java.security.MessageDigest.getInstance
  73  0.06% 95.47%       3 300124 java.lang.ClassLoader$NativeLibrary.load
  74  0.06% 95.52%       3 300249 java.util.HashMap.getEntry
  75  0.06% 95.58%       3 300250 java.util.HashMap.getEntry
  76  0.06% 95.64%       3 300261 java.lang.System.arraycopy
  77  0.06% 95.69%       3 300267 java.util.Arrays.copyOf
  78  0.06% 95.75%       3 300276 org.apache.lucene.index.TermInfosReader.<init>
  79  0.06% 95.81%       3 300277 org.apache.lucene.index.SegmentTermEnum.next
  80  0.06% 95.86%       3 300300 org.apache.lucene.index.TermInfosReader.<init>
  81  0.06% 95.92%       3 300304 org.apache.lucene.store.IndexInput.readVLong
  82  0.06% 95.98%       3 300318 sun.nio.ch.NativeThread.current
  83  0.06% 96.03%       3 300338 sun.nio.cs.UTF_8.updatePositions
  84  0.06% 96.09%       3 300339 org.apache.lucene.util.SimpleStringInterner.intern
  85  0.04% 96.13%       2 300001 java.lang.ClassLoader.findBootstrapClass
  86  0.04% 96.17%       2 300079 java.lang.Math.floor
  87  0.04% 96.20%       2 300085 java.security.Provider.parseLegacyPut
  88  0.04% 96.24%       2 300107 org.apache.lucene.index.IndexReader.open
  89  0.04% 96.28%       2 300119 java.io.RandomAccessFile.getChannel
  90  0.04% 96.32%       2 300190 java.lang.System.arraycopy
  91  0.04% 96.36%       2 300197 java.nio.ByteBuffer.hasArray
  92  0.04% 96.39%       2 300198 java.util.HashMap.put
  93  0.04% 96.43%       2 300199 java.util.HashMap.hash
  94  0.04% 96.47%       2 300210 java.util.HashMap.addEntry
  95  0.04% 96.51%       2 300217 java.util.HashMap.hash
  96  0.04% 96.54%       2 300220 java.nio.Buffer.position
  97  0.04% 96.58%       2 300222 org.apache.lucene.index.FieldInfos.read
  98  0.04% 96.62%       2 300235 java.util.Arrays.copyOf
  99  0.04% 96.66%       2 300243 java.lang.System.arraycopy
 100  0.04% 96.69%       2 300248 org.apache.lucene.index.FieldInfos.hasVectors
 101  0.04% 96.73%       2 300258 java.lang.Thread.currentThread
 102  0.04% 96.77%       2 300262 org.apache.lucene.util.StringHelper.intern
 103  0.04% 96.81%       2 300264 java.lang.Thread.isInterrupted
 104  0.04% 96.85%       2 300292 org.apache.lucene.index.TermInfosReader.<init>
 105  0.04% 96.88%       2 300297 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
 106  0.04% 96.92%       2 300309 sun.nio.ch.FileDispatcher.pread0
 107  0.04% 96.96%       2 300319 org.apache.lucene.store.IndexInput.readVLong
 108  0.04% 97.00%       2 300321 org.apache.lucene.store.IndexInput.readVLong
 109  0.04% 97.03%       2 300323 org.apache.lucene.store.BufferedIndexInput.refill
 110  0.04% 97.07%       2 300326 sun.nio.ch.NativeThread.current
 111  0.04% 97.11%       2 300332 org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores
 112  0.04% 97.15%       2 300334 org.apache.lucene.index.SegmentReader.get
 113  0.04% 97.19%       2 300341 java.util.HashMap.put
 114  0.04% 97.22%       2 300344 org.apache.lucene.util.SimpleStringInterner.intern
 115  0.04% 97.26%       2 300346 org.apache.lucene.store.IndexInput.readString
 116  0.02% 97.28%       1 300003 java.util.zip.ZipFile.open
 117  0.02% 97.30%       1 300014 java.util.jar.Attributes.putValue
 118  0.02% 97.32%       1 300019 sun.misc.URLClassPath.getLoader
 119  0.02% 97.34%       1 300023 sun.misc.URLClassPath$JarLoader.ensureOpen
 120  0.02% 97.36%       1 300027 sun.misc.URLClassPath$JarLoader.checkResource
 121  0.02% 97.37%       1 300029 sun.security.util.ManifestEntryVerifier.<init>
 122  0.02% 97.39%       1 300036 sun.net.www.URLConnection.<init>
 123  0.02% 97.41%       1 300039 java.io.FilePermission$1.run
 124  0.02% 97.43%       1 300060 java.util.Properties$LineReader.readLine
 125  0.02% 97.45%       1 300066 sun.security.jca.ProviderList.<clinit>
 126  0.02% 97.47%       1 300071 java.security.Provider.<init>
 127  0.02% 97.49%       1 300075 sun.security.jca.ProviderConfig.getLock
 128  0.02% 97.51%       1 300081 sun.security.provider.NativePRNG.initIO
 129  0.02% 97.53%       1 300086 java.lang.Character.toUpperCaseEx
 130  0.02% 97.54%       1 300088 java.util.HashMap.put
 131  0.02% 97.56%       1 300093 org.apache.lucene.store.FSDirectory.<clinit>
 132  0.02% 97.58%       1 300095 java.lang.ClassLoader.defineClass1
 133  0.02% 97.60%       1 300096 java.util.zip.Inflater.inflateBytes
 134  0.02% 97.62%       1 300097 sun.security.provider.MD5.implDigest
 135  0.02% 97.64%       1 300099 java.util.zip.Inflater.inflateBytes
 136  0.02% 97.66%       1 300103 java.lang.ClassLoader.defineClass1
 137  0.02% 97.68%       1 300104 java.util.Arrays.copyOf
 138  0.02% 97.70%       1 300105 java.lang.String.indexOf
 139  0.02% 97.71%       1 300106 java.util.zip.InflaterInputStream.<init>
 140  0.02% 97.73%       1 300109 java.util.zip.ZipFile.read
 141  0.02% 97.75%       1 300111 java.util.Arrays.copyOfRange
 142  0.02% 97.77%       1 300113 java.util.zip.Inflater.inflateBytes
 143  0.02% 97.79%       1 300115 java.util.Arrays.copyOf
 144  0.02% 97.81%       1 300116 java.lang.ClassLoader.findBootstrapClass
 145  0.02% 97.83%       1 300117 java.lang.String.lastIndexOf
 146  0.02% 97.85%       1 300122 sun.security.action.LoadLibraryAction.<init>
 147  0.02% 97.87%       1 300127 sun.nio.ch.FileChannelImpl.<init>
 148  0.02% 97.88%       1 300132 java.nio.DirectByteBuffer.<init>
 149  0.02% 97.90%       1 300136 java.util.Arrays.copyOfRange
 150  0.02% 97.92%       1 300141 java.lang.ref.SoftReference.get
 151  0.02% 97.94%       1 300143 java.util.zip.Inflater.inflateBytes
 152  0.02% 97.96%       1 300145 org.apache.lucene.index.SegmentInfos.read
 153  0.02% 97.98%       1 300146 java.util.zip.CRC32.update
 154  0.02% 98.00%       1 300147 java.nio.CharBuffer.hasArray
 155  0.02% 98.02%       1 300148 java.lang.ClassLoader.defineClass1
 156  0.02% 98.04%       1 300149 org.apache.lucene.index.DirectoryReader.<init>
 157  0.02% 98.05%       1 300150 java.lang.ClassLoader.defineClass1
 158  0.02% 98.07%       1 300151 java.lang.AbstractStringBuilder.<init>
 159  0.02% 98.09%       1 300154 java.lang.ClassLoader.defineClass1
 160  0.02% 98.11%       1 300157 org.apache.lucene.index.SegmentReader$CoreReaders.<init>
 161  0.02% 98.13%       1 300159 org.apache.lucene.util.StringHelper.<clinit>
 162  0.02% 98.15%       1 300161 org.apache.lucene.util.SimpleStringInterner.<init>
 163  0.02% 98.17%       1 300167 java.lang.ClassLoader.defineClass1
 164  0.02% 98.19%       1 300168 org.apache.lucene.index.SegmentReader$CoreReaders.<init>
 165  0.02% 98.21%       1 300170 org.apache.lucene.index.SegmentTermEnum.<init>
 166  0.02% 98.22%       1 300174 java.util.zip.Inflater.inflateBytes
 167  0.02% 98.24%       1 300177 java.security.AccessController.doPrivileged
 168  0.02% 98.26%       1 300181 java.util.zip.Inflater.inflateBytes
 169  0.02% 98.28%       1 300182 java.util.Arrays.copyOf
 170  0.02% 98.30%       1 300183 java.util.Arrays.copyOf
 171  0.02% 98.32%       1 300184 java.lang.ClassLoader.defineClass1
 172  0.02% 98.34%       1 300185 java.lang.ref.SoftReference.get
 173  0.02% 98.36%       1 300186 java.lang.String.replace
 174  0.02% 98.38%       1 300187 org.apache.lucene.index.SegmentReader.openNorms
 175  0.02% 98.39%       1 300188 java.nio.charset.CharsetDecoder.flush
 176  0.02% 98.41%       1 300192 sun.nio.cs.UTF_8$Decoder.decodeLoop
 177  0.02% 98.43%       1 300193 java.lang.StringCoding.decode
 178  0.02% 98.45%       1 300194 java.lang.System.arraycopy
 179  0.02% 98.47%       1 300195 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
 180  0.02% 98.49%       1 300196 java.util.HashMap.hash
 181  0.02% 98.51%       1 300200 sun.nio.cs.UTF_8$Decoder.isMalformed2
 182  0.02% 98.53%       1 300204 java.lang.System.arraycopy
 183  0.02% 98.55%       1 300205 java.io.RandomAccessFile.open
 184  0.02% 98.56%       1 300206 org.apache.lucene.util.SimpleStringInterner.intern
 185  0.02% 98.58%       1 300208 java.nio.charset.CoderResult.isUnderflow
 186  0.02% 98.60%       1 300211 org.apache.lucene.util.SimpleStringInterner.intern
 187  0.02% 98.62%       1 300212 java.util.HashMap.addEntry
 188  0.02% 98.64%       1 300213 org.apache.lucene.store.IndexInput.readVInt
 189  0.02% 98.66%       1 300214 java.util.ArrayList.RangeCheck
 190  0.02% 98.68%       1 300218 org.apache.lucene.index.FieldInfos.read
 191  0.02% 98.70%       1 300219 java.lang.Thread.currentThread
 192  0.02% 98.72%       1 300221 java.util.HashMap.transfer
 193  0.02% 98.73%       1 300223 java.lang.System.arraycopy
 194  0.02% 98.75%       1 300226 org.apache.lucene.store.IndexInput.readString
 195  0.02% 98.77%       1 300227 org.apache.lucene.store.BufferedIndexInput.readBytes
 196  0.02% 98.79%       1 300228 java.util.ArrayList.size
 197  0.02% 98.81%       1 300229 java.lang.StringCoding.access$100
 198  0.02% 98.83%       1 300230 java.lang.Thread.currentThread
 199  0.02% 98.85%       1 300233 java.lang.Throwable.fillInStackTrace
 200  0.02% 98.87%       1 300236 java.lang.StringCoding.decode
 201  0.02% 98.89%       1 300244 sun.nio.ch.FileDispatcher.pread0
 202  0.02% 98.90%       1 300245 sun.nio.ch.NativeThread.current
 203  0.02% 98.92%       1 300251 java.util.HashMap.getEntry
 204  0.02% 98.94%       1 300259 java.nio.Bits.copyToByteArray
 205  0.02% 98.96%       1 300260 java.nio.DirectByteBuffer.get
 206  0.02% 98.98%       1 300263 java.lang.Thread.currentThread
 207  0.02% 99.00%       1 300265 sun.nio.ch.FileChannelImpl.read
 208  0.02% 99.02%       1 300266 java.nio.channels.spi.AbstractInterruptibleChannel.begin
 209  0.02% 99.04%       1 300268 sun.nio.ch.NativeThread.current
 210  0.02% 99.06%       1 300269 sun.nio.ch.FileChannelImpl.read
 211  0.02% 99.07%       1 300270 java.nio.Bits.copyToByteArray
 212  0.02% 99.09%       1 300271 java.lang.Thread.currentThread
 213  0.02% 99.11%       1 300272 java.lang.Object.clone
 214  0.02% 99.13%       1 300273 org.apache.lucene.index.TermInfosReader.<init>
 215  0.02% 99.15%       1 300274 org.apache.lucene.index.TermInfosReader.<init>
 216  0.02% 99.17%       1 300278 org.apache.lucene.index.SegmentTermEnum.next
 217  0.02% 99.19%       1 300279 java.nio.channels.spi.AbstractInterruptibleChannel.isOpen
 218  0.02% 99.21%       1 300280 java.lang.Thread.isInterrupted
 219  0.02% 99.23%       1 300281 java.lang.Thread.currentThread
 220  0.02% 99.24%       1 300284 java.lang.System.arraycopy
 221  0.02% 99.26%       1 300285 java.lang.System.arraycopy
 222  0.02% 99.28%       1 300287 java.lang.Thread.currentThread
 223  0.02% 99.30%       1 300288 java.nio.Bits.copyToByteArray
 224  0.02% 99.32%       1 300289 org.apache.lucene.store.BufferedIndexInput.readBytes
 225  0.02% 99.34%       1 300291 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
 226  0.02% 99.36%       1 300293 java.lang.Thread.isInterrupted
 227  0.02% 99.38%       1 300294 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
 228  0.02% 99.40%       1 300296 org.apache.lucene.store.BufferedIndexInput.readByte
 229  0.02% 99.41%       1 300298 sun.nio.ch.FileChannelImpl.read
 230  0.02% 99.43%       1 300301 java.lang.Thread.currentThread
 231  0.02% 99.45%       1 300302 sun.nio.ch.FileChannelImpl.ensureOpen
 232  0.02% 99.47%       1 300303 sun.nio.ch.FileDispatcher.pread0
 233  0.02% 99.49%       1 300305 sun.nio.ch.FileDispatcher.pread0
 234  0.02% 99.51%       1 300306 sun.nio.ch.NativeThread.current
 235  0.02% 99.53%       1 300307 org.apache.lucene.index.TermBuffer.toTerm
 236  0.02% 99.55%       1 300308 org.apache.lucene.store.BufferedIndexInput.refill
 237  0.02% 99.57%       1 300310 sun.nio.ch.FileChannelImpl.read
 238  0.02% 99.58%       1 300312 sun.nio.ch.NativeThread.current
 239  0.02% 99.60%       1 300313 sun.nio.ch.FileChannelImpl.read
 240  0.02% 99.62%       1 300314 sun.nio.ch.NativeThread.current
 241  0.02% 99.64%       1 300315 org.apache.lucene.store.BufferedIndexInput.refill
 242  0.02% 99.66%       1 300316 org.apache.lucene.store.BufferedIndexInput.refill
 243  0.02% 99.68%       1 300317 sun.nio.ch.FileChannelImpl.read
 244  0.02% 99.70%       1 300320 org.apache.lucene.index.TermBuffer.read
 245  0.02% 99.72%       1 300322 org.apache.lucene.store.BufferedIndexInput.refill
 246  0.02% 99.74%       1 300325 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
 247  0.02% 99.75%       1 300327 org.apache.lucene.store.IndexInput.readVInt
 248  0.02% 99.77%       1 300328 org.apache.lucene.store.BufferedIndexInput.refill
 249  0.02% 99.79%       1 300329 java.nio.Bits.copyToByteArray
 250  0.02% 99.81%       1 300330 org.apache.lucene.store.BufferedIndexInput.refill
 251  0.02% 99.83%       1 300331 sun.nio.ch.NativeThread.current
 252  0.02% 99.85%       1 300333 sun.misc.Unsafe.setMemory
 253  0.02% 99.87%       1 300336 org.apache.lucene.index.TermInfosReader.<init>
 254  0.02% 99.89%       1 300337 java.lang.Thread.currentThread
 255  0.02% 99.91%       1 300342 org.apache.lucene.index.FieldInfos.read
 256  0.02% 99.92%       1 300343 org.apache.lucene.store.IndexInput.readString
 257  0.02% 99.94%       1 300345 sun.nio.ch.IOUtil.read
 258  0.02% 99.96%       1 300347 java.nio.channels.spi.AbstractInterruptibleChannel.begin
 259  0.02% 99.98%       1 300348 org.apache.lucene.index.FieldInfos.addInternal
 260  0.02% 100.00%       1 300349 java.nio.channels.spi.AbstractInterruptibleChannel.begin
CPU SAMPLES END
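
Because hprof reports one row per stack trace, the same method (for example sun.nio.ch.FileDispatcher.pread0) shows up on many rows above. A small helper can total the "count" column per method to make the picture easier to read. This is a rough sketch, not part of hprof or Lucene: the class and method names are invented here, and it assumes the six-column layout shown above (rank, self, accum, count, trace, method) between the CPU SAMPLES BEGIN and END markers.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class HprofSamplesByMethod {

    /** Sums the "count" column per method over the CPU SAMPLES section. */
    static Map<String, Integer> countsByMethod(Iterable<String> lines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        boolean inSamples = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("CPU SAMPLES BEGIN")) {
                inSamples = true;
                continue;
            }
            if (trimmed.startsWith("CPU SAMPLES END")) {
                inSamples = false;
                continue;
            }
            if (!inSamples) {
                continue; // ignore the trace and thread sections of the file
            }
            // Assumed columns: rank, self%, accum%, count, trace, method
            String[] cols = trimmed.split("\\s+");
            if (cols.length != 6 || !cols[0].matches("\\d+")) {
                continue; // skips the column header row
            }
            totals.merge(cols[5], Integer.parseInt(cols[3]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) throws Exception {
        // java.hprof.txt is the file name mentioned in this message
        String file = args.length > 0 ? args[0] : "java.hprof.txt";
        Map<String, Integer> totals = countsByMethod(Files.readAllLines(Paths.get(file)));
        totals.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .limit(10)
            .forEach(e -> System.out.printf("%6d  %s%n", e.getValue(), e.getKey()));
    }
}
```

Run against the full java.hprof.txt, this prints the ten methods with the most samples after merging rows that differ only by trace id.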



On Nov 5, 2010, at 10:53 AM, Michael McCandless wrote:

> Hmm...
> 
> So, I was going on this output from your CheckIndex:
> 
>   test: field norms.........OK [296713 fields]
> 
> But in fact I just looked and that number is bogus -- it's always
> equal to total number of fields, not number of fields with norms
> enabled.  I'll open an issue to fix this, but in the meantime can you
> apply this patch to your CheckIndex and run it again?
> 
> Index: src/java/org/apache/lucene/index/CheckIndex.java
> ===================================================================
> --- src/java/org/apache/lucene/index/CheckIndex.java	(revision 1031678)
> +++ src/java/org/apache/lucene/index/CheckIndex.java	(working copy)
> @@ -570,8 +570,10 @@
>       }
>       final byte[] b = new byte[reader.maxDoc()];
>       for (final String fieldName : fieldNames) {
> -        reader.norms(fieldName, b, 0);
> -        ++status.totFields;
> +        if (reader.hasNorms(fieldName)) {
> +          reader.norms(fieldName, b, 0);
> +          ++status.totFields;
> +        }
>       }
> 
>       msg("OK [" + status.totFields + " fields]");
> 
> So if in fact you have already disabled norms then something else is
> the source of the sudden slowness.  Though, such a huge number of
> unique field names is not an area of Lucene that's very well tested...
> perhaps there's something silly somewhere.  Maybe you can try
> profiling just the init of your IndexReader?  (Eg, run java with
> -agentlib:hprof=cpu=samples,depth=16,interval=1).
> 
> Yes, both Index.NOT_ANALYZED_NO_NORMS and Index.NO will disable norms
> as long as no document in the index ever had norms on (yes it does
> "infect" heh).
> 
> Mike
> 
> On Fri, Nov 5, 2010 at 1:37 PM, Mark Kristensson
> <mark.kristensson@smartsheet.com> wrote:
>> While most of our Lucene indexes are used for more traditional searching, this index
>> in particular is used more like a reporting repository. Thus, we really do need to have that
>> many fields indexed and they do need to be broken out into separate fields. There may be another
>> way to structure the index to reduce the number of fields, but I'm hoping we can optimize
>> the current design and avoid (yet another) index redesign.
>> 
>> I'll look into tweaking the merge policy, but I'm more interested in disabling
>> norms because scoring really doesn't matter for us. Basically, we need nothing more than a
>> binary answer from Lucene: either a record meets the provided criteria (which can be a rather
>> complex boolean query with many subqueries) or it doesn't. If the record does match, then
>> we get the IDs from Lucene and run off to get the live data from our primary data store and
>> sort it (in Java) based upon criteria provided by the user, not by score.
>> 
>> After our initial design mushroomed in size, we redesigned and now (I thought) do
>> not have norms on any of the fields in this index. So, I'm wondering if there was something
>> in the results from the CheckIndex that I provided which indicates to you that we may have
>> norms still enabled? I know that if you have norms on any one document's field, then any other
>> document with that same field will get "infected" with norms as well.
>> 
>> My understanding is that any field that uses the constants Index.NOT_ANALYZED_NO_NORMS
>> or Index.NO will not have norms on it, regardless of whether or not the field is stored.
>> Is that not correct?
>> 
>> Thanks,
>> Mark
>> 
>> 
>> 
>> On Nov 4, 2010, at 2:56 AM, Michael McCandless wrote:
>> 
>>> Likely what happened is you had a bunch of smaller segments, and then
>>> suddenly they got merged into that one big segment (_aiaz) in your
>>> index.
>>> 
>>> The representation for norms in particular is not sparse, so this
>>> means the size of the norms file for a given segment will be
>>> number-of-unique-indexed-fields X number-of-documents.
>>> 
>>> So this count grows quadratically on merge.
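
To put rough numbers on that formula: with one norm byte per document per field (which matches the `new byte[reader.maxDoc()]` buffer in the CheckIndex patch above), the arithmetic looks like the sketch below. It assumes norms are enabled on every field. The document count of 10,000 and the per-segment sizes are hypothetical, since the thread does not give them; only the 296,713 field count comes from the CheckIndex output.

```java
class NormsSizeSketch {

    /** Norms size for one segment: one byte per document for every field with norms. */
    static long normsBytes(long fieldsWithNorms, long maxDoc) {
        return fieldsWithNorms * maxDoc;
    }

    /**
     * Ratio of merged norms size to the combined size of k separate segments,
     * assuming each segment has its own disjoint set of fields (hypothetical case).
     */
    static long mergeGrowthFactor(long k, long fieldsPerSegment, long docsPerSegment) {
        long separate = k * normsBytes(fieldsPerSegment, docsPerSegment);
        long merged = normsBytes(k * fieldsPerSegment, k * docsPerSegment);
        return merged / separate;
    }

    public static void main(String[] args) {
        long fields = 296_713L; // unique field count from the CheckIndex output
        long docs = 10_000L;    // hypothetical document count, not from this thread
        System.out.println(normsBytes(fields, docs) + " bytes of norms");

        // Ten small segments with disjoint fields, merged into one (made-up sizes)
        System.out.println("growth factor on merge: " + mergeGrowthFactor(10, 1_000, 5_000));
    }
}
```

With those assumed inputs the single segment would need 2,967,130,000 bytes of norms, and merging ten disjoint-field segments multiplies the total norms size by ten, which is the quadratic effect described above.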
>>> 
>>> Do these fields really need to be indexed?   If so, it'd be better to
>>> use a single field for all users for the indexable text if you can.
>>> 
>>> Failing that, a simple workaround is to set the maxMergeMB/Docs on the
>>> merge policy; this'd prevent big segments from being produced.
>>> Disabling norms should also workaround this, though that will affect
>>> hit scores...
>>> 
>>> Mike
>>> 
>>> On Wed, Nov 3, 2010 at 7:37 PM, Mark Kristensson
>>> <mark.kristensson@smartsheet.com> wrote:
>>>> Yes, we do have a large number of unique field names in that index, because
>>>> they are driven by user named fields in our application (with some cleaning to remove illegal
>>>> chars).
>>>> 
>>>> This slowness problem has appeared very suddenly in the last couple of weeks
>>>> and the number of unique field names has not spiked in the last few weeks. Have we crept over
>>>> some threshold with our linear growth in the number of unique field names? Perhaps there is
>>>> a limit driven by the amount of RAM in the machine that we are violating? Are there any guidelines
>>>> for the maximum number, or suggested number, of unique field names in an index or segment?
>>>> Any suggestions for potentially mitigating the problem?
>>>> 
>>>> Thanks,
>>>> Mark
>>>> 
>>>> 
>>>> On Nov 3, 2010, at 2:02 PM, Michael McCandless wrote:
>>>> 
>>>>> On Wed, Nov 3, 2010 at 4:27 PM, Mark Kristensson
>>>>> <mark.kristensson@smartsheet.com> wrote:
>>>>>> 
>>>>>> I've run checkIndex against the index and the results are below.
>>>>>> The net is that it's telling me nothing is wrong with the index.
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>>> I did not have any instrumentation around the opening of the IndexSearcher
>>>>>> (we don't use an IndexReader), just around the actual query execution, so I had to add some
>>>>>> additional logging. What I found surprised me: opening a searcher against this index takes the
>>>>>> same 6 to 8 seconds that closing the IndexWriter takes.
>>>>> 
>>>>> IndexWriter opens a SegmentReader for each segment in the index, to
>>>>> apply deletions, so I think this is the source of the slowness.
>>>>> 
>>>>> From the CheckIndex output, it looks like you have many (296,713)
>>>>> unique fields names on that one large segment -- does that sound
>>>>> right?  I suspect such a very high field count is the source of the
>>>>> slowness...
>>>>> 
>>>>> Mike
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org

