lucene-java-user mailing list archives

From Clemens Wyss DEV <clemens...@mysign.ch>
Subject AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged
Date Fri, 13 Jun 2014 12:53:02 GMT
Thanks a lot!
>"large text fields"
What is a good limit (in characters) for switching from StringField to TextField? Do <Language>Analyzers
(e.g. GermanAnalyzer) help a lot in reducing the size of an index?

> Add XXXDocValuesField instead of e.g. StringField.
Does this apply only for StringFields? Or for TextFields too?

> Upgrade to the upcoming Lucene 4.9
We have not yet transitioned to Java 7/8 ... hopefully soon ;)

> and take a heap dump and see what's using RAM
Below is a snippet from MemoryAnalyzer:
Class Name | Shallow Heap | Retained Heap | Percentage
----------------------------------------------------------------------------------------------------
org.apache.lucene.index.StandardDirectoryReader @ 0x783932460 | 72 | 59'255'872 | 3.04%
|- org.apache.lucene.index.SegmentReader[24] @ 0x794089ee0 | 112 | 59'190'960 | 3.03%
|  |- org.apache.lucene.index.SegmentReader @ 0x788820f40 | 72 | 16'905'072 | 0.87%
|  |  |- org.apache.lucene.index.SegmentCoreReaders @ 0x7910cacc8 | 56 | 16'895'576 | 0.87%
|  |  |  |- org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader @ 0x780661c50 | 24 | 16'864'864 | 0.86%
|  |  |  |  |- org.apache.lucene.codecs.BlockTreeTermsReader @ 0x7910cae50 | 56 | 16'864'240 | 0.86%
|  |  |  |  |  |- java.util.TreeMap @ 0x783902738 | 48 | 16'858'472 | 0.86%
|  |  |  |  |  |  '- java.util.TreeMap$Entry @ 0x77ec5f9f8 | 40 | 16'858'424 | 0.86%
|  |  |  |  |  |     |- java.util.TreeMap$Entry @ 0x77ec5fa20 | 40 | 10'895'656 | 0.56%
|  |  |  |  |  |     |- java.util.TreeMap$Entry @ 0x77ec5fa48 | 40 | 5'960'072 | 0.31%
|  |  |  |  |  |     |  |- java.util.TreeMap$Entry @ 0x77ec5fa98 | 40 | 5'958'072 | 0.31%
|  |  |  |  |  |     |  |  |- java.util.TreeMap$Entry @ 0x77fc09bf0 | 40 | 5'949'864 | 0.30%
|  |  |  |  |  |     |  |  |- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x788820e20 | 72 | 8'168 | 0.00%
|  |  |  |  |  |     |  |  '- Total: 2 entries | | |
|  |  |  |  |  |     |  |- java.util.TreeMap$Entry @ 0x77ec5fa70 | 40 | 1'000 | 0.00%
|  |  |  |  |  |     |  |  '- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x78347fbc0 | 72 | 960 | 0.00%
|  |  |  |  |  |     |  |     |- org.apache.lucene.util.fst.FST @ 0x788fe34c8 | 104 | 840 | 0.00%
|  |  |  |  |  |     |  |     |  |- org.apache.lucene.util.fst.FST$Arc[128] @ 0x7870932a0 | 528 | 528 | 0.00%
|  |  |  |  |  |     |  |     |  |- org.apache.lucene.util.fst.BytesStore @ 0x77ec5fb60 | 40 | 144 | 0.00%
|  |  |  |  |  |     |  |     |  |  '- java.util.ArrayList @ 0x780663b28 | 24 | 104 | 0.00%
|  |  |  |  |  |     |  |     |  |- org.apache.lucene.util.BytesRef @ 0x780663b10 | 24 | 48 | 0.00%
|  |  |  |  |  |     |  |     |  |  '- byte[5] @ 0x780663b58  ..... | 24 | 24 | 0.00%
|  |  |  |  |  |     |  |     |  |- int[0] @ 0x780663af8 | 16 | 16 | 0.00%
|  |  |  |  |  |     |  |     |  '- Total: 4 entries | | |
|  |  |  |  |  |     |  |     |- org.apache.lucene.util.BytesRef @ 0x780663ae0 | 24 | 48 | 0.00%
|  |  |  |  |  |     |  |     '- Total: 2 entries | | |
|  |  |  |  |  |     |  |- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x788820dd8 | 72 | 960 | 0.00%
|  |  |  |  |  |     |  '- Total: 3 entries | | |
|  |  |  |  |  |     |- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x788820d90 | 72 | 2'656 | 0.00%
|  |  |  |  |  |     '- Total: 3 entries | | |
|  |  |  |  |  |- org.apache.lucene.codecs.lucene41.Lucene41PostingsReader @ 0x78274ab88 | 32 | 4'032 | 0.00%
|  |  |  |  |  |- org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput @ 0x788820d48 | 72 | 1'680 | 0.00%
|  |  |  |  |  '- Total: 3 entries | | |
|  |  |  |  |- java.util.TreeMap @ 0x783902798 | 48 | 368 | 0.00%
|  |  |  |  |- java.util.HashMap @ 0x7839027c8 | 48 | 232 | 0.00%
|  |  |  |  '- Total: 3 entries | | |
|  |  |  |- org.apache.lucene.index.SegmentCoreReaders$1 @ 0x78274aaa8 | 32 | 17'688 | 0.00%
|  |  |  |- org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer @ 0x7822983c0 | 48 | 6'504 | 0.00%
|  |  |  |- org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$3 @ 0x7b1424f10 | 24 | 3'456 | 0.00%
|  |  |  |- org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader @ 0x7910e98c8 | 56 | 1'240 | 0.00%
|  |  |  |- org.apache.lucene.index.SegmentCoreReaders$3 @ 0x78274aae8 | 32 | 456 | 0.00%
|  |  |  |- org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexReader @ 0x77fb743a0 | 40 | 344 | 0.00%
|  |  |  |- java.lang.String @ 0x78292d4c8  NIOFSIndexInput(path="/opt/webs/fust.ch/WEB-INF/indexes/1/fr_CH_1/fustusermanuals/full/__data/_n8.fdt") | 32 | 256 | 0.00%
|  |  |  |- org.apache.lucene.index.SegmentCoreReaders$2 @ 0x78274aac8 | 32 | 240 | 0.00%
|  |  |  |- java.util.Collections$SynchronizedSet @ 0x780661c68 | 24 | 216 | 0.00%
|  |  |  |- sun.nio.ch.FileChannelImpl @ 0x782298420 | 48 | 152 | 0.00%
|  |  |  |- java.io.RandomAccessFile @ 0x782933780 | 32 | 48 | 0.00%
|  |  |  |- java.io.FileDescriptor @ 0x780b56148 | 24 | 40 | 0.00%
|  |  |  |- java.util.concurrent.atomic.AtomicInteger @ 0x780661c38 | 16 | 16 | 0.00%
|  |  |  '- Total: 14 entries | | |
----------------------------------------------------------------------------------------------------

Does this help?
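
To put the MemoryAnalyzer figures in proportion, here is a small sketch that parses the apostrophe-grouped byte counts and back-computes the approximate total heap from one row. It assumes the Percentage column is relative to the whole heap in the dump; the class and method names (HeapShare, parseBytes, estimateTotalHeapBytes) are made up for illustration.

```java
final class HeapShare {
    /** Parses a byte count written with apostrophe grouping, e.g. "59'255'872". */
    static long parseBytes(String grouped) {
        return Long.parseLong(grouped.replace("'", "").trim());
    }

    /**
     * Estimates the total heap size from one row's retained bytes and its
     * percentage. The percentage is rounded to two decimals in the report,
     * so the result is only an approximation.
     */
    static double estimateTotalHeapBytes(long retainedBytes, double percent) {
        return retainedBytes / (percent / 100.0);
    }

    public static void main(String[] args) {
        // Values taken from the StandardDirectoryReader row of the snippet.
        long readerBytes = parseBytes("59'255'872");
        double totalBytes = estimateTotalHeapBytes(readerBytes, 3.04);
        System.out.println("reader retained bytes: " + readerBytes);
        System.out.println("estimated total heap bytes: " + Math.round(totalBytes));
    }
}
```

Under that assumption, the reader's roughly 59 MB would correspond to a heap of a little under 2 GB, so most of the retained memory would lie outside this reader.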

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Friday, 13 June 2014 13:15
To: Lucene Users
Subject: Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged

On Fri, Jun 13, 2014 at 3:02 AM, Clemens Wyss DEV <clemensdev@mysign.ch> wrote:
>> limit how many fields have norms enabled
> We have one index for approx. 7000 PDFs (24GB). Of course no content is STOREd (but ANALYZEd).
> This very index occupies 4GB on disk and the corresponding IndexReader is 60MB.
> Are norms enabled by default for org.apache.lucene.document.TextField?

Yes.  Norms are a good idea for "large text fields", e.g. body text or a catch-all field,
but usually not a good idea for tiny fields (e.g. title).
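
As a rough illustration of the norms advice, the sketch below copies TextField's non-stored field type and switches norms off for a short field while leaving them on for the body. It assumes the Lucene 4.x document API (FieldType, TextField.TYPE_NOT_STORED); the class name NormsExample and the field names "title" and "body" are hypothetical.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;

final class NormsExample {
    // Same settings as a non-stored TextField, but without norms.
    static final FieldType TEXT_NO_NORMS = new FieldType(TextField.TYPE_NOT_STORED);
    static {
        TEXT_NO_NORMS.setOmitNorms(true);
        TEXT_NO_NORMS.freeze(); // guard against accidental later changes
    }

    static Document buildDocument(String title, String body) {
        Document doc = new Document();
        // Tiny field: analyzed, not stored, norms omitted.
        doc.add(new Field("title", title, TEXT_NO_NORMS));
        // Large text field: plain TextField, norms stay enabled.
        doc.add(new TextField("body", body, Field.Store.NO));
        return doc;
    }
}
```

A change like this only affects documents indexed afterwards, so an existing index would have to be rebuilt to benefit.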

>> use disk-based doc values not field cache
> How is this done?

Add XXXDocValuesField instead of e.g. StringField.
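
A minimal sketch of what this could look like, assuming the Lucene 4.2+ document API (SortedDocValuesField, NumericDocValuesField). The class name DocValuesExample and the field names "category" and "modified" are hypothetical. The StringField is kept here only on the assumption that the value must also be searchable; if it is used purely for sorting or faceting, the doc values field alone may be enough.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.util.BytesRef;

final class DocValuesExample {
    static Document buildDocument(String category, long modified) {
        Document doc = new Document();
        // Indexed as a single token so the value can be matched exactly.
        doc.add(new StringField("category", category, Field.Store.NO));
        // Per-document value for sorting, read from doc values instead of
        // being un-inverted into the FieldCache at search time.
        doc.add(new SortedDocValuesField("category", new BytesRef(category)));
        // Numeric per-document value, e.g. a timestamp to sort by.
        doc.add(new NumericDocValuesField("modified", modified));
        return doc;
    }
}
```

Doc values hold one value per document and field, which fits identifiers, categories and numbers rather than tokenized full text.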

>> etc.
> such as? ;)

Upgrade to the upcoming Lucene 4.9; there have been some improvements, e.g. to norms compression.
You can tune your terms index settings, but the terms index usually doesn't use much RAM.

You can fire up your app, get all searchers warmed, then take a heap dump and see what's using
RAM.  We can iterate from there.
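
One way to take such a heap dump from inside the running application is the HotSpot diagnostic MXBean. This is a sketch that assumes a HotSpot-based JVM; the class name HeapDumper is hypothetical. It sticks to java.io.File so that it also compiles on Java 6.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

final class HeapDumper {
    /**
     * Writes a heap dump to the given file (use a ".hprof" extension).
     * With liveObjectsOnly set, only reachable objects are dumped.
     */
    static File dumpHeap(File target, boolean liveObjectsOnly) throws IOException {
        // dumpHeap fails if the file already exists, so remove it first.
        if (target.exists() && !target.delete()) {
            throw new IOException("cannot replace existing dump: " + target);
        }
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.getAbsolutePath(), liveObjectsOnly);
        return target;
    }
}
```

The resulting .hprof file can then be opened in MemoryAnalyzer. Calling this once the searchers are warmed should capture the readers in their steady state.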

Mike McCandless

http://blog.mikemccandless.com
