lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index
Date Tue, 07 Aug 2012 03:16:03 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429688#comment-13429688 ]

Robert Muir commented on SOLR-3684:
-----------------------------------

It does: I think the reuse is not the problem, but the max?

By default I think it keeps the min threads alive (default 10), but our max of 10,000 allows it to spike huge temporarily (versus blocking). From looking at the Jetty code, by default these threads die off after 60s, which is fine, but we enrolled so many entries into e.g. Analyzer's or SegmentReader's CloseableThreadLocals that when the threads die off and the CTL does a purge, it's just a ton of garbage.
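
To illustrate that purge effect with plain JDK classes (a ThreadPoolExecutor and ThreadLocal standing in for Jetty's pool and Lucene's CloseableThreadLocal; the pool numbers are the ones from this thread, not Solr's actual config), a toy sketch:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Toy stand-in (not Solr/Jetty code): a pool allowed to spike far beyond its core
 * size enrolls one ~32KB per-thread buffer into a ThreadLocal (playing the role of
 * a zzBuffer held via CloseableThreadLocal). Those buffers only become collectible
 * once the idle threads die and the thread-local entries are purged, which then
 * shows up as a large burst of garbage.
 */
public class ThreadLocalSpike {

    // One ~32KB buffer per thread (16K chars * 2 bytes), like StandardTokenizer's zzBuffer.
    private static final ThreadLocal<char[]> BUFFER =
        ThreadLocal.withInitial(() -> new char[16 * 1024]);

    public static void main(String[] args) throws Exception {
        // Core size 10, max 10,000, 60s idle timeout: the pool shape discussed in this thread.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            10, 10_000, 60, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());

        // A burst of slow-ish requests forces the pool to grow; each new worker thread
        // pins one buffer until that thread eventually dies.
        for (int i = 0; i < 2_000; i++) {
            pool.submit(() -> {
                Thread.sleep(100);
                return BUFFER.get().length;
            });
        }
        System.out.printf("pool grew to ~%d threads, each pinning ~32KB%n", pool.getPoolSize());
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}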

Really there isn't much benefit in using so many threads at indexing time (DWPT's max thread states is 8 unless changed in IndexWriterConfig, and raising it would have other bad side effects). At query time I think something closer to Jetty's default of 254 would actually be better too.
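
For reference, a minimal sketch against the Lucene 4.x API of the knob meant here (IndexWriterConfig.setMaxThreadStates, default 8); the StandardAnalyzer is just a placeholder:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

public class MaxThreadStatesSketch {
    public static void main(String[] args) {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40); // placeholder analyzer
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer);
        // DWPT thread states cap how many threads actually index concurrently; 8 is the
        // default, so pointing thousands of servlet threads at IndexWriter mostly adds
        // contention rather than throughput.
        iwc.setMaxThreadStates(8);
        System.out.println(iwc);
    }
}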

But I looked at the history of this file, and it seems the reason it was set to 10,000 was to prevent a deadlock (SOLR-683)? Is there a better solution to this now so that we can reduce this max?

Separately, I've been fixing the analyzers that hog RAM, because machines are getting more cores, so I think it's worth it. But I think it would be nice if we could fix this max=10,000.
                
> Frequently full gc while do pressure index
> ------------------------------------------
>
>                 Key: SOLR-3684
>                 URL: https://issues.apache.org/jira/browse/SOLR-3684
>             Project: Solr
>          Issue Type: Improvement
>          Components: multicore
>    Affects Versions: 4.0-ALPHA
>         Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads 
> Index: 20 field
> Core: 5
>            Reporter: Raintung Li
>            Priority: Critical
>              Labels: garbage, performance
>             Fix For: 4.0
>
>         Attachments: patch.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Recently we tested Solr indexing throughput and performance: 20 fields of the normal text_general type, 1000 Jetty threads, and 5 cores defined.
> After the test had run for some time, the Solr process throughput dropped very quickly. Checking the root cause, we found the Java process constantly doing full GCs.
> Checking the heap dump, the main object is StandardTokenizer; it is kept in a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
> Solr uses PerFieldReuseStrategy as the default reuse strategy, which means each field gets its own StandardTokenizer if it uses the standard analyzer, and each StandardTokenizer occupies 32KB of memory because of its zzBuffer char array.
> The worst case: Total memory = live threads * cores * fields * 32KB
> In the test case that is 1000 * 5 * 20 * 32KB = 3.2G for StandardTokenizers, and those objects can only be released when a thread dies.
> Suggestion:
> Every request is handled by exactly one thread, which means one document is analyzed by only one thread. That thread parses the document's fields one by one, so fields of the same type can reuse the same components: when the thread switches to another field of the same type, only the input stream of the shared component needs to be reset. This can save a lot of memory for fields of the same type.
> Total memory will be = live threads * cores * (different field types) * 32KB
> The source code modification is simple; I can provide the patch for IndexSchema.java:
> private class SolrIndexAnalyzer extends AnalyzerWrapper {
>
>   /**
>    * Implementation of {@link ReuseStrategy} that maintains a Map of
>    * TokenStreamComponents keyed on the wrapped Analyzer, so that fields
>    * sharing the same field type also share the same reused components.
>    */
>   private class SolrFieldReuseStrategy extends ReuseStrategy {
>
>     @SuppressWarnings("unchecked")
>     public TokenStreamComponents getReusableComponents(String fieldName) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       return componentsPerField != null
>           ? componentsPerField.get(analyzers.get(fieldName)) : null;
>     }
>
>     @SuppressWarnings("unchecked")
>     public void setReusableComponents(String fieldName, TokenStreamComponents components) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       if (componentsPerField == null) {
>         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
>         setStoredValue(componentsPerField);
>       }
>       componentsPerField.put(analyzers.get(fieldName), components);
>     }
>   }
>
>   /** Field name -> Analyzer; fields of the same field type map to the same Analyzer instance. */
>   protected final HashMap<String, Analyzer> analyzers;
>
>   SolrIndexAnalyzer() {
>     super(new SolrFieldReuseStrategy());
>     analyzers = analyzerCache();
>   }
>
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       Analyzer analyzer = f.getType().getAnalyzer();
>       cache.put(f.getName(), analyzer);
>     }
>     return cache;
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
>   }
>
>   @Override
>   protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
>     return components;
>   }
> }
>
> private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
>   @Override
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       Analyzer analyzer = f.getType().getQueryAnalyzer();
>       cache.put(f.getName(), analyzer);
>     }
>     return cache;
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getQueryAnalyzer();
>   }
> }
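
To make the quoted estimate concrete, a back-of-the-envelope sketch restating the two formulas above, assuming all 20 fields use the single text_general type so the proposed strategy keeps one set of components per thread per core:

public class TokenizerMemoryEstimate {
    public static void main(String[] args) {
        long threads = 1000, cores = 5, fields = 20, fieldTypes = 1;
        long zzBuffer = 32L * 1024;  // ~32KB char buffer per StandardTokenizer

        // Current PerFieldReuseStrategy: one tokenizer per live thread, per core, per field.
        long perField = threads * cores * fields * zzBuffer;         // ~3.2GB
        // Proposed per-field-type reuse: fields of the same type share one set of components.
        long perFieldType = threads * cores * fieldTypes * zzBuffer; // ~160MB

        System.out.printf("per-field reuse:      %,d bytes%n", perField);
        System.out.printf("per-field-type reuse: %,d bytes%n", perFieldType);
    }
}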

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
