lucene-solr-dev mailing list archives

From "Lance Norskog (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (SOLR-1867) CachedSQLentity processor is using unbounded hashmap
Date Thu, 08 Apr 2010 01:31:36 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854729#action_12854729 ]

Lance Norskog edited comment on SOLR-1867 at 4/8/10 1:30 AM:
-------------------------------------------------------------

Possibly it is from using ThreadLocal. All classes are in contrib/..../dataimport:

Context.java:

{noformat}
static final ThreadLocal<Context> CURRENT_CONTEXT = new ThreadLocal<Context>();
{noformat}

DocBuilder.buildDocument():

{noformat}
ContextImpl ctx = new ContextImpl(entity, vr, null,
    pk == null ? Context.FULL_DUMP : Context.DELTA_DUMP,
    session, parentCtx, this);
entityProcessor.init(ctx);
Context.CURRENT_CONTEXT.set(ctx);
{noformat}

If the CachedSqlEntityProcessor is saving rows in the Context, this may be the problem.

ThreadLocal is notorious for causing memory leaks: the thread gets reused (by a thread pool,
for example) but the code forgets to null out the thread-local value, so everything that value
references stays reachable.

The DIH needs to call Context.CURRENT_CONTEXT.set(null) before the request returns, if the DIH
index operation is synchronous. It should probably do this in any case, for safety.
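
The cleanup suggested above can be sketched as follows. This is a minimal, standalone
illustration and not DIH code: FakeContext, runImport and the class name are hypothetical
stand-ins, and only the ThreadLocal calls are real JDK API. It uses ThreadLocal.remove()
in a finally block, which has the same effect as set(null) for this purpose and also drops
the thread's map entry.

```java
import java.util.HashMap;
import java.util.Map;

class ThreadLocalCleanupSketch {
    // Hypothetical stand-in for the DIH Context; it only holds some cached rows.
    static final class FakeContext {
        final Map<String, Object> cachedRows = new HashMap<>();
    }

    // Mirrors the shape of Context.CURRENT_CONTEXT quoted above.
    static final ThreadLocal<FakeContext> CURRENT_CONTEXT = new ThreadLocal<>();

    // Hypothetical import routine: publishes a context, fills it, then clears it.
    static int runImport(int rows) {
        FakeContext ctx = new FakeContext();
        CURRENT_CONTEXT.set(ctx);
        try {
            for (int i = 0; i < rows; i++) {
                // Code deeper in the call stack reaches the context through the ThreadLocal.
                CURRENT_CONTEXT.get().cachedRows.put("row-" + i, new byte[1024]);
            }
            return ctx.cachedRows.size();
        } finally {
            // Without this, a pooled thread keeps the context (and its rows) reachable.
            CURRENT_CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println("rows cached during import: " + runImport(100));
        System.out.println("context after import: " + CURRENT_CONTEXT.get());
    }
}
```

Putting the cleanup in a finally block means the reference is also dropped when the import
throws, which matters if the failing import is the one that filled the cache.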

      was (Author: lancenorskog):
    Possibly it is from using ThreadLocal. All classes are in contrib/..../dataimport:

Context.java:
{{
static final ThreadLocal<Context> CURRENT_CONTEXT = new ThreadLocal<Context>();
}}
DocBuilder.buildDocument():
{{
    ContextImpl ctx = new ContextImpl(entity, vr, null,
            pk == null ? Context.FULL_DUMP : Context.DELTA_DUMP,
            session, parentCtx, this);
    entityProcessor.init(ctx);
    Context.CURRENT_CONTEXT.set(ctx);
}}

If the CachedSqlEntityProcessor is saving rows in the Context, this may be the problem.

ThreadLocal is notorious for causing memory leaks because the thread gets reused in some way
but the local variable is not set to null.

The DIH needs to do Context.CURRENT_CONTEXT.set(null) before the request returns, if the DIH
index operation is synchronous. It probably should do it anyway for safety.
  
> CachedSQLentity processor is using unbounded hashmap 
> -----------------------------------------------------
>
>                 Key: SOLR-1867
>                 URL: https://issues.apache.org/jira/browse/SOLR-1867
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: barani
>
> I am using cachedSqlEntityprocessor in DIH to index the data. Please find a sample dataconfig structure:
> <entity x query="select * from x"> ---> object 
> <entity y query="select * from y" processor="cachedSqlEntityprocessor" cachekey=y.id cachevalue=x.id> --> object properties 
> For each and every object I retrieve the corresponding object properties (in my subqueries). 
> I run into OOM very often, and I think that's a trade-off of using cachedSqlEntityprocessor.
> My assumption is that when I use cachedSqlEntityprocessor the indexing happens as follows:
> First, entity x gets executed and the entire table gets stored in the cache. 
> Next, entity y gets executed and the entire table gets stored in the cache. 
> Finally, the comparison happens through a hash map. 
> So the memory allocated to the SOLR JVM always needs to be greater than or equal to the size of the data present in the tables.
> One more issue is that even after SOLR completes indexing, the memory used previously is not released. I could still see the JVM consuming 1.5 GB after the indexing completed. I tried Java HotSpot options but didn't see any difference. GC is not getting invoked even after a long time when using the CachedSQLentity processor.
> The main issue seems to be that the CachedSQLentity processor cache is an unbounded HashMap, with no option to bound it. 
> Reference: http://n3.nabble.com/Need-info-on-CachedSQLentity-processor-tt698418.html#a698418
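
For illustration, the kind of bound the description above says is missing can be sketched
with the JDK's LinkedHashMap. This is a generic LRU sketch under stated assumptions, not
the processor's cache and not a proposed patch: the class name and the maxEntries parameter
are hypothetical, and a real fix would also need to re-query rows that have been evicted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache: evicts the least recently used entry once full.
class BoundedRowCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedRowCacheSketch(int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedRowCacheSketch<String, String> cache = new BoundedRowCacheSketch<>(2);
        cache.put("a", "row a");
        cache.put("b", "row b");
        cache.get("a");          // touch "a" so that "b" becomes the eldest entry
        cache.put("c", "row c"); // exceeds the bound, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

The trade-off is that a bounded cache caps memory but turns some lookups into misses, so
whether it helps depends on how the cached entity's rows are accessed.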

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

