lucene-solr-user mailing list archives

From Keith Naas <>
Subject delta-import and cache (a story in conflict)
Date Tue, 14 May 2013 19:23:52 GMT
Thanks for all the great work on Solr. We have used it for over a year and have been very satisfied
with it.

However, we have noticed that some of the recent changes have affected import caching in a
not-so-good way.  We are using Solr 4.2.0.

We use full and delta imports.  We only use a delta import query on the root entity (our object
model does not safely support updates to the nested entities).

Here is a snippet of the XML:

<entity name="product" pk="ID" query="..." deltaImportQuery="..." deltaQuery="..." deletedPkQuery="..."
 <field column="ID" name="id" />

<field column="NAME" name="name" />
   <entity name="productSize"
                    processor="CachedSqlEntityProcessor" cacheKey="PRODUCT_ID" cacheLookup="product.ID">
                <entity name="productSizeAttributes"
                        query="..." processor="CachedSqlEntityProcessor"  cacheKey="SIZE_ID"
                        logLevel="info" logTemplate="The size for product ${product.ID} is
                    <field column="SIZE_ID" name="size" />
                    <field column="SIZE_NAME" name="sizeName" />
                    <field column="SIZE_CODE" name="sizeCode"/>

We have noticed that delta imports that used to take 30 seconds now run indefinitely and eventually
cause an OutOfMemoryError on a multi-GB heap.  Here is the stack trace:

java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(
        at java.lang.AbstractStringBuilder.expandCapacity(
        at java.lang.AbstractStringBuilder.append(
        at java.lang.StringBuilder.append(
        at java.lang.StringBuilder.append(
        at java.util.AbstractCollection.toString(
        at java.lang.String.valueOf(
        at java.lang.StringBuilder.append(
        at org.apache.solr.common.SolrInputField.toString(
        at java.lang.String.valueOf(
        at java.lang.StringBuilder.append(
        at java.util.AbstractCollection.toString(
        at java.lang.String.valueOf(
        at java.lang.StringBuilder.append(
        at org.apache.solr.common.SolrInputDocument.toString(
        at java.lang.String.valueOf(
        at java.lang.StringBuilder.append(
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(
        at org.apache.solr.handler.dataimport.DocBuilder.doDelta(
        at org.apache.solr.handler.dataimport.DocBuilder.execute(
        at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(
        at org.apache.solr.handler.dataimport.DataImporter$

DocBuilder.buildDocument, line 354 in Solr 4.2.0:

    SolrException.log(LOG, "Exception while processing: "
            + epw.getEntity().getName() + " document : " + doc, e);

The doc.toString() in that log call appends every SolrInputField to the string, which is exactly
where the heap goes; the sketch below makes the cost concrete.
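
To illustrate that failure mode, here is a minimal, self-contained sketch (my own illustration
with hypothetical numbers, not Solr code): one multivalued field that has wrongly accumulated
100,000 nested-entity rows forces the log call to materialize the whole collection as one giant
string.

    import java.util.ArrayList;
    import java.util.List;

    public class ToStringBlowup {
        public static void main(String[] args) {
            // A stand-in for one multivalued SolrInputField that has wrongly
            // accumulated 100,000 nested-entity rows (hypothetical numbers).
            List<String> sizeValues = new ArrayList<String>();
            for (int i = 0; i < 100000; i++) {
                sizeValues.add("SIZE_ID=" + i + ", SIZE_NAME=..., SIZE_CODE=...");
            }
            // "document : " + doc lands in AbstractCollection.toString(), which
            // appends every value into one StringBuilder -- the same
            // Arrays.copyOf / expandCapacity frames seen in the trace above.
            String rendered = "document : " + sizeValues;
            System.out.println(rendered.length() + " chars built just to log one field");
        }
    }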

So why are the SolrInputFields so big?  It is hard to say from the logs, because the original
exception is never logged.  After debugging for a few days, it appears that during a delta-import
the cache is destroyed prematurely.

DocBuilder.buildDocument is called once for each row returned by the deltaQuery.  In its finally
block it calls destroy on all EntityProcessorWrappers, which eventually calls destroy on
EntityProcessorBase; that method, after destroying the cacheSupport, sets cacheSupport to null.
On every subsequent buildDocument call, EntityProcessorBase.init() is executed again, sees that
the isFirstInit flag is false, and skips re-initializing the cache (which likely should never
have been destroyed until the last row returned by the deltaQuery).
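
Here is a self-contained sketch of that lifecycle as I read it (the names mirror Solr's, but this
is an illustration of the described flow, not the literal 4.2.0 source):

    // Illustration only: mimics the init/destroy ordering described above.
    public class DeltaCacheLifecycle {

        static class EntityProcessor {
            private boolean isFirstInit = true;
            private Object cacheSupport;          // stands in for the DIH cache

            void init() {
                if (isFirstInit) {                // true only for the very first row
                    cacheSupport = new Object();  // cache built exactly once
                    isFirstInit = false;
                }
                // later rows skip re-initialization, so a destroyed cache
                // stays null and lookups fall back to re-running the SQL
            }

            void destroy() {
                cacheSupport = null;              // cache torn down
            }

            boolean cacheIsUsable() {
                return cacheSupport != null;
            }
        }

        public static void main(String[] args) {
            EntityProcessor p = new EntityProcessor();
            for (int row = 1; row <= 3; row++) {  // one buildDocument() per delta row
                p.init();
                System.out.println("row " + row + ": cache usable = " + p.cacheIsUsable());
                p.destroy();                      // the finally-block destroy after EVERY row
            }
            // Prints true for row 1, false afterwards: the premature destruction.
        }
    }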

Finally, when the rows for the nested entities are fetched, the processor skips the cache
behavior, re-executes the SQL, and loads every single row from the nested entities as new fields
in each document.

Thus, if a query returned 100,000 productSize records, every product after the first would end
up with all 100,000 productSizes attached to it.
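
A back-of-the-envelope calculation (all three numbers below are assumptions, not measurements from
our index) shows why this exhausts even a multi-GB heap:

    public class HeapBackOfEnvelope {
        public static void main(String[] args) {
            long deltaProducts = 1000;    // delta rows in one import (assumed)
            long sizeRows = 100000;       // nested productSize rows, as above
            long bytesPerValue = 100;     // rough size of one field value (assumed)
            long duplicated = deltaProducts * sizeRows * bytesPerValue;
            // ~10 GB of duplicated field values, before doc.toString()
            // copies them all again into one StringBuilder for the log line.
            System.out.println("~" + (duplicated / 1000000000L) + " GB duplicated");
        }
    }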

This behavior makes delta-imports unusable with caching in any release since this functionality
was changed.

We have also noticed that caching does not seem to be honored when the SQL statement contains
resolvable ${} tokens.  We can work around those two queries by disabling caching, but I cannot
disable caching on the other 20 queries; imports would take hours.

Has anyone else seen this?

Keith Naas
