lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Solr performance issue
Date Fri, 16 Feb 2018 00:30:40 GMT
On 2/15/2018 2:00 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the child entities
> in data-config.xml, and I'm using it for full-import only. At the beginning of my
> implementation, I had written a delta-import query to index the modified changes. But my requirements
> grew, and I now have 17 child entities for a single parent entity. When doing delta-import
> for huge data, the number of requests made to the datasource (database) increased, and
> CPU utilization was 100% when concurrent users started modifying the data. For this reason, instead
> of calling delta-import, which imports based on last index time, I did a full-import (with
> 'SortedMapBackedCache') based on last index time.
>
> Though the parent entity query returns only records that are modified, the child
> entity queries pull all the data from the database, and the indexing happens 'in-memory', which
> causes the JVM to run out of memory.

Can you provide your DIH config file (with passwords redacted) and the
precise URL you are using to initiate dataimport?  Also, I would like to
know what field you have defined as your uniqueKey.  I may have more
questions about the data in your system, depending on what I see.

That cache implementation should only cache entries from the database
that are actually requested.  If your query is correctly defined, it
should not pull all records from the DB table.

> Is there a way to specify in the child entity query to pull only the records related to the parent
> entity in full-import mode?

If I am understanding your question correctly, this is one of the fairly
basic things that DIH does.  Look at this config example in the
reference guide:

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#configuring-the-dih-configuration-file

In the entity named feature in that example config, the query string
uses ${item.ID} to reference the ID column from the parent entity, which
is item.
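
A minimal sketch of that pattern, adapted from the reference guide
example.  The table and column names (item, feature, item_id,
description) are assumptions for illustration and are not taken from
your config, and the dataSource definition is omitted:

```xml
<dataConfig>
  <!-- dataSource element omitted; it is specific to your database -->
  <document>
    <!-- Parent entity: one Solr document per row of the (assumed) item table -->
    <entity name="item" query="select * from item">
      <field column="NAME" name="name" />

      <!-- Child entity: ${item.ID} is replaced with the ID column of the
           current parent row, so only rows related to that parent are fetched -->
      <entity name="feature"
              query="select description from feature where item_id='${item.ID}'">
        <field column="description" name="features" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

With this layout the child query runs once per parent row, so the number
of database requests grows with the number of parent rows returned.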

I should warn you that a cached entity does not always improve
performance.  This is particularly true if the key used for the cache
lookup is the value that goes to your uniqueKey field.  When the lookup
is by uniqueKey, every single row requested from the database will be
used exactly once, so there's not really any point to caching it.
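
For comparison, a cached child entity typically looks something like the
sketch below.  The cacheImpl, cacheKey, and cacheLookup attributes are
documented DIH options; the table and column names are assumptions for
illustration only.  The child query here has no WHERE clause, and rows
are matched to the parent through the cache instead of through the SQL:

```xml
<dataConfig>
  <!-- dataSource element omitted; it is specific to your database -->
  <document>
    <entity name="item" query="select * from item">
      <!-- cacheKey: column of this child entity used as the cache key.
           cacheLookup: parent entity + column whose value is looked up
           in the cache for each parent row. -->
      <entity name="feature"
              processor="SqlEntityProcessor"
              cacheImpl="SortedMapBackedCache"
              cacheKey="item_id"
              cacheLookup="item.ID"
              query="select item_id, description from feature">
        <field column="description" name="features" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

A cache like this pays off when the same cached rows are looked up
repeatedly.  If the lookup value is also what feeds the uniqueKey, each
cached row is looked up only once, which is the case described above.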

Thanks,
Shawn

