lucene-solr-dev mailing list archives

From "Matt Inger (JIRA)" <j...@apache.org>
Subject [jira] Created: (SOLR-1613) Segmentation of data imports (not just full or single record imports)
Date Mon, 30 Nov 2009 19:52:20 GMT
Segmentation of data imports (not just full or single record imports)
---------------------------------------------------------------------

                 Key: SOLR-1613
                 URL: https://issues.apache.org/jira/browse/SOLR-1613
             Project: Solr
          Issue Type: New Feature
          Components: contrib - DataImportHandler
    Affects Versions: 1.4
            Reporter: Matt Inger


It is desirable to be able to segment imports by a particular field in the root entity record,
so that you can update a particular segment of your database when bulk updates occur on the
backend database.  For instance, if a bulk update occurs for a particular customer, it would
be more efficient to update the full segment of your index for that customer than to issue
an update for every single user in your index for that customer, or to update the
entire index.  Either alternative would be a waste of processing power.

Instead, it would be more efficient to specify that a particular document field in the root
entity is a segmentation field, and to define an additional query on the root entity (I'm basing
my example on a JDBC-based datasource):

<entity name="user" pk="userid" segment="customerid" ... 
             query="..." segmentQuery="select ... where customerid=${dataimporter.request.segment}"
/>

Then, when you request a segment update, you specify the segment as a parameter to your request:

    /solr/db/dataimport?command=segment-import&segment=1000
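
As a rough illustration of how the segment parameter from the request above would end up in
the segmentQuery, here is a minimal Java sketch of the placeholder substitution.  This is not
the code from the attached patch and not DataImportHandler's actual resolver: the class
SegmentQuerySketch, the resolve method, and the users/userid table and column names are all
hypothetical, chosen only to make the example self-contained.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: resolves ${dataimporter.request.NAME} placeholders
// against the HTTP request parameters. Not the patch's implementation.
public class SegmentQuerySketch {

    // Matches ${dataimporter.request.<name>} and captures <name>.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{dataimporter\\.request\\.([A-Za-z0-9_]+)\\}");

    public static String resolve(String template, Map<String, String> requestParams) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = requestParams.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing request parameter: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Table and column names are assumptions made for this example.
        String segmentQuery =
                "select userid from users where customerid=${dataimporter.request.segment}";
        // Corresponds to a request carrying segment=1000.
        System.out.println(resolve(segmentQuery, Map.of("segment", "1000")));
        // prints: select userid from users where customerid=1000
    }
}
```

Because the value is spliced into the SQL text verbatim, a real implementation would want to
validate the segment parameter before running the query.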


I've worked out the code changes required to do this for the JdbcDataSource, though I'm not
sure what additional changes would be necessary for other datasource types. I am attaching
a patch which includes these changes.

             

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

