lucene-dev mailing list archives

From "Shawn Heisey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-1920) Need generic placemarker for DIH delta-import
Date Tue, 13 Nov 2012 20:56:12 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496531#comment-13496531 ]

Shawn Heisey commented on SOLR-1920:
------------------------------------

In the MySQL database where my data originates, the field that I use for tracking what's new
is an autoincrement field, mapped to a tlong in Solr.  New documents added to the database
just get assigned the next autoincrement number.  If Solr could be informed that field X is
the tracking field, the highest value encountered during an import (according to that field's
sort mechanism) could be stored in dataimport.properties and re-used during the next delta-import.
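
A minimal sketch of that idea, written in Python rather than as DIH code. DIH has no such
feature; the function names, the "did" tracking field, and the "last_tracking_value" key are
all hypothetical, and the properties parsing is deliberately simpler than the real Java
properties format (no escape handling).

```python
from pathlib import Path


def load_props(path):
    """Read simple key=value lines into a dict (no Java-style escapes)."""
    props = {}
    p = Path(path)
    if p.exists():
        for line in p.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                props[key.strip()] = value.strip()
    return props


def save_props(path, props):
    """Write the dict back as key=value lines, sorted by key."""
    Path(path).write_text("".join(f"{k}={v}\n" for k, v in sorted(props.items())))


def run_delta(rows, props_path, tracking_field="did",
              marker_key="last_tracking_value"):
    """Return rows newer than the stored marker, then advance the marker.

    rows stands in for the result of a database query; each row is a dict
    whose tracking_field holds an integer autoincrement value.
    """
    props = load_props(props_path)
    last_seen = int(props.get(marker_key, "0"))
    new_rows = [r for r in rows if r[tracking_field] > last_seen]
    if new_rows:
        # Store the highest value seen so the next run starts after it.
        props[marker_key] = str(max(r[tracking_field] for r in new_rows))
        save_props(props_path, props)
    return new_rows
```

Each call picks up only rows whose tracking value exceeds the stored marker, which is the
same role the saved timestamp plays today.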

If DIH is sufficiently disconnected from Solr schema internals (which actually seems likely),
you'd have to base your sort on the SQL data type, because it would have no way to know what
kind of field Solr has.
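
To illustrate why the data type matters when picking "the highest value", here is a small
Python sketch. The highest() helper and the type names it checks are assumptions made for
the example, not anything DIH provides.

```python
def highest(values, sql_type):
    """Pick the largest value using an ordering that matches the SQL type.

    values are strings as they might arrive from a result set; sql_type is
    a hypothetical type label such as "BIGINT" or "VARCHAR".
    """
    if sql_type in ("INT", "BIGINT"):
        return max(values, key=int)   # numeric ordering
    return max(values)                # lexicographic ordering


ids = ["9", "10", "100"]
numeric_max = highest(ids, "BIGINT")    # "100"
string_max = highest(ids, "VARCHAR")    # "9", because "9" sorts after "1..."
```

With the wrong ordering, the stored placemarker would be "9" and every later import would
re-read rows 10 and up.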

I currently do all delta tracking outside of Solr, so I'm already covered.  The generic idea
seemed worthy of opening an issue two years ago, because other people may run into situations
where they cannot use a timestamp for delta tracking.

I have no idea what kind of tracking problems you'd encounter when dealing with soft commits.
Without a transaction log, that could get ugly.  For performance reasons, I am initially deploying
4.x with no transaction log (see SOLR-3954).

                
> Need generic placemarker for DIH delta-import
> ---------------------------------------------
>
>                 Key: SOLR-1920
>                 URL: https://issues.apache.org/jira/browse/SOLR-1920
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>            Reporter: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.1
>
>
> The dataimporthandler currently is only capable of saving the index timestamp for later
> use in delta-import commands.  It should be extended to allow any arbitrary data to be used
> as a placemarker for the next import.
> It is possible to use externally supplied variables in data-config.xml and send values
> in via the URL that starts the import, but if the config can support it natively, that is
> better.
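
The external-variable workaround mentioned in the description can be sketched as follows.
This is an illustration in Python, assuming a hypothetical core named "mycore", a
hypothetical request parameter "minDid", and a data-config.xml query that reads it (as far
as I know, DIH exposes request parameters as ${dataimporter.request.minDid}). The sketch
only builds the URL; it does not contact Solr.

```python
from urllib.parse import urlencode


def build_import_url(base_url, command, **params):
    """Build a DIH request URL carrying externally tracked values.

    base_url is the dataimport handler endpoint; params are extra request
    parameters that data-config.xml is assumed to reference.
    """
    query = urlencode({"command": command, **params})
    return f"{base_url}?{query}"


url = build_import_url(
    "http://localhost:8983/solr/mycore/dataimport",  # hypothetical host and core
    "delta-import",
    minDid=12345,  # placemarker value tracked outside Solr
)
```

The caller is responsible for remembering the placemarker between runs, which is the
bookkeeping this issue asks DIH to take over.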

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

