lucene-solr-dev mailing list archives

From "Erik Hatcher (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
Date Sat, 27 Jun 2009 01:07:47 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724780#action_12724780 ]

Erik Hatcher commented on SOLR-1229:
------------------------------------

In all of those cases, as long as what is returned from the query and run through transformations
ends up with a key in the map that is the same as the uniqueKey setting in schema.xml, then
all is fine.  I still don't see a need to set pk="solr-id".  Won't there always be a uniqueKey-named
key in that map after transformations are applied?  uniqueKey definitely matters... that's
the field that must be used for deletions, and what I'm consistently seeing mentioned here
is that you want to duplicate that by saying pk="<uniqueKeyFieldName>", which is unnecessary
duplication.  When would you set pk to anything else?
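
To make the point concrete, here is a minimal Python sketch of the lookup described above. It
is not DIH code: apply_templates, delete_key and UNIQUE_KEY are hypothetical names, and the
template syntax is only imitated from the config quoted at the bottom of this message. It
assumes the row returned by deletedPkQuery is run through the same template transformation
as imported rows, after which the delete key is read from the uniqueKey-named entry.

```python
import re

# Assumption: the uniqueKey declared in schema.xml is "id".
UNIQUE_KEY = "id"


def apply_templates(row, templates, entity="test"):
    """Return a copy of row with each templated column filled in.

    Placeholders look like ${test.board_id}; this only imitates the
    TemplateTransformer syntax and is not its implementation.
    """
    pattern = r"\$\{" + re.escape(entity) + r"\.(\w+)\}"
    out = dict(row)
    for column, template in templates.items():
        out[column] = re.sub(pattern, lambda m: str(row[m.group(1)]), template)
    return out


def delete_key(row, templates):
    """Resolve the value to delete by: the uniqueKey entry after transformation."""
    return apply_templates(row, templates)[UNIQUE_KEY]


templates = {"id": "board-${test.board_id}", "datasource": "board"}
deleted_row = {"board_id": 42}  # what deletedPkQuery would return for one row
key = delete_key(deleted_row, templates)
```

Under these assumptions the delete is issued for the templated id rather than the raw
board_id, and no pk attribute is consulted at all.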

It'll be later next week at the earliest, but I hope to get some unit tests contributed so
we can discuss this topic through tests rather than prose.

My use case is exactly the config at the top of this issue, where the uniqueKey value is a
templated transformation. Multiple DIH configurations bring in various data sources, so the
unique key value must be fabricated to guarantee uniqueness across data sources that may
share the same primary keys in their databases. This corresponds to your #1.
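
A small Python sketch of why the key has to be fabricated. The "users" source and the
fabricate_key helper are hypothetical and only serve the illustration; the "board-" prefix
mirrors the template in the quoted config.

```python
def fabricate_key(datasource, pk):
    """Prefix the database primary key with its data source name."""
    return f"{datasource}-{pk}"


boards = [{"board_id": 1}, {"board_id": 2}]
users = [{"user_id": 1}]  # hypothetical second data source reusing pk 1

# Using the raw primary keys as the Solr uniqueKey: two documents share "1".
raw_keys = [r["board_id"] for r in boards] + [r["user_id"] for r in users]

# Using fabricated keys: every document gets a distinct id.
fabricated = [fabricate_key("board", r["board_id"]) for r in boards] + [
    fabricate_key("user", r["user_id"]) for r in users
]

raw_collides = len(set(raw_keys)) < len(raw_keys)
fabricated_collides = len(set(fabricated)) < len(fabricated)
```

With raw keys the second source would overwrite a board document in the index; with the
prefixed keys it would not, which is why the uniqueKey cannot simply equal the database pk.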


> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are
> removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field, and its value is the template board-<PK>.
> I noticed that the javadoc comment on DocBuilder#collectDelta says "Note: In our definition,
> unique key of Solr document is the primary key of the top level entity".  This of course isn't
> really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

