lucene-solr-user mailing list archives

From Alexandre Rocco <alel...@gmail.com>
Subject Re: DIH import and postImportDeleteQuery
Date Wed, 25 May 2011 17:53:48 GMT
Hi Ephraim,

Thank you so much for the input.
I was able to find your thread on the archives and got your solution to
work.

In fact, using $deleteDocById and $skipDoc worked like a charm. This
feature is very useful; it's a shame it's not properly documented.
The only downside is the one you mentioned: the stats are not updated,
so if I update 13 documents and delete 2, DIH tells me that only 13
documents were processed. This is bad in my case because I check the end
result to decide whether to send an error e-mail.
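
For anyone landing on this thread later, here is a minimal sketch of one way
to emit these special commands from a ScriptTransformer. It is not
necessarily the setup from Ephraim's post. The function name markDeleted is
hypothetical, and the sketch assumes each row carries a status column where
1 marks a record deleted at the source, and that entityid is the uniqueKey:

```xml
<dataConfig>
  <!-- dataSource omitted; same JdbcDataSource as in the config quoted below -->
  <script><![CDATA[
    // Hypothetical helper. Assumes status = 1 means "deleted at the source"
    // and that entityid is the uniqueKey of the index.
    function markDeleted(row) {
      if (String(row.get('status')) == '1') {
        // Ask DIH to delete the indexed copy of this record...
        row.put('$deleteDocById', row.get('entityid'));
        // ...and not to index the row itself.
        row.put('$skipDoc', 'true');
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="entity_one"
            pk="entityid"
            transformer="script:markDeleted"
            query="EXEC some_stored_procedure ${dataimporter.request.someid}">
      <!-- field mappings omitted -->
    </entity>
  </document>
</dataConfig>
```

With something like this in place the deletions ride along with the normal
import rows, so the postImportDeleteQuery is no longer needed for them.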

You also mentioned that if the query contains only deletion records, a
commit would not be automatically executed and it would be necessary to
commit manually.

How can I commit manually via DIH? I was not able to find any reference to
this in the documentation.
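
For context, outside of DIH a commit is normally sent to Solr as an XML
update message. A minimal sketch, assuming the core exposes the default
/update handler and the message is POSTed with a text/xml content type;
whether DIH itself can be told to do this after a deletion-only run is
exactly the open question:

```xml
<!-- Body of a POST to the update handler; makes pending adds and deletes visible -->
<commit/>
```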

Thanks!
Alexandre

On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir <EphraimO@icq.com> wrote:

> Search the list for my post "DIH - deleting documents, high performance
> (delta) imports, and passing parameters", which shows my solution to a
> similar problem.
>
> Ephraim Ofir
>
> -----Original Message-----
> From: Alexandre Rocco [mailto:aleloco@gmail.com]
> Sent: Tuesday, May 24, 2011 11:24 PM
> To: solr-user@lucene.apache.org
> Subject: DIH import and postImportDeleteQuery
>
> Guys,
>
> I am facing a situation in one of our projects where I need to perform a
> cleanup to remove some documents after we perform an update via DIH.
> The big issue right now is that when we call DIH with clean=false, the
> postImportDeleteQuery is not executed.
>
> My setup is currently arranged like this:
> - A SQL Server stored procedure that receives a parameter (specified in
>   the URL) and returns the records to be indexed
> - The procedure is able to return all the records (for a full-import) or
>   only the updated records (for a delta-import)
> - This procedure returns both valid and deleted records, hence the need
>   to run a postImportDeleteQuery to remove the deleted ones.
>
> Everything works fine when I run a full-import: I always run it with
> clean=true, and the whole index is rebuilt.
> When I need to do an incremental update, the records are updated
> correctly, but the command to delete the other records is not executed.
>
> I've tried several combinations, with different results:
> - Running full-import with clean=false: the records are updated but the
>   ones that need to be deleted stay in the index
> - Running delta-import with clean=false: the records are updated but the
>   ones that need to be deleted stay in the index
> - Running delta-import with clean=true: all records are deleted from the
>   index, and then only the records returned by the procedure are in the
>   index, except the deleted ones.
>
> I don't see any way to achieve my goal without changing the process I use
> to obtain the data.
> Since this is a very complex stored procedure, with tons of joins and
> custom processing, I am trying everything to avoid messing with it.
>
> See below a copy of my data-config.xml file. I simplified it by omitting
> all the fields, since they are out of scope for this issue:
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
> <dataSource type="JdbcDataSource"
> driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=password;responseBuffering=adaptive;"
> />
> <document>
> <entity name="entity_one"
> pk="entityid"
> transformer="RegexTransformer"
> query="EXEC some_stored_procedure ${dataimporter.request.someid}"
> preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
> >
> <field column="field1" name="field1" splitBy=";" />
> <field column="field2" name="field2" splitBy=";" />
> <field column="field3" name="field3" splitBy=";" />
> </entity>
>
> <entity name="entity_two"
> pk="entityid"
> transformer="RegexTransformer"
> query="EXEC someother_stored_procedure ${dataimporter.request.someotherid}"
> preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
> >
> <field column="field1" name="field1" />
> <field column="field2" name="field2" />
> <field column="field3" name="field3" />
> </entity>
> </document>
> </dataConfig>
>
> Any ideas or pointers that might help on this one?
>
> Many thanks,
> Alexandre
>
