lucene-solr-user mailing list archives

From Christoph Notarp <>
Subject delta index produces multiple results?
Date Tue, 06 Jan 2009 14:01:13 GMT

I use the DIH with an RDBMS to index a large MySQL database with
about 7 million entries.
The full index works fine; in schema.xml I defined a uniqueKey
field (which is of the type 'text').

I run queries through the dismax query handler and get my results as
a PHP array.

Now, since the database entries change every second, I use the delta
query property to
a) delete documents from the index that have been deleted in the
database (there's a table for deleted items) and
b) update documents in the index that have changed since the last
index run (there's a last_modified column in a table for that).

From my understanding, when I start a delta-import, the DIH runs
the deletedPkQuery first and deletes the documents that should be
deleted (identified by the uniqueKey field?).
This seems to work - catalina.out says "INFO: deleted from document to
Solr: 1851010", for example.
The next step would be the deltaQuery. This seems to work, too - when
it has finished, a query returns the new database entries.
But (and here comes the problem):
The dataimport status always says "Added / Changed x-hundred
documents, deleted 0 documents" -> no deletes?
Every time I change an item in the database and run a delta-import
afterwards, my next query returns that item *twice*.
After the next change and the next delta-import, Solr returns *three*
result documents, and so on.
As I mentioned before, I get my search results as an array consisting
of many arrays (= Solr documents) with the fields I set in schema.xml.
After changing some documents and delta-indexing them, I get lots of  
identical arrays (even the uniqueKey-field is absolutely identical).

I have read somewhere in the wiki that an update is a delete of the
old document plus an add of the new document.
I guess the problem could be that something fails in the delete
process, but I don't have a clue why.
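
To make that expectation concrete, here is a toy Python sketch of "update =
delete by uniqueKey + add". It is not Solr or DIH code; the ToyIndex class
and its method names are made up purely for illustration. Under this model,
updating a document with the same uniqueKey twice should leave exactly one
copy in the index, which is the opposite of what I observe.

```python
class ToyIndex:
    """Toy stand-in for an index with a uniqueKey field (hypothetical, not Solr)."""

    def __init__(self):
        self.docs = []  # each document is a dict with an "id" field

    def delete_by_key(self, key):
        """Remove every document whose id equals key; return how many were removed."""
        before = len(self.docs)
        self.docs = [d for d in self.docs if d["id"] != key]
        return before - len(self.docs)

    def update(self, doc):
        """Model an update as: delete the old document, then add the new one."""
        self.delete_by_key(doc["id"])
        self.docs.append(doc)

    def search(self, key):
        """Return all documents stored under the given id."""
        return [d for d in self.docs if d["id"] == key]


index = ToyIndex()
index.update({"id": "1851010", "title": "first version"})
index.update({"id": "1851010", "title": "second version"})

matches = index.search("1851010")
print(len(matches))  # one copy expected if the delete step matches the key
```

If the delete step never matched the stored key, each update would only
append, and the number of copies would grow by one per delta-import, which is
the pattern I am seeing.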

Any ideas?

Thanks in advance
