lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: DIH on sequence (or any type that supports ordering) possible?
Date Sat, 06 Aug 2011 18:32:07 GMT
On 8/6/2011 8:49 AM, eks dev wrote:
> I would appreciate some clarifications about DIH
>
> I do not have reliable timestamp, but I do have atomic sequence that
> only grows on inserts/changes.

I use DIH, but I don't use the built-in timestamp facility at all.  I 
have an autoincrement field in a MySQL database that tells me what's 
new.  Here are the three queries I have defined in dih-config.xml:

       query="
         SELECT * FROM ${dataimporter.request.dataView}
         WHERE (
           (
             did &gt; ${dataimporter.request.minDid}
             AND did &lt;= ${dataimporter.request.maxDid}
           )
           ${dataimporter.request.extraWhere}
         ) AND (crc32(did) % ${dataimporter.request.numShards})
           IN (${dataimporter.request.modVal})
         "
       deltaImportQuery="
         SELECT * FROM ${dataimporter.request.dataView}
         WHERE (
           (
             did &gt; ${dataimporter.request.minDid}
             AND did &lt;= ${dataimporter.request.maxDid}
           )
           ${dataimporter.request.extraWhere}
         ) AND (crc32(did) % ${dataimporter.request.numShards})
           IN (${dataimporter.request.modVal})
         "
       deltaQuery="SELECT 1 AS did"
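
To make the sharding clause in both queries concrete, here is a rough Python sketch of what `crc32(did) % numShards IN (modVal)` selects. It rests on an assumption: that MySQL's CRC32() hashes the decimal string form of an integer `did`, which `zlib.crc32` is used to imitate here. The function names are hypothetical and only for illustration.

```python
import zlib


def shard_bucket(did: int, num_shards: int) -> int:
    """Return the bucket (0 .. num_shards - 1) that a did hashes into.

    Assumption: MySQL's CRC32() is applied to the decimal string form
    of the integer, so we hash str(did) rather than its raw bytes.
    """
    return zlib.crc32(str(did).encode("ascii")) % num_shards


def row_selected(did: int, num_shards: int, mod_vals: set) -> bool:
    """Mimic the SQL clause: (crc32(did) % numShards) IN (modVal)."""
    return shard_bucket(did, num_shards) in mod_vals


if __name__ == "__main__":
    # Show which of a few sample dids a shard handling bucket 0 would pick up.
    for did in range(1, 11):
        print(did, shard_bucket(did, 3), row_selected(did, 3, {0}))
```

Because the clause uses IN, modVal can presumably carry more than one bucket number, so a single Solr shard could index several buckets.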

If you look carefully, you'll notice that query and deltaImportQuery are 
identical, and deltaQuery is just something that always returns a 
value.  I keep track of did (the primary key for both dih-config and the 
database) in my build system, passing in minDid and maxDid parameters on 
the DIH URL to tell it what to index.  I include more parameters to 
handle sharding and special situations.  I actually use a different 
field (with its own unique MySQL index) as Solr's uniqueKey.
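
As a minimal sketch of how a build system might pass those parameters on the DIH URL, the following Python assembles a request string. The parameter names match the `${dataimporter.request.*}` placeholders in the queries above; the base URL, the view name, and the function name are assumptions for illustration, not my actual build code.

```python
from urllib.parse import urlencode


def build_dih_url(base_url, command, min_did, max_did,
                  data_view, num_shards, mod_val, extra_where=""):
    """Build a DIH request URL carrying the custom request parameters.

    command is "full-import" or "delta-import". Every other key ends up
    available in dih-config.xml as ${dataimporter.request.<name>}.
    """
    params = {
        "command": command,
        "dataView": data_view,
        "minDid": min_did,
        "maxDid": max_did,
        "numShards": num_shards,
        "modVal": mod_val,
        # Optional extra SQL, e.g. "AND some_col = 1"; empty adds nothing.
        "extraWhere": extra_where,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host, handler path, and view name.
    print(build_dih_url("http://localhost:8983/solr/dataimport",
                        "delta-import", 1000, 2000, "my_view", 6, "0"))
```

The values are substituted directly into the SQL, so they should only ever come from a trusted build system, never from user input.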

Currently DIH does not support keeping track of arbitrary data, just 
the timestamp of the last import ... but if you can track it outside of 
Solr and pass the appropriate parameters in with the full-import or 
delta-import request, you can do almost anything.
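
Tracking the sequence outside of Solr can be as simple as remembering the last did that was indexed. The sketch below is one hypothetical way to do it with a small JSON state file; the file layout and function names are assumptions, and a real build system would more likely keep this in its own database.

```python
import json
from pathlib import Path


def load_last_did(state_file: Path) -> int:
    """Return the last indexed did, or 0 if nothing has been recorded yet."""
    if state_file.exists():
        return json.loads(state_file.read_text())["lastDid"]
    return 0


def save_last_did(state_file: Path, did: int) -> None:
    """Record the highest did that has been indexed successfully."""
    state_file.write_text(json.dumps({"lastDid": did}))


def next_window(state_file: Path, current_max_did: int):
    """Return (minDid, maxDid) for the next import, or None if nothing is new.

    The pair fits the query's range test: did > minDid AND did <= maxDid.
    """
    min_did = load_last_did(state_file)
    if current_max_did <= min_did:
        return None
    return (min_did, current_max_did)
```

The idea is to call `save_last_did` only after the import has finished successfully, so that a failed run is retried over the same range.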

This is on Solr 3.2, but I used a similar setup when I was running 1.4.1 
as well.

Shawn

