lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Implementing DIH - Using a non-datetime change tracking column to Identify delta
Date Wed, 05 Apr 2017 18:30:06 GMT
On 4/4/2017 7:40 AM, subinalex wrote:
> Can we use a non-datetime column to identify delta rows in deltaQuery for
> DIH configuration?
> For example, in the deltaQuery below,
>
>   deltaQuery="select ID from category where last_modified &gt;
> '${dih.last_index_time}'"
>
> the delta rows are picked when the last_modified datetime is greater than
> last index time.
>
> I want to pick up rows as deltas when a column value differs from the
> corresponding column value in Solr.
>
>   deltaQuery="select ID from category where md5hashcode &lt;&gt;
> 'indexedmd5hashcode'"

The only piece of information that DIH saves internally when it starts
an import is the current timestamp, which the next import can use as
${dih.last_index_time}.

You can still do what you want, but your own program will have to keep
track of the information needed to determine what's new.  Solr will not
do it for you.
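
As a rough illustration of that bookkeeping, here is a minimal Python
sketch; it is not something DIH or Solr provides.  It assumes your
program can read each row as a dict with an "id" key, and that it keeps
one MD5 per row in a JSON state file between runs.  The function names
and the state file are hypothetical.  If the table already carries a
hash column, like md5hashcode in the question, you could store that
value instead of computing row_hash().

```python
import hashlib
import json
from pathlib import Path


def row_hash(row: dict) -> str:
    """MD5 of a stable JSON serialization of the row (keys sorted)."""
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def find_changes(rows, previous: dict):
    """Compare current rows with the hashes saved by the previous run.

    rows:     iterable of dicts, each with an "id" key (hypothetical shape)
    previous: {id_as_str: md5_hex} loaded from the last run's state file
    Returns (changed_ids, deleted_ids, current_hashes).
    """
    current = {str(r["id"]): row_hash(r) for r in rows}
    # New rows have no previous hash, so they count as changed too.
    changed = sorted(k for k, h in current.items() if previous.get(k) != h)
    # IDs present last time but missing now would need deleting from Solr.
    deleted = sorted(k for k in previous if k not in current)
    return changed, deleted, current


def load_state(path: Path) -> dict:
    """Hashes from the last saved run, or {} on the first run."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_state(path: Path, hashes: dict) -> None:
    """Write the hashes; call this only after the Solr update succeeds."""
    path.write_text(json.dumps(hashes, sort_keys=True))
```

Saving the state only after Solr has accepted the changes means a failed
update simply shows up as changed again on the next run.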

When you start an import, you can pass any arbitrary information as URL
parameters on that request, and the DIH config can refer to each one as
${dih.request.paramName}.  Here's the full <entity> config from one of my
Solr cores, showing how to use these parameters:

    <entity name="dataView" pk="did"
      query="
        SELECT * FROM ${dih.request.dataView}
        WHERE (
          (
            did &gt; ${dih.request.minDid}
            AND did &lt;= ${dih.request.maxDid}
          )
          ${dih.request.extraWhere}
        ) AND (crc32(did) % ${dih.request.numShards})
          IN (${dih.request.modVal})
        "
      deltaImportQuery="
        SELECT * FROM ${dih.request.dataView}
        WHERE (
          (
            did &gt; ${dih.request.minDid}
            AND did &lt;= ${dih.request.maxDid}
          )
          ${dih.request.extraWhere}
        ) AND (crc32(did) % ${dih.request.numShards})
          IN (${dih.request.modVal})
        "
      deltaQuery="SELECT 1 AS did"
    >
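
To make the placeholder mechanics concrete, here is a small Python
sketch that mimics the substitution for the query attribute above, with
the XML entities already decoded (&gt; becomes >, &lt; becomes <).  It is
only an approximation of DIH's variable resolution, and it assumes, as
the optional extraWhere placeholder suggests, that a parameter left off
the request expands to an empty string.  The shard_of() helper is also
hypothetical; it assumes a MySQL-style crc32() that hashes the decimal
string of did with the same CRC-32 that Python's zlib.crc32 computes.

```python
import re
import zlib

# The query attribute from the config above, with &gt; and &lt; decoded.
QUERY_TEMPLATE = """
SELECT * FROM ${dih.request.dataView}
WHERE (
  (
    did > ${dih.request.minDid}
    AND did <= ${dih.request.maxDid}
  )
  ${dih.request.extraWhere}
) AND (crc32(did) % ${dih.request.numShards})
  IN (${dih.request.modVal})
"""

PLACEHOLDER = re.compile(r"\$\{dih\.request\.(\w+)\}")


def expand(template: str, params: dict) -> str:
    """Replace each ${dih.request.NAME} with params[NAME] (missing -> "")."""
    return PLACEHOLDER.sub(lambda m: str(params.get(m.group(1), "")), template)


def shard_of(did: int, num_shards: int) -> int:
    """Mirror crc32(did) % numShards, assuming crc32 hashes str(did)."""
    return zlib.crc32(str(did).encode("ascii")) % num_shards


if __name__ == "__main__":
    # Hypothetical request values for one shard's import.
    print(expand(QUERY_TEMPLATE, {
        "dataView": "v_docs",
        "minDid": 100000,
        "maxDid": 200000,
        "numShards": 6,
        "modVal": "0",
    }))
```

In this reading, the same view can feed several cores from one config
just by changing modVal on each import request.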

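I am specifying many parts of the SQL query with URL parameters.
For example, I include a "dataView" parameter to choose at import time
which view or table will be queried.  The other parameters control which
did values will be returned: minDid and maxDid set the range, extraWhere
can add another condition, and numShards and modVal keep only the rows
whose crc32(did) modulo numShards is in the modVal list.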
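
For completeness, here is a hypothetical Python helper that builds such
a request URL.  The command, clean and commit parameters are standard
DIH request parameters; dataView, minDid, maxDid, extraWhere, numShards
and modVal are simply the names this config reads through
${dih.request.*}.  The base URL, the core name and the "OR did IN (...)"
form of extraWhere are assumptions for illustration.  Because these
values are pasted straight into SQL, only a trusted program should build
them; the sketch casts the extra IDs to int for that reason.

```python
from urllib.parse import urlencode


def build_import_url(base_url, command, data_view, min_did, max_did,
                     num_shards, mod_vals, extra_ids=()):
    """Build a DIH request URL carrying the parameters the entity reads."""
    params = {
        "command": command,        # "full-import" or "delta-import"
        "clean": "false",          # do not wipe the index before importing
        "commit": "true",
        "dataView": data_view,     # -> ${dih.request.dataView}
        "minDid": min_did,         # -> did > minDid
        "maxDid": max_did,         # -> did <= maxDid
        "numShards": num_shards,
        "modVal": ",".join(str(v) for v in mod_vals),
    }
    if extra_ids:
        # Hypothetical extraWhere: also pick up specific rows outside the range.
        ids = ",".join(str(int(i)) for i in extra_ids)
        params["extraWhere"] = "OR did IN ({})".format(ids)
    # urlencode percent-encodes spaces and parentheses in the values.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_import_url("http://localhost:8983/solr/mycore/dataimport",
                           "delta-import", "v_docs", 100000, 200000,
                           6, [0], extra_ids=[42, 97]))
```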

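The query and deltaImportQuery attributes are identical.  The deltaQuery
just returns one dummy row, so each delta-import runs deltaImportQuery
once, and the URL parameters decide which rows it picks up.  At one time,
all my indexing was done with DIH, so I used these parameters to limit
what each delta-import run did.  Currently, DIH is only used for full
rebuilds; I have a SolrJ program for incremental changes.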

Thanks,
Shawn

