lucene-solr-user mailing list archives

From Lukas Kahwe Smith <...@pooteeweet.org>
Subject Re: Subclassing DIH
Date Tue, 01 Jun 2010 21:49:55 GMT

On 01.06.2010, at 23:35, Chris Hostetter wrote:

> 
> : http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780
> 
> Yeah, I remember that thread; it really seems like a driver issue, but it's
> understandable that "fixing the driver" is probably more out of scope than
> "working around it in Solr".
> 
> : I never did find a "good" solution to that bug however I did come up with a
> : workaround. I noticed if I removed my deletedPkQuery then the delta-import
> : would work as expected. Obviously I still have the need to delete items out
> : of the index during indexing so I wanted to subclass the DataImportHandler
> : to first update all documents then I would delete all the documents that my
> : deletedPkQuery would have deleted.
> 
> I'm not a DIH expert, but have you considered the possibility of having
> two distinct "entities" declared in your config that both refer to the same
> logical entity: one that you use for the delta importing, and one that
> you use for the deletedPkQuery?
> 
> I'm not sure if it would work, but based on another recent thread I saw, I
> think it might...


To me the entire delta-query approach makes no sense, but I digress. Here is a cut-down version
of the config I use to do full imports, deletes and updates:

<dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="${dataimporter.request.source_dsn}"
                batchSize="-1" user="${dataimporter.request.user}" password="${dataimporter.request.password}"/>
    <document>
        <entity name="deletedentity" query="SELECT NULL" pk="id"
                deletedPkQuery="SELECT e.id AS `$deleteDocById` FROM deletedentity AS e"/>
        <entity name="entity" query="SELECT
            e.id,  e.status, e.name
        FROM entity AS e
        WHERE ('${dataimporter.request.clear}' != 'false' OR e.updated_at > '${dataimporter.last_index_time}')"/>
    </document>
</dataConfig>

As you can see, I have parameterized the DSN information. I also have one query defined for
the deletes and another one for both the full import and updates. If clear is set to anything
but false, the first half of the WHERE condition evaluates to true, so every row is selected and
pretty much any decent RDBMS will skip the updated_at comparison. If it is false, updated_at is
checked against the last index time as usual.
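To make the parameter handling and the WHERE clause concrete, here is a small sketch in Python (my choice for illustration, not part of the setup above). It builds a request URL carrying the custom parameters the config reads through ${dataimporter.request.*} and mirrors the row predicate. The /dataimport handler path, the host and the helper names dataimport_url and row_selected are hypothetical; clear, source_dsn, user and password are the parameter names taken from the config.

```python
from datetime import datetime
from urllib.parse import urlencode


def dataimport_url(base: str, command: str, clear: str, source_dsn: str,
                   user: str, password: str) -> str:
    """Build a DIH request URL (hypothetical helper; /dataimport path assumed).

    Every extra query parameter becomes visible in the config as
    ${dataimporter.request.<name>}.
    """
    params = {
        "command": command,        # e.g. "full-import" or "delta-import"
        "clear": clear,            # custom flag read as ${dataimporter.request.clear}
        "source_dsn": source_dsn,  # JDBC URL used by the dataSource
        "user": user,
        "password": password,
    }
    # urlencode percent-escapes ':' and '/' inside the JDBC URL.
    return f"{base}/dataimport?{urlencode(params)}"


def row_selected(clear: str, updated_at: datetime, last_index_time: datetime) -> bool:
    """Mirror of: '${clear}' != 'false' OR e.updated_at > '${last_index_time}'."""
    # Any value other than the literal string "false" short-circuits to True,
    # so every row is selected (full import behavior).
    return clear != "false" or updated_at > last_index_time


if __name__ == "__main__":
    last_run = datetime(2010, 6, 1, 21, 0, 0)
    print(dataimport_url("http://localhost:8983/solr", "full-import", "true",
                         "jdbc:mysql://db/app", "solr", "secret"))
    print(row_selected("true", datetime(2010, 1, 1), last_run))   # old row, clear=true
    print(row_selected("false", datetime(2010, 1, 1), last_run))  # old row, clear=false
```

With clear=false only rows touched after the last index time pass the predicate; any other value of clear selects everything, which is why one entity can serve both full imports and updates.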

regards,
Lukas Kahwe Smith
mls@pooteeweet.org