lucene-solr-dev mailing list archives

From Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>
Subject Re: DataImportHandler, XPathEntityProcessor, $hasMore, infinite loop
Date Thu, 30 Jul 2009 04:50:38 GMT
2009/7/30 Noble Paul നോബിള്‍  नोब्ळ् <noble.paul@corp.aol.com>:
> On Thu, Jul 30, 2009 at 1:23 AM, Erik Hatcher<erik@ehatchersolutions.com> wrote:
>> I've been troubleshooting an issue where we're trying to load documents
>> through DIH's URLDataSource and XPathEntityProcessor, where we want to
>> leverage the $hasMore feature to request a new URL.
>>
>> I've been tinkering with this using a very simple example, two XML files -
>>
>> solr.xml:
>>  <add>
>>    <doc>
>>     <field name="id">SOLR1000</field>
>>    </doc>
>>    <doc>
>>     <field name="id">**HASMORE**</field>
>>    </doc>
>>  </add>
>>
>> solr2.xml:
>>  <add>
>>    <doc>
>>      <field name="id">SOLR2k</field>
>>    </doc>
>>  </add>
>>
>> My DIH config is:
>>
>> <?xml version="1.0"?>
>> <dataConfig>
>>  <dataSource type="URLDataSource"
>>             baseUrl="file:///Users/erikhatcher/dev/solr/example/exampledocs/"
>>             readTimeout="180000" connectionTimeout="60000"/>
>>
>>  <script>
>>   <![CDATA[
>>     function checkForMore(row, context) {
>>       print("### checkForMore: " + row);
>>       if (row.get('id') == '**HASMORE**') {
>>         print("#### hasMore ####");
>>         row.put('$hasMore', 'true');
>>         row.put('$nextUrl',
>>                 'file:///Users/erikhatcher/dev/solr/example/exampledocs/solr2.xml');
>>         row.put('$skipRow', 'true');
>>       } else {
>>         row.put('$hasMore', 'false');
>>       }
>>       return row;
>>     }
>>   ]]>
>>  </script>
>>
>>  <document name="docs">
>>   <entity name="doc"
>>           processor="XPathEntityProcessor"
>>           url="solr.xml"
>>           forEach="/add/doc"
>>           stream="true"
>>           transformer="DateFormatTransformer,TemplateTransformer,script:checkForMore"
>>           onError="abort">
>>     <field column="id" xpath="/add/doc/field[@name='id']"/>
>>   </entity>
>>  </document>
>> </dataConfig>
>>
>> Without the else clause in checkForMore to set $hasMore to false, an
>> infinite loop occurs and solr2.xml is requested repeatedly.  This is because
>> once $hasMore is set on a row, XPathEntityProcessor#readUsefulVars sets it in
>> entity scope and it never gets unset.  Is this intentional?  Shouldn't
>> $hasMore get reset after more is requested?
>
> I would say we must reset it after it has been used once.
>>
>> On a related note, it would seem useful to allow $hasMore/$skipRow/$nextUrl
>> to be controlled from the XML data rather than solely from a transformer.
>>  But $-prefixed fields are ignored by DIH, right?
> This is possible using a RegexTransformer (so you may not need to
> write your own)
>
> <field column="$hasMore" regex="HASMORE" replaceWith="true"/>

A small correction: the field also needs sourceColName, so that the regex is
applied to the id column:

<field column="$hasMore" regex="HASMORE" replaceWith="true" sourceColName="id"/>
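
On the reset question above, here is a rough sketch of why a sticky flag loops. It is a simplified model written for illustration, not the DIH source: `pages`, `runImport`, `withElse`, `resetAfterUse` and `maxFetches` are all made-up names, the URLs are shortened to bare file names, and the cap exists only so that the looping case terminates.

```javascript
// Hypothetical, simplified model of the paging loop; not the actual DIH code.
// `pages` stands in for the two example files, one array of rows per URL.
const pages = {
  "solr.xml": [new Map([["id", "SOLR1000"]]), new Map([["id", "**HASMORE**"]])],
  "solr2.xml": [new Map([["id", "SOLR2k"]])],
};

// Transformer in the style of checkForMore; withElse toggles the else branch.
function checkForMore(row, withElse) {
  if (row.get("id") === "**HASMORE**") {
    row.set("$hasMore", "true");
    row.set("$nextUrl", "solr2.xml");
    row.set("$skipRow", "true");
  } else if (withElse) {
    row.set("$hasMore", "false");
  }
  return row;
}

// Runs the modelled import and returns the list of URLs that were fetched.
// resetAfterUse: clear the entity-scoped flag once the next URL is chosen.
// maxFetches: safety cap (an assumption of this sketch) to stop a runaway loop.
function runImport({ withElse, resetAfterUse, maxFetches = 5 }) {
  const fetched = [];
  let url = "solr.xml";
  let hasMore = false; // entity-scoped copy of $hasMore
  let nextUrl = null;  // entity-scoped copy of $nextUrl
  while (url !== null && fetched.length < maxFetches) {
    fetched.push(url);
    for (const source of pages[url]) {
      const row = checkForMore(new Map(source), withElse);
      // Copy the control variables into entity scope only when the row has them.
      if (row.has("$hasMore")) hasMore = row.get("$hasMore") === "true";
      if (row.has("$nextUrl")) nextUrl = row.get("$nextUrl");
    }
    url = hasMore ? nextUrl : null;
    if (resetAfterUse) hasMore = false;
  }
  return fetched;
}

// No else branch, no reset: solr2.xml is requested until the cap is hit.
console.log(runImport({ withElse: false, resetAfterUse: false }));
// Else branch as a workaround: solr.xml, then solr2.xml, then stop.
console.log(runImport({ withElse: true, resetAfterUse: false }));
// Reset after use, no else branch needed: solr.xml, then solr2.xml, then stop.
console.log(runImport({ withElse: false, resetAfterUse: true }));
```

In this model the else branch only works because every later row overwrites the flag, which is why resetting it in the processor after each use looks like the cleaner fix.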


>
>
>>
>> I'm still looking for that holy grail of a good example leveraging
>> $hasMore/$nextUrl!  :)
>>
>> Thanks,
>>        Erik
>>
>>
>
>
>
> --
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
