lucene-solr-dev mailing list archives

From Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>
Subject Re: DataImportHandler, XPathEntityProcessor, $hasMore, infinite loop
Date Thu, 30 Jul 2009 04:48:50 GMT
On Thu, Jul 30, 2009 at 1:23 AM, Erik Hatcher<erik@ehatchersolutions.com> wrote:
> I've been troubleshooting an issue where we're trying to load documents
> through DIH's URLDataSource and XPathEntityProcessor, where we want to
> leverage the $hasMore feature to request a new URL.
>
> I've been tinkering with this using a very simple example, two XML files -
>
> solr.xml:
>  <add>
>    <doc>
>     <field name="id">SOLR1000</field>
>    </doc>
>    <doc>
>     <field name="id">**HASMORE**</field>
>    </doc>
>  </add>
>
> solr2.xml
>  <add>
>    <doc>
>      <field name="id">SOLR2k</field>
>    </doc>
>  </add>
>
> My DIH config is:
>
> <?xml version="1.0"?>
> <dataConfig>
>  <dataSource type="URLDataSource"
>             baseUrl="file:///Users/erikhatcher/dev/solr/example/exampledocs/"
>             readTimeout="180000" connectionTimeout="60000"/>
>
>  <script>
>   <![CDATA[
>     function checkForMore(row, context) {
>       print("### checkForMore: " + row);
>       if (row.get('id') == '**HASMORE**') {
>         print("#### hasMore ####");
>         row.put('$hasMore', 'true');
>         row.put('$nextUrl', 'file:///Users/erikhatcher/dev/solr/example/exampledocs/solr2.xml');
>         row.put('$skipRow', 'true');
>       } else {
>         row.put('$hasMore', 'false');
>       }
>       return row;
>     }
>   ]]>
>  </script>
>
>  <document name="docs">
>   <entity name="doc"
>           processor="XPathEntityProcessor"
>           url="solr.xml"
>           forEach="/add/doc"
>           stream="true"
>           transformer="DateFormatTransformer,TemplateTransformer,script:checkForMore"
>           onError="abort">
>     <field column="id" xpath="/add/doc/field[@name='id']"/>
>   </entity>
>  </document>
> </dataConfig>
>
> Without the else clause in checkForMore to set $hasMore to false, an
> infinite loop occurs and solr2.xml is requested repeatedly.  This is because
> once $hasMore is set on a row, XPathEntityProcessor#readUsefulVars sets it in
> entity scope and it never gets unset.  Is this intentional?  Shouldn't
> $hasMore get reset after more is requested?

I would say we must reset it after it has been used once.
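
The loop described above can be modeled outside of Solr. What follows is a minimal JavaScript sketch, not the actual DIH code: `pages`, `runImport`, and `maxFetches` are hypothetical names standing in for the two XML files and the entity's fetch loop, and the transformer deliberately omits the else clause. It only illustrates why an entity-scoped flag that is never cleared keeps re-requesting solr2.xml, and how clearing it after one use would end the import.

```javascript
// Hypothetical in-memory stand-ins for solr.xml and solr2.xml.
const pages = {
  "solr.xml": [{ id: "SOLR1000" }, { id: "**HASMORE**" }],
  "solr2.xml": [{ id: "SOLR2k" }],
};

// Transformer WITHOUT the else clause, i.e. the failing variant.
function checkForMore(row) {
  if (row.id === "**HASMORE**") {
    row["$hasMore"] = "true";
    row["$nextUrl"] = "solr2.xml";
    row["$skipRow"] = "true";
  }
  return row;
}

// Simplified model of the fetch loop (an assumption, not Solr source).
// hasMore/nextUrl play the role of entity-scoped session attributes.
// maxFetches is a safety cap so the non-resetting case terminates here.
function runImport(startUrl, resetAfterUse, maxFetches) {
  const fetched = [];
  const indexed = [];
  let url = startUrl;
  let hasMore = false;
  let nextUrl = null;

  while (url !== null && fetched.length < maxFetches) {
    fetched.push(url);
    for (const source of pages[url]) {
      const row = checkForMore({ ...source });
      // Copy the special variables from the row into "entity scope".
      if (row["$hasMore"] === "true") hasMore = true;
      if (row["$nextUrl"]) nextUrl = row["$nextUrl"];
      if (row["$skipRow"] !== "true") indexed.push(row.id);
    }
    if (hasMore && nextUrl !== null) {
      url = nextUrl;
      if (resetAfterUse) {
        // The proposed behavior: consume the flag once, then clear it.
        hasMore = false;
        nextUrl = null;
      }
    } else {
      url = null;
    }
  }
  return { fetched, indexed };
}

const stuck = runImport("solr.xml", false, 5);
const fixed = runImport("solr.xml", true, 5);
console.log(stuck.fetched.length); // 5 (stopped only by the maxFetches cap)
console.log(fixed.fetched.length); // 2 (solr.xml, then solr2.xml once)
```

With the reset in place the transformer no longer needs to write `$hasMore = 'false'` on every ordinary row, which is the workaround in the config quoted above.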
>
> On a related note, it would seem useful to allow $hasMore/$skipRow/$nextUrl
> to be controlled from the XML data rather than solely from a transformer.
> But $-prefixed fields are ignored by DIH, right?
This is possible using a RegexTransformer (so you may not need to
write your own transformer):

<field column="$hasMore" regex="HASMORE" replaceWith="true"/>


>
> I'm still looking for that holy grail of a good example leveraging
> $hasMore/$nextUrl!  :)
>
> Thanks,
>        Erik
>
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
