lucene-solr-user mailing list archives

From "Noble Paul നോബിള്‍ नोब्ळ्" <noble.p...@gmail.com>
Subject Re: DIH Http input bug - problem with two-level RSS walker
Date Sun, 02 Nov 2008 02:43:36 GMT
If you wish to create one document per inner entity, then set
rootEntity="false" on the outer entity.
The exception occurs because the URL is wrong: the trace shows it
resolving to null/account.rss, which has no protocol.
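
As a rough sketch of the rootEntity change, applied to the config quoted below: the attributes written there as "http stuff" are still needed and are simply left out here, so this fragment is illustrative rather than a complete, working data-config.

```xml
<!-- Sketch only. The attributes elided as "http stuff" in the quoted
     config are still required on both entities and are omitted here. -->
<entity name="outer" rootEntity="false">
    <field column="name" xpath="/rss/channel/title" />
    <field column="url" xpath="/rss/channel/item/link" />

    <!-- With rootEntity="false" on the parent, each row produced by
         "inner" is meant to become its own Solr document. -->
    <entity name="inner" url="${outer.url}" pk="title">
        <field column="title" xpath="/rss/channel/item/title" />
    </entity>
</entity>
```

The only change from the quoted config is the rootEntity attribute on the outer entity; everything else is copied as posted.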

On Sat, Nov 1, 2008 at 10:30 AM, Lance Norskog <goksron@gmail.com> wrote:
> I wrote a nested HttpDataSource RSS poller. The outer loop reads an rss feed
> which contains N links to other rss feeds. The nested loop then reads each
> one of those to create documents. (Yes, this is an obnoxious thing to do.)
> Let's say the outer RSS feed gives 10 items. Both feeds use the same
> structure: /rss/channel with a <title> node and then N <item> nodes inside
> the channel. This should create two separate XML streams with two separate
> Xpath iterators, right?
>
> <entity name="outer" http stuff>
>    <field column="name" xpath="/rss/channel/title" />
>    <field column="url" xpath="/rss/channel/item/link"/>
>
>    <entity name="inner" http stuff url="${outer.url}" pk="title" >
>        <field column="title" xpath="/rss/channel/item/title" />
>    </entity>
> </entity>
>
> This does indeed walk each url from the outer feed and then fetch the inner
> rss feed. Bravo!
>
> However, I found two separate problems in xpath iteration. They may be
> related. The first problem is that it only stores the first document from
> each "inner" feed. Each feed has several documents with different title
> fields but it only grabs the first.
>
> The other is an off-by-one bug. The outer loop iterates through the 10 items
> and then tries to pull an 11th.  It then gives this exception trace:
>
> INFO: Created URL to:  [inner url]
> Oct 31, 2008 11:21:20 PM org.apache.solr.handler.dataimport.HttpDataSource getData
> SEVERE: Exception thrown while getting data
> java.net.MalformedURLException: no protocol: null/account.rss
>        at java.net.URL.<init>(URL.java:567)
>        at java.net.URL.<init>(URL.java:464)
>        at java.net.URL.<init>(URL.java:413)
>        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:90)
>        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:47)
>        at org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:183)
>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:210)
>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:180)
>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
>  ...
> Oct 31, 2008 11:21:20 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
> SEVERE: Exception while processing: album document : SolrInputDocumnt[{name=name(1.0)={Groups of stuff}}]
> org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 11
>        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:115)
>        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:47)

-- 
--Noble Paul
