manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Field mapping for RSS feed
Date Tue, 02 Aug 2011 16:05:27 GMT
I just looked at the code.  It's not a bug so much as an oversight of
sorts.  The "description" or "content" fields are indexed as the
primary content of the document if the "chrome" mode is set
accordingly.  If the "chrome" mode is "None", then the item-level
description field is ignored even when present.

So I recommend simply adding a new kind of "description" field for
when the "chrome" mode is set to "None".  It could be named
"item/description", or perhaps use the full XPath; your choice.
Propose something in the ticket and I'll respond.
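To make the proposal concrete, here is a minimal Java sketch of the decision described above. It is NOT ManifoldCF's RSS connector code: the `ChromeMode` enum, the `fieldsFor` method, the "content" key and the "item/description" field name are all hypothetical stand-ins chosen for illustration. It assumes that in "None" mode the primary content is the fetched page body, and shows the item-level description being kept as a separate field instead of being dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChromeModeSketch {
    // Hypothetical stand-ins for the job's "chrome" setting; not ManifoldCF's real types.
    enum ChromeMode { NONE, USE_DESCRIPTION }

    // Hypothetical name for the proposed extra field (one option floated in the thread).
    static final String ITEM_DESCRIPTION_FIELD = "item/description";

    // Builds the fields that would be sent to the index for one feed item.
    static Map<String, String> fieldsFor(ChromeMode mode, String itemDescription, String pageBody) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (mode == ChromeMode.USE_DESCRIPTION && itemDescription != null) {
            // Chrome mode selected: the item description becomes the primary content.
            fields.put("content", itemDescription);
        } else {
            // "None" mode: the fetched page is the primary content...
            fields.put("content", pageBody);
            // ...and, under the proposal, the item description is kept as its own field
            // rather than being silently ignored.
            if (itemDescription != null) {
                fields.put(ITEM_DESCRIPTION_FIELD, itemDescription);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(fieldsFor(ChromeMode.NONE, "item text", "<html>page</html>"));
        System.out.println(fieldsFor(ChromeMode.USE_DESCRIPTION, "item text", "<html>page</html>"));
    }
}
```

With a separate field like this, a job's field mapping could then route "item/description" to Solr's "description" field.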

Thanks!
Karl


On Tue, Aug 2, 2011 at 11:47 AM, Karl Wright <daddywri@gmail.com> wrote:
> Hi Kate,
>
> The field mapping won't do the trick because the RSS connector is
> currently very selective about what fields it extracts - it by no
> means extracts all of them, so the ones that it *does* extract from
> the feed are "special".
>
> The behavior you describe sounds like a bug to me.  I'll go spelunking
> through the code at the first opportunity.  In the meantime, could you
> create a Jira ticket describing the behavior you see vs. the behavior
> you want?
>
> Thanks!
> Karl
>
> On Tue, Aug 2, 2011 at 11:41 AM, K McGonigal <kmcgoniga@gmail.com> wrote:
>> Hi,
>>
>> I'm trying to use ManifoldCF to index an RSS feed into Solr.  It sort of
>> works, but my main problem at the moment is that the *channel* description
>> from the RSS feed is written to the "description" field in Solr when I would
>> really like the *item* description to be written instead.
>>
>> I have a typical RSS feed with the general structure:
>>
>> <rss>
>>     <channel>
>>         <title></title>
>>         <link></link>
>>         <description> *** the description I don't want *** </description>
>>         <item>
>>             <title></title>
>>             <link></link>
>>             <pubDate></pubDate>
>>             <description> *** the description I do want *** </description>
>>             <author></author>
>>             <category></category>
>>         </item>
>>     </channel>
>> </rss>
>>
>> I tried setting up the field mapping on the job with the XPath address of
>> the second description, i.e. "/rss/channel/item/description" as the source,
>> but that did not work.
>>
>> I suspect I'm overlooking something simple, but I've spent 2 days trying to
>> solve it.  I would be grateful for any help.
>>
>>
>> Kate McGonigal
>>
>>
>>
>
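For reference, the XPath in the original question is itself correct; the problem, as explained above, is that the RSS connector only extracts a fixed set of fields, so a field mapping cannot point at arbitrary XPaths in the feed. The sketch below uses only the JDK's standard `javax.xml.xpath` API (not anything from ManifoldCF) on a trimmed-down version of the feed to show that "/rss/channel/description" and "/rss/channel/item/description" select different elements. The class name and sample text are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RssXPathDemo {
    // Trimmed-down version of the feed structure from the question (sample text is made up).
    static final String FEED =
        "<rss><channel>"
        + "<title>Feed</title>"
        + "<description>channel description</description>"
        + "<item><title>Item 1</title><description>item description</description></item>"
        + "</channel></rss>";

    // Returns the text content of every node matched by the XPath expression.
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Direct child of <channel>: the channel-level description only.
        System.out.println(select(FEED, "/rss/channel/description"));      // [channel description]
        // Child of <item>: the item-level description only.
        System.out.println(select(FEED, "/rss/channel/item/description")); // [item description]
    }
}
```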
