manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Web Connector and dates
Date Tue, 25 Jun 2013 13:05:32 GMT
Hi Stephane,

Web connector content does not in general include a date - it is not in the
content, and there is no way to generate it out of nothing.  Thus the Web
connector has no facility for processing dates, and does not attempt to do
anything with them even when the documents it is crawling were referenced
by an RSS feed.

The date for content indexed by the RSS connector comes, if present, from
fields in the RSS feed.  The dates are carried down from the feed to the
referenced content.  This is one specialization that makes the RSS
connector different from the more general Web connector.
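
As a rough illustration of what "carrying a date down" means, here is a
minimal Python sketch. It is not ManifoldCF's code. It assumes an RSS 2.0
feed whose items have `link` and `pubDate` elements; the function name
`dates_by_link` and the sample feed are made up for this example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed: one dated item, one item with no pubDate.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Dated post</title>
      <link>http://example.com/a</link>
      <pubDate>Tue, 25 Jun 2013 13:05:32 GMT</pubDate>
    </item>
    <item>
      <title>Undated post</title>
      <link>http://example.com/b</link>
    </item>
  </channel>
</rss>"""


def dates_by_link(feed_xml):
    """Map each item's link to an ISO 8601 date string, or None if absent."""
    root = ET.fromstring(feed_xml)
    result = {}
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # nothing to attach a date to
        raw_date = item.findtext("pubDate")
        iso_date = None
        if raw_date:
            try:
                # RSS 2.0 dates use the RFC 822 format, as email headers do.
                iso_date = parsedate_to_datetime(raw_date.strip()).isoformat()
            except (TypeError, ValueError):
                iso_date = None  # unparseable date: treat as missing
        result[link] = iso_date
    return result


if __name__ == "__main__":
    for link, date in dates_by_link(SAMPLE_FEED).items():
        print(link, date)
```

A crawler that fetched http://example.com/b directly, with no feed in
hand, would have no such value to attach, which is the situation the Web
connector is in.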

As for your observation that you are seeing no dates at all in Solr, as
usual I must request that you include the Solr log info output for a
document that you think should have a date attached but doesn't.  This info
output shows all the arguments passed to Solr from ManifoldCF, and their
names.  It should be obvious what is going on if we can see one of those
lines.

Thanks,
Karl



On Tue, Jun 25, 2013 at 8:55 AM, Stephane Gamard <stephane@gamard.net> wrote:

> Hi All,
>
>
> I'm getting more and more confused about the dates of ingested content.
> Karl explained to me the (not yet documented) pudateiso metadata for the
> RSS connector, and now I'm mixing it with content from the Web connector
> as well. My ingested content from the Web connector has no date. I did the
> following to make sure it would get something (I tried multiple
> configurations):
>
>
>
> on my solr-output:
>
>
> And on my job:
>
> The ingested content has none of the date fields (test and/or _date)
> populated. Is the Web connector abiding by the same rules as the file and
> other connectors, as described here:
> https://issues.apache.org/jira/browse/CONNECTORS-657
>
>
