manifoldcf-user mailing list archives

From Butz Joachim <Joachim.B...@hcsolutions.at>
Subject rdf:RDF not detected, "not valid feed" in DEBUG log
Date Thu, 23 Mar 2017 14:46:56 GMT
Hi,

I am using ManifoldCF 2.6.

The RSS connector does not crawl the feed http://rss.orf.at/news.xml.
The following line appears in manifoldcf.log:
org.apache.manifoldcf.crawler.connectors.rss.RSSConnector$OuterContextClass DEBUG 2017-03-23
14:29:54,718 (Worker thread '1') - RSS: RSS document 'http://rss.orf.at/news.xml' does not
have rss, feed, or rdf:RDF tag - not valid feed

I tried the following change in RSSConnector (on branch release-2.6-branch), and the feed
is now crawled.
This may be a bug in the RSSConnector.

Kind Regards,
Joachim

--- a/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/RSSConnector.java
+++ b/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/RSSConnector.java
@@ -3311,7 +3311,7 @@ public class RSSConnector extends org.apache.manifoldcf.crawler.connectors.BaseR
           Logging.connectors.debug("RSS: Parsed bottom-level XML for RSS document '"+documentIdentifier+"'");
         return new RSSContextClass(theStream,namespace,localName,qName,atts,documentIdentifier,activities,filter);
       }
-      else if (localName.equals("RDF"))
+      else if (localName.toUpperCase().equals("RDF"))
       {
         // RDF/Atom feed detected
         outerTagCount++;
@@ -3345,7 +3345,7 @@ public class RSSConnector extends org.apache.manifoldcf.crawler.connectors.BaseR
       {
         rescanTimeSet = ((RSSContextClass)context).process();
       }
-      else if (tagName.equals("RDF"))
+      else if (tagName.toUpperCase().equals("RDF"))
       {
         rescanTimeSet = ((RDFContextClass)context).process();
       }
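
For illustration only, the case-insensitive check from the patch above can also be
written with String.equalsIgnoreCase, which does not depend on the default locale the
way toUpperCase() does. The sketch below is not ManifoldCF code: the class
FeedRootCheck and its methods classifyRoot and rootKind are hypothetical names, and
treating "rss" and "feed" case-insensitively as well is an assumption made here, not
something the connector is known to do. It only uses the JDK's namespace-aware SAX
parser to read the root element's local name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-alone sketch; not part of RSSConnector.
public class FeedRootCheck {

  // Maps a root element's local name to "rss", "feed", or "rdf",
  // ignoring case; returns null for anything else.
  static String classifyRoot(String localName) {
    if (localName == null) {
      return null;
    }
    for (String kind : new String[] {"rss", "feed", "rdf"}) {
      if (kind.equalsIgnoreCase(localName)) {
        return kind;
      }
    }
    return null;
  }

  // Parses the XML text and classifies its outermost element.
  static String rootKind(String xml) {
    final String[] root = new String[1];
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      // Namespace awareness makes localName "RDF" for <rdf:RDF>.
      factory.setNamespaceAware(true);
      factory.newSAXParser().parse(
          new InputSource(new StringReader(xml)),
          new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
              // Remember only the first (outermost) element.
              if (root[0] == null) {
                root[0] = localName;
              }
            }
          });
    } catch (Exception e) {
      throw new IllegalArgumentException("not parseable as XML", e);
    }
    return classifyRoot(root[0]);
  }

  public static void main(String[] args) {
    String ns = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    System.out.println(rootKind("<rdf:RDF xmlns:rdf=\"" + ns + "\"/>"));
    System.out.println(rootKind("<rdf:rdf xmlns:rdf=\"" + ns + "\"/>"));
    System.out.println(rootKind("<html/>"));
  }
}
```

Whether a lower-case root element is really what the feed above delivers is not
verified here; the sketch only shows the comparison itself.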

_______________________________________________

Dipl.-Ing. Joachim Butz
Software Developer

HC SOLUTIONS GesmbH
A - 4030 Linz, Dauphinestraße 5
Phone:   +43 (0)732 / 9394 0
Mobile:
Fax:     +43 (0)732 / 9394 800
E-mail:  Joachim.Butz@hcsolutions.at
Web:     http://www.hcsolutions.at/
         http://www.tomo-base.at/

Company register number: FN 115314 F
Company register court: Landesgericht Linz
Legal form: GesmbH
VAT ID: ATU 36898407
_______________________________________________








