manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1193) Consider adding feature to web connector to skip pages that match specified criteria
Date Thu, 30 Apr 2015 07:30:06 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521058#comment-14521058 ]

Karl Wright commented on CONNECTORS-1193:
-----------------------------------------

There are a number of outstanding questions, such as:
(1) Do we want keyword matches, or the more general regexps?
(2) If keywords, how many?
(3) If regexps, how many?
(4) Matching content only, or HTML tags as well?

As far as I understand it, matching the content of binary documents is *not* part of any proposed
feature.
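
To make the questions above concrete, here is a minimal sketch of one possible shape for such a filter. It is an illustration only: the class name SoftErrorFilter, its methods, and the includeMarkup flag are hypothetical and are not part of the ManifoldCF web connector. It assumes that keywords can be treated as literal, case-insensitive regexps, that any number of criteria may be configured, and that "content only" can be approximated by crudely stripping tags before matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not a ManifoldCF class. */
final class SoftErrorFilter {
    private final List<Pattern> patterns = new ArrayList<>();
    private final boolean includeMarkup;

    /** @param includeMarkup if true, match against raw HTML; if false, strip tags first. */
    SoftErrorFilter(boolean includeMarkup) {
        this.includeMarkup = includeMarkup;
    }

    /** Adds a keyword, matched literally and case-insensitively (assumed semantics). */
    SoftErrorFilter addKeyword(String keyword) {
        patterns.add(Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE));
        return this;
    }

    /** Adds a regular expression, matched anywhere in the text, case-insensitively. */
    SoftErrorFilter addRegex(String regex) {
        patterns.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL));
        return this;
    }

    /** Returns true if any configured criterion matches the fetched page body. */
    boolean shouldSkip(String body) {
        if (body == null) {
            return false;
        }
        // Crude tag removal; a real implementation would more likely use an HTML parser.
        String text = includeMarkup ? body : body.replaceAll("<[^>]*>", " ");
        for (Pattern p : patterns) {
            if (p.matcher(text).find()) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a filter built with new SoftErrorFilter(false).addKeyword("page not found") would skip a page that returns 200 with that text in its body, while leaving ordinary pages alone. Binary documents are outside the scope of the sketch, in line with the note above.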


> Consider adding feature to web connector to skip pages that match specified criteria
> ------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1193
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1193
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 1.10, ManifoldCF 2.2
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.10, ManifoldCF 2.2
>
>
> The user wants to skip content that matches specified criteria, because some sites don't
> return a 404 code (for instance) but instead return 200 with a textual error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
