manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CONNECTORS-1573) Web Crawler exclude from index matches too much?
Date Thu, 24 Jan 2019 23:15:00 GMT

     [ https://issues.apache.org/jira/browse/CONNECTORS-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-1573.
-------------------------------------
    Resolution: Not A Problem

> Web Crawler exclude from index matches too much?
> ------------------------------------------------
>
>                 Key: CONNECTORS-1573
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1573
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.10
>            Reporter: Korneel Staelens
>            Priority: Major
>
> Hello,
> I'm not sure whether this is a bug or my misinterpretation of the exclusion rules.
> I want to set up a rule so that it does NOT index a parent page, but does index all child pages of that parent.
> I'm setting up this rule:
> Inclusions:
> .*
>
> Exclusions:
> [http://www.website.com/nl/]
> (I've also tried: http://www.website.com/nl/(\s)* )
> No dice. Looking at the logs, I see the pages are crawled but not indexed due to job restriction. Is my rule wrong? Or is this a small bug?
>
> Thanks for the advice!
>  
>  
>  
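A possible explanation for the "Not A Problem" resolution, offered as a sketch rather than a statement of how the connector is implemented: if the exclusion expressions are applied as unanchored regular expressions (an assumption here; a match anywhere in the URL counts), then http://www.website.com/nl/ also matches every child URL, and so does the (\s)* variant, because (\s)* can match zero characters. The class and method names below are hypothetical and use only java.util.regex, not ManifoldCF code.

```java
import java.util.regex.Pattern;

public class ExclusionRuleSketch {
    // Hypothetical helper: true when the regex matches anywhere in the URL.
    // This models the ASSUMED unanchored matching of exclusion rules.
    public static boolean matchesAnywhere(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        String parent = "http://www.website.com/nl/";
        String child = "http://www.website.com/nl/page.html";

        // Rules as written in the report: no anchors.
        String loose = "http://www.website.com/nl/";
        String looseWithSpaces = "http://www.website.com/nl/(\\s)*";

        // Anchored alternative: ^ and $ restrict the match to the parent URL only.
        String anchored = "^http://www\\.website\\.com/nl/$";

        System.out.println(matchesAnywhere(loose, parent));           // true
        System.out.println(matchesAnywhere(loose, child));            // true, child excluded too
        System.out.println(matchesAnywhere(looseWithSpaces, child));  // true, (\s)* matches nothing
        System.out.println(matchesAnywhere(anchored, parent));        // true
        System.out.println(matchesAnywhere(anchored, child));         // false, child stays indexed
    }
}
```

Under that assumption, an exclusion rule ending in $ would exclude only the parent page and leave its child pages eligible for indexing.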



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
