manifoldcf-user mailing list archives

From Michael Kelleher <>
Subject Re: WEB: Illegal seed URL
Date Tue, 06 Dec 2011 21:31:40 GMT
The issue was my use of regexes in the inclusions list.  Oddly enough, 
some regexes I used (and verified via that should function properly, did not.

However, my crawl is functioning properly, and is only visiting the 
appropriate documents.
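
For anyone else who hits "Illegal seed URL", a quick standalone check of a seed against an inclusion list might look like the sketch below. It is only an illustration: the class name, the sample URL, and the patterns are hypothetical, and it assumes each inclusion regex is applied as an unanchored java.util.regex match (find), which may not be exactly what the connector does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SeedCheck {
    // Returns true if at least one inclusion regex matches somewhere in the URL.
    // Assumption: unanchored matching (Matcher.find), not a full-string match.
    static boolean isIncluded(String url, List<String> inclusions) {
        for (String regex : inclusions) {
            if (Pattern.compile(regex).matcher(url).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical seed URL and inclusion patterns, for illustration only.
        String seed = "http://example.com/docs/index.html";
        List<String> matching = Arrays.asList("^http://example\\.com/docs/");
        List<String> nonMatching = Arrays.asList("^https://", "\\.pdf$");

        System.out.println("matching list includes seed: " + isIncluded(seed, matching));
        System.out.println("non-matching list includes seed: " + isIncluded(seed, nonMatching));
    }
}
```

If the seed is not matched by any inclusion pattern in a check like this, that would be consistent with Karl's suggestion that the seed is being excluded by the regexp lists.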


On 12/06/2011 02:34 PM, Karl Wright wrote:
> On second thought, "illegal seed" can also mean that the seed is
> excluded from the crawl due to your inclusion/exclusion regexp lists.
> Might want to check that out too.
> Karl
> On Tue, Dec 6, 2011 at 2:23 PM, Karl Wright<>  wrote:
>> The URL as stated is fine and is pretty standard.  I don't think
>> there's a problem there, unless you inadvertently fixed something when
>> you changed the hostname.
>> Can you look at the log - there may well be a stack trace, especially
>> if you have <property name="org.apache.manifoldcf.connectors"
>> value="DEBUG"/> set.  I'd love to see what the trace is.
>> Karl
>> On Tue, Dec 6, 2011 at 1:52 PM, Michael Kelleher<>  wrote:
>>> Here is my seed URL (minus the hostname):
>>> I am using a Web Crawler connection that has been tested with the
>>> NullOutputConnector - so I don't think the issue can be here
>>> I am also using the Solr Output Connector - this had been throwing an
>>> Exception till I fixed the core name - this is the first time I have used
>>> this.  So, maybe I don't have things configured correctly here.  However, there
>>> are no exceptions in the log.  Also, I am not using authentication at all on
>>> Solr.
>>> I looked at the class:
>>> connectors\webcrawler\connector\src\main\java\org\apache\manifoldcf\crawler\connectors\webcrawler\
>>> and it was not obvious what the issue is.
>>> Also, in logging.ini - I changed the logging level to DEBUG and restarted
>>> before I tested the crawl, which further obscures the logic to me in
>>> Is there somewhere else I can set logging levels?  I am not sure my change
>>> to logging.ini is having any effect.  Also, is there some other test you
>>> might suggest?
>>> thanks.
>>> --mike
