Hi Phil,

Based on our private back-and-forth, I've uploaded a patch to CONNECTORS-1309 which should address your problem.

Thanks!
Karl


On Tue, May 17, 2016 at 2:33 AM, Phil Riethmuller <priethmuller@funnelback.com> wrote:
Hi Karl, 

Thanks for clarifying – I’ll take another look.


Phil


From: Karl Wright <daddywri@gmail.com>
Reply-To: <user@manifoldcf.apache.org>
Date: Thursday, 12 May 2016 at 4:56 PM
To: "user@manifoldcf.apache.org" <user@manifoldcf.apache.org>
Subject: Re: Error: Bad list view url without site

Hi Phil,

The stack trace you provided before shows which one it is:

>>>>>>
          // Leave this in for the moment
          if (Logging.connectors.isDebugEnabled())
            Logging.connectors.debug("SharePoint: List: '"+urlPath+"', '"+title+"'");

          // If it has no view url, we don't have any idea what to do with it
          if (urlPath != null && urlPath.length() > 0)
          {
            // Normalize conditionally
            if (!urlPath.startsWith("/"))
              urlPath = prefixPath + urlPath;
            // Get rid of what we don't want, unconditionally
            if (urlPath.startsWith(prefixPath))
            {
              urlPath = urlPath.substring(prefixPath.length());
              // We're at the /Lists/listname part of the name.  Figure out where the end of it is.
              int index = urlPath.indexOf("/");
              if (index == -1)
                throw new ManifoldCFException("Bad list view url without site: '"+urlPath+"'");  // line 2524
<<<<<<

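As you can see, the debug statement sits directly ahead of the check that throws this exception, so with connector debug logging enabled there is no way to reach the exception without that line being logged first.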

You turn on connector debug by putting this in your properties.xml:

<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>

Thanks,
Karl


On Wed, May 11, 2016 at 11:24 PM, Phil Riethmuller <priethmuller@funnelback.com> wrote:
Hi Karl,

There doesn’t appear to be anything logged for the following statement:

>>>>>>
          // Leave this in for the moment
          if (Logging.connectors.isDebugEnabled())
            Logging.connectors.debug("SharePoint: List: '"+urlPath+"', '"+title+"'");
<<<<<<


I noticed there are 2 instances where “Bad list view url without Lists” is referenced in SPSProxyHelper.java. Is it possible the error I’m getting is in reference to the one in the method “getListID” (as this doesn’t call this debug statement)?

Thanks
Phil

From: Phil Riethmuller <priethmuller@funnelback.com>
Reply-To: <user@manifoldcf.apache.org>
Date: Thursday, 5 May 2016 at 10:54 AM
To: <user@manifoldcf.apache.org>
Subject: Re: Error: Bad list view url without site

Thanks Karl,

I’ll add additional details to the JIRA ticket.

Phil

From: <user-return-4236-priethmuller=funnelback.com@manifoldcf.apache.org> on behalf of Karl Wright <daddywri@gmail.com>
Reply-To: <user@manifoldcf.apache.org>
Date: Wednesday, 4 May 2016 at 5:34 PM
To: "user@manifoldcf.apache.org" <user@manifoldcf.apache.org>
Subject: Re: Error: Bad list view url without site

The ticket is CONNECTORS-1309.

Karl


On Wed, May 4, 2016 at 3:32 AM, Karl Wright <daddywri@gmail.com> wrote:
Hi Phil,

The code is trying to extract the name of the list from the view URL here, and is not finding the format it expects.  Here's the code:

>>>>>>
          // If it has no view url, we don't have any idea what to do with it
          if (urlPath != null && urlPath.length() > 0)
          {
            // Normalize conditionally
            if (!urlPath.startsWith("/"))
              urlPath = prefixPath + urlPath;
            // Get rid of what we don't want, unconditionally
            if (urlPath.startsWith(prefixPath))
            {
              urlPath = urlPath.substring(prefixPath.length());
              // We're at the /Lists/listname part of the name.  Figure out where the end of it is.
              int index = urlPath.indexOf("/");
              if (index == -1)
                throw new ManifoldCFException("Bad list view url without site: '"+urlPath+"'");
              String pathpart = urlPath.substring(0,index);

              if("Lists".equals(pathpart))
              {
                int k = urlPath.indexOf("/",index+1);
                if (k == -1)
                  throw new ManifoldCFException("Bad list view url without 'Lists': '"+urlPath+"'");
                pathpart = urlPath.substring(index+1,k);
              }

              if ( pathpart.length() != 0 && !pathpart.equals("_catalogs"))
              {
                if (title == null || title.length() == 0)
                  title = pathpart;
                result.add( new NameValue(pathpart, title) );
              }
<<<<<<

Basically, the URL field is coming back containing just "default.aspx", which does not have the expected prefix "/Lists/<listname>/..." on it, and that is confusing the parser.
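
To make the failure concrete, here is a minimal standalone sketch of the parsing steps quoted above. It is not the connector's actual code: the class name ListUrlParseSketch and the method extractListName are hypothetical, IllegalArgumentException stands in for ManifoldCFException, the prefixPath value of "/" is only an assumed example, and returning null when the URL does not start with prefixPath is an assumption because the quoted excerpt does not show that branch. The title and "_catalogs" handling are left out.

```java
// Hypothetical standalone sketch; NOT the actual SPSProxyHelper code.
final class ListUrlParseSketch {

    // Follows the quoted parsing steps and returns the list name.
    // Returns null for an empty URL or one that does not start with prefixPath
    // (an assumption; that branch is not shown in the quoted excerpt).
    static String extractListName(String urlPath, String prefixPath) {
        if (urlPath == null || urlPath.isEmpty())
            return null;
        // Normalize conditionally: relative URLs get the prefix prepended.
        if (!urlPath.startsWith("/"))
            urlPath = prefixPath + urlPath;
        if (!urlPath.startsWith(prefixPath))
            return null;
        // Strip the prefix; what remains should look like "Lists/<listname>/...".
        urlPath = urlPath.substring(prefixPath.length());
        int index = urlPath.indexOf("/");
        if (index == -1)
            // IllegalArgumentException stands in for ManifoldCFException here.
            throw new IllegalArgumentException("Bad list view url without site: '" + urlPath + "'");
        String pathpart = urlPath.substring(0, index);
        if ("Lists".equals(pathpart)) {
            int k = urlPath.indexOf("/", index + 1);
            if (k == -1)
                throw new IllegalArgumentException("Bad list view url without 'Lists': '" + urlPath + "'");
            pathpart = urlPath.substring(index + 1, k);
        }
        return pathpart;
    }

    public static void main(String[] args) {
        String prefixPath = "/"; // assumed value, for illustration only

        // A view URL in the expected shape yields the list name.
        System.out.println(extractListName("/Lists/Tasks/AllItems.aspx", prefixPath));

        // A bare "default.aspx" has no "/" left after the prefix is stripped,
        // so the first check throws.
        try {
            extractListName("default.aspx", prefixPath);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, a view URL of just "default.aspx" is first prefixed, then has the prefix stripped again, and the remainder contains no "/", which is the condition that triggers the first exception.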

What version of SharePoint are you crawling?  Also, if you can turn on connector debugging, I'd love to see the output of this debug statement:

>>>>>>
          // Leave this in for the moment
          if (Logging.connectors.isDebugEnabled())
            Logging.connectors.debug("SharePoint: List: '"+urlPath+"', '"+title+"'");
<<<<<<

Thanks,
Karl


On Tue, May 3, 2016 at 8:32 PM, Phil Riethmuller <priethmuller@funnelback.com> wrote:
Hi,

I'm using ManifoldCF 2.3 with the single-process deployable WAR, and am trying to index a SharePoint 2010 repository. I’m receiving the following error, which is causing the crawl to fail:

ERROR 2016-04-29 10:50:25,985 (Worker thread '13') system.WorkerThread - Exception tossed: Bad list view url without site: 'default.aspx'
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Bad list view url without site: 'default.aspx'
        at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getLists(SPSProxyHelper.java:2524)
        at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1587)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)


Are there any suggestions on the best approach to resolve this? 

Thanks,
Phil