manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Getting a 401 Unauthorized on a SharePoint 2010 crawl request, with MCPermissions.asmx installed
Date Wed, 18 Sep 2013 20:31:22 GMT
Tried a crawl here, with the following rules:

site: "/"
library: "/*"
file: "/*"

Crawled 10 documents properly and completed, indexing 4 actual files.
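
For readers unfamiliar with the rule syntax, here is a minimal sketch of how a path rule such as "/*" could be matched against a library or file path. It assumes that "*" stands for any run of characters, including "/", which is consistent with the "exactly matched rule path" lines in the quoted logs but is not taken from the connector source. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ManifoldCF: matches a path against a rule
// in which '*' is assumed to stand for any run of characters (including '/').
final class PathRuleSketch {
    static boolean matches(String rule, String path) {
        StringBuilder regex = new StringBuilder();
        int start = 0;
        for (int i = 0; i < rule.length(); i++) {
            if (rule.charAt(i) == '*') {
                // Quote the literal text before the wildcard, then allow anything.
                regex.append(Pattern.quote(rule.substring(start, i))).append(".*");
                start = i + 1;
            }
        }
        // Quote whatever literal text follows the last wildcard.
        regex.append(Pattern.quote(rule.substring(start)));
        return Pattern.matches(regex.toString(), path);
    }

    public static void main(String[] args) {
        System.out.println(matches("/*", "/Shared Documents"));
        System.out.println(matches("/Shared Documents/*",
                "/Shared Documents/test-word-doc-1.docx"));
        System.out.println(matches("/Shared Documents/*", "/SiteAssets/logo.png"));
    }
}
```

Under that assumption, the first two calls in main match and the third does not.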

I'm going to try lists, and if that works, merge the contents of the
CONNECTORS-772 branch into trunk.

Karl




On Wed, Sep 18, 2013 at 2:56 PM, Karl Wright <daddywri@gmail.com> wrote:

> I forgot to mention: I removed the "4.0 AWS" selection.  Select just plain
> 4.0 instead.
>
> Karl
>
>
>
> On Wed, Sep 18, 2013 at 2:06 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Thanks.
>>
>> I committed a better fix.  You will need a clean job again though if you
>> want to try it.
>>
>> Karl
>>
>>
>>
>> On Wed, Sep 18, 2013 at 1:30 PM, Dmitry Goldenberg <
>> dgoldenberg@kmwllc.com> wrote:
>>
>>> Karl,
>>>
>>> Attaching the full log.
>>>
>>> - Dmitry
>>>
>>>
>>> On Wed, Sep 18, 2013 at 1:15 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Ok - is there a "Checking whether to include library" message in the
>>>> log?  If so, can you send that to me?
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Wed, Sep 18, 2013 at 1:02 PM, Dmitry Goldenberg <
>>>> dgoldenberg@kmwllc.com> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> I'm definitely seeing this issue, after a full 'rejig' of the system:
>>>>> svn up, ant clean (actually blew away dist/example), ant build, re-created
>>>>> the connectors and job.  Still seeing those string index out of bounds
>>>>> exceptions.
>>>>>
>>>>> - Dmitry
>>>>>
>>>>>
>>>>> On Wed, Sep 18, 2013 at 12:15 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Hi Dmitry,
>>>>>>
>>>>>> I think this is the same bug I fixed earlier today.  I think you just
>>>>>> have a job around from before the code change that fixed it.  If you can
>>>>>> create a new job and run that, see if you get the same issue.
>>>>>>
>>>>>> I'll be able to explore this more thoroughly when I get home tonight;
>>>>>> from here I cannot see your instance due to firewall.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 18, 2013 at 12:01 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>
>>>>>>> Not a regression; a bug I introduced.  Let me look at it - should be
>>>>>>> fixable shortly.
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 18, 2013 at 11:48 AM, Dmitry Goldenberg <
>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> I've just re-tested using the latest code. I wonder if there's a
>>>>>>>> regression. Just crawling /Shared Documents of the root site, I'm
>>>>>>>> running into what seems like an endless loop of retries on that
>>>>>>>> directory, with the following error showing up time after time:
>>>>>>>>
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>>> Checking whether to include document '/Shared
>>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>>> File '/Shared Documents/test-word-doc-1.docx' exactly matched rule path
>>>>>>>> '/Shared Documents/*'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>>
>>>>>>>> FATAL 2013-09-18 11:42:25,004 (Worker thread '0') - Error tossed:
>>>>>>>> String index out of range: -1
>>>>>>>>
>>>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of
>>>>>>>> range: -1
>>>>>>>>
>>>>>>>> at java.lang.String.substring(String.java:1911)
>>>>>>>>
>>>>>>>> at
>>>>>>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getDocumentVersions(SharePointRepository.java:926)
>>>>>>>>
>>>>>>>> at
>>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>>> Checking whether to include document '/Shared
>>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>>> File '/Shared Documents/test-word-doc-1.docx' exactly matched rule path
>>>>>>>> '/Shared Documents/*'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>>
>>>>>>>> FATAL 2013-09-18 11:42:26,840 (Worker thread '2') - Error tossed:
>>>>>>>> String index out of range: -1
>>>>>>>>
>>>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of
>>>>>>>> range: -1
>>>>>>>>
>>>>>>>> at java.lang.String.substring(String.java:1911)
>>>>>>>>
>>>>>>>> at
>>>>>>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getDocumentVersions(SharePointRepository.java:926)
>>>>>>>>
>>>>>>>> at
>>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>>> Checking whether to include document '/Shared
>>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>>> File '/Shared Documents/test-word-doc-1.docx' exactly matched rule path
>>>>>>>> '/Shared Documents/*'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>>
>>>>>>>> FATAL 2013-09-18 11:42:26,865 (Worker thread '1') - Error tossed:
>>>>>>>> String index out of range: -1
>>>>>>>>
>>>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of
>>>>>>>> range: -1
>>>>>>>>
>>>>>>>> at java.lang.String.substring(String.java:1911)
>>>>>>>>
>>>>>>>> at
>>>>>>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getDocumentVersions(SharePointRepository.java:926)
>>>>>>>>
>>>>>>>> at
>>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>>> Checking whether to include document '/Shared
>>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>>> File '/Shared Documents/test-word-doc-1.docx' exactly matched rule path
>>>>>>>> '/Shared Documents/*'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>>
>>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>>
>>>>>>>> FATAL 2013-09-18 11:42:26,895 (Worker thread '3') - Error tossed:
>>>>>>>> String index out of range: -1
>>>>>>>>
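
The thread does not show which call at SharePointRepository.java:926 failed. One common way to get a StringIndexOutOfBoundsException from String.substring is to pass it the result of an indexOf call that returned -1. The following is a minimal sketch of that failure mode and of a guarded alternative; the class and method names are hypothetical and are not taken from the connector.

```java
// Hypothetical illustration; not the connector's code.
final class SafeSubstringSketch {
    // Returns the text before the first occurrence of sep, or the whole
    // input when sep is absent (instead of letting substring() throw).
    static String before(String s, String sep) {
        int idx = s.indexOf(sep);
        return idx < 0 ? s : s.substring(0, idx);
    }

    // Reports whether the unguarded form substring(0, indexOf(sep)) throws.
    static boolean unguardedThrows(String s, String sep) {
        try {
            s.substring(0, s.indexOf(sep));
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(before("library/file.docx", "/"));
        System.out.println(unguardedThrows("no-separator-here", "/"));
    }
}
```

Checking the index before calling substring turns a missing separator into an ordinary case rather than a fatal error in the worker thread.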
>>>>>>>>
>>>>>>>> On Wed, Sep 18, 2013 at 11:27 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Dmitry,
>>>>>>>>>
>>>>>>>>> It may be worth reviewing with that engineer what steps he took
>>>>>>>>> when he installed the instance.  If he used the standard installer, IIRC
>>>>>>>>> there are a number of ways you can mess this up - the primary way being if
>>>>>>>>> you try to install IIS afterwards and then just try to patch things up.
>>>>>>>>> The canned install usually does best if IIS is installed first.
>>>>>>>>>
>>>>>>>>> At any rate, I think that you have a probable case of "operator
>>>>>>>>> error" here...
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can think of a few possibilities.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Sep 18, 2013 at 11:16 AM, Dmitry Goldenberg <
>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>
>>>>>>>>>> SharePoint was not installed by a domain user (the Windows
>>>>>>>>>> instance is not on a domain).
>>>>>>>>>>
>>>>>>>>>> This is not a canned AWS SharePoint installation; an engineer on
>>>>>>>>>> the team installed it, using the standard installer program, I believe.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 18, 2013 at 10:34 AM, Will Parkinson <
>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Dmitry, do you know if Sharepoint was installed by a domain
>>>>>>>>>>> user?  I have heard of issues with Sharepoint if not installed using a
>>>>>>>>>>> domain user (e.g. DOMAIN\someuser)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 19, 2013 at 12:31 AM, Will Parkinson <
>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> No, I didn't have that issue.  The issue I had was the // and
>>>>>>>>>>>> /// references being added in the wrong places in the page URLs.
>>>>>>>>>>>>
>>>>>>>>>>>> I was getting things like
>>>>>>>>>>>>
>>>>>>>>>>>>  /Site Name/Lib///rary/test.aspx
>>>>>>>>>>>>
>>>>>>>>>>>> My first set up was an out of the box set up, the main site was
>>>>>>>>>>>> on port 80, using classic authentication.  With the path modification in
>>>>>>>>>>>> the mcf-sharepoint-connector.jar, it worked very well.
>>>>>>>>>>>>
>>>>>>>>>>>> I set up active directory on that same server to authenticate
>>>>>>>>>>>> via NTLM
>>>>>>>>>>>>
>>>>>>>>>>>> The second server had the site on HTTPS on port 443 and used
>>>>>>>>>>>> claims-based authentication with ADFS and Kerberos.  I had to modify the
>>>>>>>>>>>> mcf-sharepoint-connector.jar and MCPermissions.wsp to work
>>>>>>>>>>>> around the lack of SIDs returned from the permissions web service.
>>>>>>>>>>>>
>>>>>>>>>>>> In this case, Active Directory and ADFS were set up on separate
>>>>>>>>>>>> AWS servers
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 19, 2013 at 12:23 AM, Karl Wright <
>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Will,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The path stuff we're already dealing with - see the
>>>>>>>>>>>>> CONNECTORS-772 branch.  But what we are having trouble with is something
>>>>>>>>>>>>> much more fundamental.  On Dmitry's AWS instance, when you talk to the web
>>>>>>>>>>>>> services for a root site, it works fine.  But as soon as you add a subsite
>>>>>>>>>>>>> path into the URL, it *seems* to work fine, but actually behaves as though
>>>>>>>>>>>>> you never specified any subsite at all - it returns root site information
>>>>>>>>>>>>> only.  On this system, this occurs for ALL web services, even Microsoft's.
>>>>>>>>>>>>> The reason is that the value of SPContext.Current.Web never points to the
>>>>>>>>>>>>> subsite you specified.  The result is that you cannot use SharePoint
>>>>>>>>>>>>> subsites with ManifoldCF without causing havoc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does this sound completely unfamiliar to you?  If you never
>>>>>>>>>>>>> encountered it, then we should compare how these instances were set up,
>>>>>>>>>>>>> unless you have any further ideas.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:12 AM, Will Parkinson <
>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Karl (and Dmitry)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For AWS, I had to modify the way the relPath in the
>>>>>>>>>>>>>> addFile function in the FileStream class (in SharePointRepository.java)
>>>>>>>>>>>>>> calculated the modifiedPath.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Essentially, I ensured that the relPath always contains the
>>>>>>>>>>>>>> site as part of the path:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>               // Prepend the site name unless it already appears near the start of relPath.
>>>>>>>>>>>>>>               if (!siteName.isEmpty()) {
>>>>>>>>>>>>>>                     int siteInd = relPath.indexOf(siteName);
>>>>>>>>>>>>>>                     if (siteInd == -1 || siteInd > 3) {
>>>>>>>>>>>>>>                         relPath = siteName + relPath;
>>>>>>>>>>>>>>                     }
>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which fixed my pathing issue and the index out of bounds
>>>>>>>>>>>>>> errors.
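
To make the effect of that check concrete, here is a self-contained sketch that wraps the same condition in a method and applies it to sample paths. The method name ensureSitePrefix and the sample site and library names are hypothetical; only the condition itself comes from the snippet above.

```java
// Hypothetical wrapper around the check quoted above; not the connector's code.
final class RelPathSketch {
    // Prefixes relPath with siteName unless siteName already appears at, or
    // within three characters of, the start of relPath.
    static String ensureSitePrefix(String siteName, String relPath) {
        if (siteName == null || siteName.isEmpty()) {
            return relPath; // root site: nothing to prepend
        }
        int siteInd = relPath.indexOf(siteName);
        if (siteInd == -1 || siteInd > 3) {
            return siteName + relPath;
        }
        return relPath;
    }

    public static void main(String[] args) {
        System.out.println(ensureSitePrefix("/Site Name", "/Library/test.aspx"));
        System.out.println(ensureSitePrefix("/Site Name", "/Site Name/Library/test.aspx"));
    }
}
```

The first call returns "/Site Name/Library/test.aspx" and the second returns its input unchanged, so a path is never prefixed twice.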
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have also made many other modifications to cope with AD and
>>>>>>>>>>>>>> claims-based auth and compatibility with SharePoint 2013.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dmitry, I have uploaded my modified
>>>>>>>>>>>>>> mcf-sharepoint-connector.jar and MCPermissions WSP if you would like to try
>>>>>>>>>>>>>> them out:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://pngnetworks.com/sharepoint-2010-claims.zip
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just make sure you back up your current ones as this is still
>>>>>>>>>>>>>> very much in development :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, the logging is very verbose.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Will
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 11:41 PM, Karl Wright <
>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Will,
>>>>>>>>>>>>>>> When you folks set up YOUR AWS instance, did it work with
>>>>>>>>>>>>>>> MCF out of the box?  Or did you need to do something?  And, if so, what did
>>>>>>>>>>>>>>> you do?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 9:28 AM, Will Parkinson <
>>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes that's right, only really interested in the site that
>>>>>>>>>>>>>>>> you are trying to crawl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 11:25 PM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Will,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For SharePoint - 80, the output is
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> NTAuthenticationProviders       : (STRING) "NTLM"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I assume we're not interested in the Default Web Site; for
>>>>>>>>>>>>>>>>> that, the output is simply "The parameter NTAuthenticationProviders is not
>>>>>>>>>>>>>>>>> set at this node."
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 9:16 AM, Will Parkinson <
>>>>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you open IIS manager and click on sites, it is
>>>>>>>>>>>>>>>>>> displayed in the ID column (see screenshot attached)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:55 PM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Will,
>>>>>>>>>>>>>>>>>>> Sorry, what is the "sharepoint website *number*" in
>>>>>>>>>>>>>>>>>>> that invocation?
>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:53 AM, Will Parkinson <
>>>>>>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Dmitry
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Just out of interest, what does the following command
>>>>>>>>>>>>>>>>>>>> output on your system
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> cd to C:\inetpub\adminscripts
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> *cscript adsutil.vbs get w3svc/<put your sharepoint
>>>>>>>>>>>>>>>>>>>> website number here>/root/NTAuthenticationProviders*
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Will
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:44 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> "This is the second time I'm encountering the issue
>>>>>>>>>>>>>>>>>>>>> which leads me to believe it's a quirk of IIS and/or SharePoint."
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It cannot be just a quirk of SharePoint because
>>>>>>>>>>>>>>>>>>>>> SharePoint's UI etc could not create or work with subsites if that was
>>>>>>>>>>>>>>>>>>>>> true.  It may well be a configuration issue with IIS, which is indeed what
>>>>>>>>>>>>>>>>>>>>> I suspect.  I have pinged all the resources I know of to try and get some
>>>>>>>>>>>>>>>>>>>>> insight as to why this is happening.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> "Perhaps this is something that can be worked into the
>>>>>>>>>>>>>>>>>>>>> 'fabric' of ManifoldCF as a workaround for a known issue."
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Like I said before, this is a huge amount of work,
>>>>>>>>>>>>>>>>>>>>> tantamount to rewriting most of the connector.  If this is what you want to
>>>>>>>>>>>>>>>>>>>>> request, that is your option, but there is no way we'd complete any of this
>>>>>>>>>>>>>>>>>>>>> work before December/January at the earliest.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> "Just to understand this a bit better, the main
>>>>>>>>>>>>>>>>>>>>> breakage here is that the wildcards don't work properly, right? "
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> No, it means that ManifoldCF cannot get at any data of
>>>>>>>>>>>>>>>>>>>>> any kind associated with a SharePoint subsite.  Accessing root data works
>>>>>>>>>>>>>>>>>>>>> fine.  If you try to crawl as things are now, you must disable all subsites
>>>>>>>>>>>>>>>>>>>>> and just crawl the root site, or you will crawl the same things with longer
>>>>>>>>>>>>>>>>>>>>> and longer paths indefinitely.
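
The "longer and longer paths" effect can be reproduced with a small simulation. This is a sketch under stated assumptions: brokenGetSubsites is a hypothetical stand-in for a service that ignores the requested site and always reports the root site's children, and the depth cap exists only so the demo terminates. It is not the connector's discovery code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation; not the connector's code.
final class SubsiteLoopSketch {
    // Stand-in for the misbehaving service: ignores sitePath and always
    // returns the root site's children.
    static List<String> brokenGetSubsites(String sitePath) {
        return Arrays.asList("Abcd", "Klmnopqr");
    }

    // Level-by-level discovery, capped at maxDepth levels.
    static List<String> discover(int maxDepth) {
        List<String> found = new ArrayList<>();
        List<String> frontier = Arrays.asList("");
        for (int depth = 0; depth < maxDepth; depth++) {
            List<String> next = new ArrayList<>();
            for (String parent : frontier) {
                for (String child : brokenGetSubsites(parent)) {
                    next.add(parent + "/" + child);
                }
            }
            found.addAll(next);
            frontier = next;
        }
        return found;
    }

    public static void main(String[] args) {
        for (String path : discover(3)) {
            System.out.println(path);
        }
    }
}
```

With two root subsites and a cap of three levels the simulation already yields 14 paths, including "/Abcd/Klmnopqr/Klmnopqr". Without the cap, every level would double the frontier and discovery would never finish.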
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:38 AM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This is the second time I'm encountering the issue
>>>>>>>>>>>>>>>>>>>>>> which leads me to believe it's a quirk of IIS and/or SharePoint. Perhaps
>>>>>>>>>>>>>>>>>>>>>> this is something that can be worked into the 'fabric' of ManifoldCF as a
>>>>>>>>>>>>>>>>>>>>>> workaround for a known issue. I understand that it may have far reaching
>>>>>>>>>>>>>>>>>>>>>> tentacles but I wonder if that's really the only option...
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Just to understand this a bit better, the main
>>>>>>>>>>>>>>>>>>>>>> breakage here is that the wildcards don't work properly, right?  In theory
>>>>>>>>>>>>>>>>>>>>>> if I have a repo connector config which lists specific library and list
>>>>>>>>>>>>>>>>>>>>>> paths, things should work?  It's only when the /* types of wildcards are
>>>>>>>>>>>>>>>>>>>>>> included, we're in trouble?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:07 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Someone else was having a similar problem. See
>>>>>>>>>>>>>>>>>>>>>>> http://social.technet.microsoft.com/Forums/sharepoint/en-US/e4b53c63-b89a-4356-a7b0-6ca7bfd22826/getting-sharepoint-subsite-from-custom-webservice.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Apparently it does depend on how you get to the web
>>>>>>>>>>>>>>>>>>>>>>> service, which does argue that it is an IIS issue.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 5:44 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> As discussed privately I had a look at your
>>>>>>>>>>>>>>>>>>>>>>>> system.  What is happening is that the C# static SPContext.Current.Web is
>>>>>>>>>>>>>>>>>>>>>>>> not reflecting the subsite in any url that contains a subsite.  In other
>>>>>>>>>>>>>>>>>>>>>>>> words, the URL coming in might be "
>>>>>>>>>>>>>>>>>>>>>>>> http://servername/subsite1/_vti_bin/MCPermissions.asmx",
>>>>>>>>>>>>>>>>>>>>>>>> but the MCPermissions.asmx plugin will think it is being executed in the
>>>>>>>>>>>>>>>>>>>>>>>> root context ("http://servername").  That's pretty
>>>>>>>>>>>>>>>>>>>>>>>> broken behavior, so I'm guessing that either IIS or
>>>>>>>>>>>>>>>>>>>>>>>> SharePoint is somehow misconfigured, and that once the configuration
>>>>>>>>>>>>>>>>>>>>>>>> is corrected the web services would begin to work right again.  But I
>>>>>>>>>>>>>>>>>>>>>>>> have no idea how this should actually be fixed.
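
For reference, the URL shape described above (server, then subsite path, then "/_vti_bin/", then the service name) can be sketched as follows. The helper names are hypothetical and this is not how the connector assembles its endpoints; it only illustrates where the subsite path sits in the request URL that SPContext.Current.Web is expected to reflect.

```java
// Hypothetical helper; illustrates the URL layout only.
final class ServiceUrlSketch {
    // Builds server + sitePath + "/_vti_bin/" + serviceName, tolerating
    // trailing slashes and an empty or "/" site path for the root site.
    static String serviceUrl(String serverUrl, String sitePath, String serviceName) {
        String base = stripTrailingSlashes(serverUrl);
        String site = stripTrailingSlashes(sitePath);
        if (!site.isEmpty() && !site.startsWith("/")) {
            site = "/" + site;
        }
        return base + site + "/_vti_bin/" + serviceName;
    }

    static String stripTrailingSlashes(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '/') {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(serviceUrl("http://servername", "/subsite1", "MCPermissions.asmx"));
        System.out.println(serviceUrl("http://servername", "/", "MCPermissions.asmx"));
    }
}
```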
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Will Parkinson, one of the subscribers of this
>>>>>>>>>>>>>>>>>>>>>>>> list, may find the symptoms meaningful, since he set up an AWS SharePoint
>>>>>>>>>>>>>>>>>>>>>>>> instance before.  I hope he will respond in a helpful way.  Until then, I
>>>>>>>>>>>>>>>>>>>>>>>> think we are stuck.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 9:49 AM, Dmitry Goldenberg
>>>>>>>>>>>>>>>>>>>>>>>> <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> It looks like I'll be able to get access for you
>>>>>>>>>>>>>>>>>>>>>>>>> to the test system we're using. Would you be interested in working with the
>>>>>>>>>>>>>>>>>>>>>>>>> system directly? I certainly don't mind doing some testing but I thought
>>>>>>>>>>>>>>>>>>>>>>>>> we'd speed things up this way. If so, could you email me from a more
>>>>>>>>>>>>>>>>>>>>>>>>> private account so we can set this up?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 7:38 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Another interesting bit from the log:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/_catalogs/lt/Forms/AllItems.aspx', 'List
>>>>>>>>>>>>>>>>>>>>>>>>>> Template Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/_catalogs/masterpage/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>>>>>>> 'Master Page Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/Shared Documents/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>>>>>>> 'Shared Documents'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/SiteAssets/Forms/AllItems.aspx', 'Site Assets'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/SitePages/Forms/AllPages.aspx', 'Site Pages'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/_catalogs/solutions/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>>>>>>> 'Solution Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/Style Library/Forms/AllItems.aspx', 'Style
>>>>>>>>>>>>>>>>>>>>>>>>>> Library'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/Test Library 1/Forms/AllItems.aspx', 'Test
>>>>>>>>>>>>>>>>>>>>>>>>>> Library 1'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/_catalogs/theme/Forms/AllItems.aspx', 'Theme
>>>>>>>>>>>>>>>>>>>>>>>>>> Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/_catalogs/wp/Forms/AllItems.aspx', 'Web Part
>>>>>>>>>>>>>>>>>>>>>>>>>> Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared
>>>>>>>>>>>>>>>>>>>>>>>>>> Documents' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>>>>>>> exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>>>>>>> exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style
>>>>>>>>>>>>>>>>>>>>>>>>>> Library' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> This time it appears that it is the Lists service
>>>>>>>>>>>>>>>>>>>>>>>>>> that is broken and does not recognize the parent site.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I haven't corrected this problem yet since now I
>>>>>>>>>>>>>>>>>>>>>>>>>> am beginning to wonder if *any* of the web services under Amazon work at
>>>>>>>>>>>>>>>>>>>>>>>>>> all for subsites.  We may be better off implementing everything we need in
>>>>>>>>>>>>>>>>>>>>>>>>>> the MCPermissions service.  I will ponder this as I continue to research
>>>>>>>>>>>>>>>>>>>>>>>>>> the logs.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> It's still valuable to check my getSites()
>>>>>>>>>>>>>>>>>>>>>>>>>> implementation.  I'll be doing another round of work tonight on the plugin.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 8:45 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The augmented plugin can be downloaded from
>>>>>>>>>>>>>>>>>>>>>>>>>>> http://people.apache.org/~kwright/MetaCarta.SharePoint.MCPermissionsService.wsp.  The revised connector code is also ready, and should be checked out and
>>>>>>>>>>>>>>>>>>>>>>>>>>> built from
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-772.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Once you set it all up, you can see if it is
>>>>>>>>>>>>>>>>>>>>>>>>>>> doing the right thing by just trying to drill down through subsites in the
>>>>>>>>>>>>>>>>>>>>>>>>>>> UI.  You should always see a list of subsites that is appropriate for the
>>>>>>>>>>>>>>>>>>>>>>>>>>> context you are in; if this does not happen it is not working.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:45 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can see how preloading the list of subsites
>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be less than optimal. The advantage of doing it this way is one call and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> you've got the structure in memory, which may be OK unless there are sites
>>>>>>>>>>>>>>>>>>>>>>>>>>>> with a ton of subsites, which may strain memory. The disadvantage is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> having to pass this structure around.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'll certainly help test out your changes,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> just let me know when they're available.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:19 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the code snippet.  I'd prefer,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> though, to not preload the entire site structure in memory.  Probably it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would be better to just add another method to the ManifoldCF SharePoint
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2010 plugin.  More methods are going to be added anyway to support Claim
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Space Authentication, so I guess this would be just one more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We honestly have never seen this problem
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> before - so it's not just flakiness, it has something to do with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> installation, I'm certain.  At any rate, I'll get going right away on a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> workaround - if you are willing to test what I produce.  I'm also certain
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there is at least one other issue, but hopefully that will become clearer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> once this one is resolved.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 6:49 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> subsite discovery is effectively disabled
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> except directly under the root site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes. Come to think of it, I once came across
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this problem while implementing a SharePoint connector.  I'm not sure
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> whether it's exactly what's happening with the issue we're discussing but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> looks like it.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I started off by using multiple
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> getWebCollection calls to get child subsites of sites and trying to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> navigate down that way. The problem was that getWebCollection was always
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> returning the immediate subsites of the root site no matter whether you're
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at the root or below, so I ended up generating infinite loops.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I switched over to using a single
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> getAllSubWebCollection call and caching its results. That call returns the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> full list of all subsites as pairs of Title and Url.  I had a POJO similar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to the one below which held the list of sites and contained logic for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> enumerating the child sites, given the URL of a (parent) site.  From what I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recall, getWebCollection works inconsistently, either across SP versions or
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> across installations, but the logic below should work in any case.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *** public class SubSiteCollection -- holds a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> list of CrawledSite pojo's each of which is a { title, url }.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *** SubSiteCollection has the following:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  public List<CrawledSite> getImmediateSubSites(String siteUrl) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   List<CrawledSite> subSites = new ArrayList<CrawledSite>();
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (CrawledSite site : sites) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    if (isChildOf(siteUrl, site.getUrl().toString())) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     subSites.add(site);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   return subSites;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  private static boolean isChildOf(String parentUrl, String urlToCheck) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   final String parent = normalizeUrl(parentUrl);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   final String child = normalizeUrl(urlToCheck);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   boolean ret = false;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   if (child.startsWith(parent)) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    String remainder = child.substring(parent.length());
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    ret = StringUtils.countOccurrencesOf(remainder, SLASH) == 1;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   return ret;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  private static String normalizeUrl(String url) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   return ((url.endsWith(SLASH)) ? url : url + SLASH).toLowerCase();
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  }
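
For illustration only, the filtering idea in the snippet above can be written as a self-contained class. This is a minimal sketch, not the connector's code: the class name SubSiteIndex, the method immediateSubSites, and the use of plain URL strings instead of CrawledSite objects are all assumptions made for the example. It also swaps the StringUtils helper for a plain indexOf check so that it needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical holder for a flat list of subsite URLs, such as the cached
 * result of a single "list every subsite" call. Names are illustrative.
 */
final class SubSiteIndex {
  private static final String SLASH = "/";
  private final List<String> siteUrls;

  SubSiteIndex(List<String> siteUrls) {
    // Defensive copy so later changes by the caller do not affect the index.
    this.siteUrls = new ArrayList<>(siteUrls);
  }

  /** Returns the URLs that sit exactly one path level below parentUrl. */
  List<String> immediateSubSites(String parentUrl) {
    List<String> result = new ArrayList<>();
    for (String url : siteUrls) {
      if (isChildOf(parentUrl, url)) {
        result.add(url);
      }
    }
    return result;
  }

  /** True when urlToCheck is a direct child of parentUrl (case-insensitive). */
  static boolean isChildOf(String parentUrl, String urlToCheck) {
    String parent = normalize(parentUrl);
    String child = normalize(urlToCheck);
    if (!child.startsWith(parent)) {
      return false;
    }
    String remainder = child.substring(parent.length());
    // A direct child leaves "name/" behind: non-empty, and the first slash
    // in the remainder is also its last character.
    return remainder.length() > 1
        && remainder.indexOf('/') == remainder.length() - 1;
  }

  /** Lower-cases the URL and guarantees a single trailing slash marker. */
  private static String normalize(String url) {
    String withSlash = url.endsWith(SLASH) ? url : url + SLASH;
    return withSlash.toLowerCase(Locale.ROOT);
  }
}
```

With a list holding a root-level site and one nested site, calling immediateSubSites on the root URL returns only the root-level entry; the nested site shows up only when its own parent is queried.
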
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 2:54 PM, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Have a look at this sequence also:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Subsite list: '
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Abcd',
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Subsite list: '
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Defghij',
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Subsite list: '
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Klmnopqr',
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Checking whether to include site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd' exactly matched
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Checking whether to include site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij' exactly
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Checking whether to include site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr' exactly
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '8') - SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is using the GetSites(String parent)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method with a site name of "/Klmnopqr/Abcd/Abcd/Klmnopqr", and getting back
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> three sites (!!).  The parent path is not correct, obviously, but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> nevertheless this is one way in which paths are getting completely messed up.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It *looks* like the Webs web service is broken in such a way as to ignore
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the URL coming in, except for the base part, which means that subsite
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discovery is effectively disabled except directly under the root site.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This might still be OK if it is not possible
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to create subsites of subsites in this version of SharePoint.  Can you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> confirm that this is or is not possible?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 2:42 PM, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "This is everything that got generated,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from the very beginning"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Well, something isn't right.  What I expect
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to see right up front, but don't, are:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A webs "getWebCollection" invocation for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /_vti_bin/webs.asmx
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Two lists "getListCollection" invocations
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for /_vti_bin/lists.asmx
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Instead, the first transactions I see are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from already busted URLs - which makes no sense, since there is no way
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> they could have been queued yet.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So there are a number of possibilities.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> First, maybe the log isn't getting cleared out, and the session in question
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> therefore starts somewhere in the middle of manifoldcf.log.1.  But no:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> C:\logs>grep "POST /_vti_bin/webs"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifoldcf.log.1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> grep: input lines truncated - result
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> questionable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Nevertheless there are some interesting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> points here.  First, note the following response, which I've been able to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> determine is against "Test Library 1":
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '23') - SharePoint: getListItems xml response: '<GetListItems xmlns="
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://schemas.microsoft.com/sharepoint/soap/directory/"><GetListItemsResponse
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlns=""><GetListItemsResult
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FileRef="SitePages/Home.aspx"/></GetListItemsResponse></GetListItems>'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '23') - SharePoint: Checking whether to include document
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/SitePages/Home.aspx'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '23') - SharePoint: File '/SitePages/Home.aspx' exactly matched rule
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '23') - SharePoint: Including file '/SitePages/Home.aspx'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  WARN 2013-09-16 13:02:31,590 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '23') - Sharepoint: Unexpected relPath structure; path is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/SitePages/Home.aspx', but expected <list/library> length of 26
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The FileRef in this case is pointing at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what, exactly?  Is there a SitePages/Home.aspx in the "Test Library 1"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> library?  Or does it mean to refer back to the root site with this URL
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> construction?  And since this is supposedly at the root level, how come the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> combined site + library name comes out to 26??  I get 15, which leaves 11
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> characters unaccounted for.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm still looking at the logs to see if I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can glean key information.  Later, if I could set up a crawl against the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sharepoint instance in question, that would certainly help.  I can readily
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> set up an ssh tunnel if that is what is required.  But I won't be able to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do it until I get home tonight.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:58 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is everything that got generated,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from the very beginning, meaning that I did a fresh build, new database,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new connection definitions, start. The log must have rolled but the .1 log
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is included.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If I were to get you access to the actual
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> test system, would you mind taking a look? It may be more efficient than
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sending logs..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:48 PM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> These logs are different but have exactly
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the same problem; they start in the middle when the crawl is already well
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> underway.  I'm wondering if by chance you have more than one agents process
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> running or something?  Or maybe the log is rolling and stuff is getting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lost?  What's there is not what I would expect to see, at all.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I *did* manage to find two transactions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that look like they might be helpful, but because the *results* of those
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> transactions are required by transactions that take place minutes *before*
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the log, I have no confidence that I'm looking at anything meaningful.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But I'll get back to you on what I find nonetheless.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you decide to repeat this exercise, try
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> watching the log with "tail -f" before starting the job.  You should not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> see any log contents at all until the job is started.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:11 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Attached please find logs which start at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the beginning. I started from a fresh build (clean db etc.), the logs start
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at server start, then I create the output connection and the repo
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection, then the job, and then I fire off the job. I aborted the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution about a minute into it or so.  That's all that's in the logs with:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org.apache.manifoldcf.connectors=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log4j.logger.httpclient.wire.header=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log4j.logger.org.apache.commons.httpclient=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 12:39 PM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you sure these are the right logs?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They start right in the middle of a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They are already in a broken state
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when they start, e.g. the kinds of things that are being looked up are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already nonsense paths
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I need to see logs from the BEGINNING
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of a fresh crawl to see how the nonsense paths happen.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:52 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've generated logs with details as we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job was created afresh, as before:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Path rules:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* file include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* library include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* list include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* site include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Metadata:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* include true
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The logs are attached.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:20 AM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Do you think that this issue is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> generic with regard to any Amz instance?"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I presume so, since you apparently didn't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do anything special to set one of these up.  Unfortunately, such
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instances are not part of the free tier, so I am still constrained from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> setting one up for myself because of household rules here.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "For now, I assume our only
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> workaround is to list the paths of interest manually"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Depending on what is going wrong,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that may not even work.  It looks like several SharePoint web service calls
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be affected, and not in a cleanly predictable way, for this to happen.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "is identification and extraction of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> attachments supported in the SP connector?"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF in general leaves
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> identification and extraction to the search engine.  Solr, for instance,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses Tika for this, if so configured.  You can configure your Solr output
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection to include or exclude specific mime types or extensions if you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> want to limit what is attempted.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:09 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, Karl. Do you think that this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue is generic with regard to any Amz instance? I'm just wondering how
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> easily reproducible this may be..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For now, I assume our only
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> workaround is to list the paths of interest manually, i.e. add explicit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rules for each library and list.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A related subject - is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> identification and extraction of attachments supported in the SP
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector?  E.g. if I have a Word doc attached to a Task list item, would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that be extracted?  So far, I see that library content gets crawled and I'm
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> getting the list item data but am not sure what happens to the attachments.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 10:48 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the additional
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information.  It does appear like the method that lists subsites is not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working as expected under AWS.  Nor are some number of other methods which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supposedly just list the children of a subsite.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've reopened CONNECTORS-772 to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> work on addressing this issue.  Please stay tuned.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 10:08 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Most of the paths that get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> generated are listed in the attached log, they match what shows up in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> diag report. So I'm not sure where they diverge, most of them just don't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seem right.  There are 3 subsites rooted in the main site: Abcd, Defghij,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Klmnopqr.  It's strange that the connector would try such paths as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /*Klmnopqr*/*Defghij*/*Defghij*/Announcements///
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- there are multiple repetitions of the same subsite on the path and to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> begin with, Defghij is not a subsite of Klmnopqr, so why would it try
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this? The /// at the end doesn't seem correct either, unless I'm missing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> something in how this pathing works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /Test Library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1/Financia/lProjectionsTemplate.xl/Abcd/Announcements -- looks wrong. A
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> docname is mixed into the path, a subsite ends up after a docname?...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /Shared
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Documents/Personal_Fina/ncial_Statement_1_1.xl/Defghij/ -- same types of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issues plus now somehow the docname got split with a forward slash?..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There are also a bunch of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> StringIndexOutOfBoundsExceptions.  Perhaps this logic doesn't fit with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pathing we're seeing on this amz-based installation?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd expect the logic to just know
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that root contains 3 subsites, and work off that. Each subsite has a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specific list of libraries and lists, etc. It seems odd that the connector
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> gets into this matching pattern, and tries what looks like thousands of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> variations (I aborted the execution).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:56 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To clarify, the way you would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to analyze this is to run a crawl with the wildcards as you have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> selected, abort if necessary after a while, and then use the Document
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Status report to list the document identifiers that had been generated.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Find a document identifier that you believe represents a path that is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> illegal, and figure out what SOAP getChild call caused the problem by
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> returning incorrect data.  In other words, find the point in the path where
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the path diverges from what exists into what doesn't exist, and go back in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the ManifoldCF logs to find the particular SOAP request that led to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue.
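
The "find where the path diverges" step described above can be partly mechanized. The following is a rough sketch under stated assumptions: it treats a document identifier as a plain slash-separated path and takes a hand-built set of paths known to exist on the SharePoint side. The class PathDivergence and the method firstUnknownPrefix are hypothetical names and are not part of ManifoldCF.

```java
import java.util.Set;

/** Hypothetical helper for locating the first path segment that does not exist. */
final class PathDivergence {

  /**
   * Walks the path one segment at a time and returns the shortest prefix
   * that is absent from knownPaths, or null if every prefix is known.
   * Empty segments (from leading, trailing, or doubled slashes) are skipped.
   */
  static String firstUnknownPrefix(String path, Set<String> knownPaths) {
    StringBuilder prefix = new StringBuilder();
    for (String segment : path.split("/")) {
      if (segment.isEmpty()) {
        continue;
      }
      prefix.append('/').append(segment);
      if (!knownPaths.contains(prefix.toString())) {
        return prefix.toString();
      }
    }
    return null;
  }
}
```

Given a known set containing only "/Klmnopqr", the path "/Klmnopqr/Defghij/Defghij/Announcements" diverges at "/Klmnopqr/Defghij", which points at the subsite listing for "/Klmnopqr" as the request to look up in the log.
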
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd expect from your description
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that the problem lies with getting child sites given a site path, but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that's just a guess at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2013 at 6:40 PM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't understand what you mean
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> by "I've tried the set of wildcards as below and I seem to be running into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a lot of cycles, where various subsite folders are appended to each other
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and an extraction of data at all of those locations is attempted".   If you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are seeing cycles it means that document discovery is still failing in some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> way.  For each folder/library/site/subsite, only the children of that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> folder/library/site/subsite should be appended to the path - ever.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you can give a specific
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example, preferably including the soap back-and-forth, that would be very
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2013 at 1:40 PM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Quick question. Is there an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> easy way to configure an SP repo connection for crawling of all content,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from the root site all the way down?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've tried the set of wildcards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as below and I seem to be running into a lot of cycles, where various
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> subsite folders are appended to each other and an extraction of data at all
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of those locations is attempted. Ideally I'd like to avoid having to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> construct an exact set of paths because the set may change, especially with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new content being added.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Path rules:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* file include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* library include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* list include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* site include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Metadata:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* include true
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd also like to pull down any
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> files attached to list items. I'm hoping that some type of "/* file
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> include" should do it, once I figure out how to safely include all content.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
