manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Getting a 401 Unauthorized on a SharePoint 2010 crawl request, with MCPermissions.asmx installed
Date Thu, 19 Sep 2013 01:24:45 GMT
Good news!

I saw those warnings in the event logs also, but since Windows is a noisy
beast anyhow, I didn't bring them to your attention.  Sorry!

Karl



On Wed, Sep 18, 2013 at 9:19 PM, Dmitry Goldenberg
<dgoldenberg@kmwllc.com> wrote:

> Hi Karl and Will,
>
> I think I may have identified the issue with our test SP instance:
> http://technet.microsoft.com/en-us/library/cc288609.aspx.
>
> It sounds like we need to configure alternative access mappings.  The
> article says that "Your alternate access mappings might not be configured
> correctly if you are experiencing any of the following problems, .... e.g.
> You are redirected to http://*computer_name* when browsing to your
> site..." then "If the request is from a URL that has not been configured
> for alternate access mappings, Windows SharePoint Services 3.0 also creates
> a critical error in the Windows event log, and in the Windows SharePoint
> Services ULS logs."
>
> I'm in fact seeing warnings like that: Alternate access mappings have not
> been configured.
> Users or services are accessing the site http://amazona-****** with the
> URL http://********.amazonaws.com.
>
> I'll see about configuring this stuff and retesting.
> - Dmitry
>
>
> On Wed, Sep 18, 2013 at 4:45 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Dmitry,
>>
>> Branch has now been merged, and ticket CONNECTORS-772 has now been
>> resolved (again).  CONNECTORS-777 also resolved.
>>
>> I can now start working on CONNECTORS-778, while you explore installing a
>> new instance in a rigorous, repeatable manner.
>>
>> Thanks!
>> Karl
>>
>>
>>
>> On Wed, Sep 18, 2013 at 4:31 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Tried a crawl here, with the following rules:
>>>
>>> site: "/"
>>> library: "/*"
>>> file: "/*"
>>>
>>> Crawled 10 documents properly and completed, indexing 4 actual files.
>>>
>>> I'm going to try lists, and if that works, merge the contents of
>>> CONNECTORS-772 branch into trunk.
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>> On Wed, Sep 18, 2013 at 2:56 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> I forgot to mention: I removed the "4.0 AWS" selection.  Select just
>>>> plain 4.0 instead.
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Wed, Sep 18, 2013 at 2:06 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Thanks.
>>>>>
>>>>> I committed a better fix.  You will need a clean job again though if
>>>>> you want to try it.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Sep 18, 2013 at 1:30 PM, Dmitry Goldenberg <
>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>
>>>>>> Karl,
>>>>>>
>>>>>> Attaching the full log.
>>>>>>
>>>>>> - Dmitry
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 18, 2013 at 1:15 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>
>>>>>>> Ok - is there a "Checking whether to include library" message in the
>>>>>>> log?  If so, can you send that to me?
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 18, 2013 at 1:02 PM, Dmitry Goldenberg <
>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> I'm definitely seeing this issue, after a full 'rejig' of the
>>>>>>>> system: svn up, ant clean (actually blew away dist/example), ant build,
>>>>>>>> re-created the connectors and job.  Still seeing those string index out
>>>>>>>> of bounds exceptions.
>>>>>>>>
>>>>>>>> - Dmitry
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 18, 2013 at 12:15 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Dmitry,
>>>>>>>>>
>>>>>>>>> I think this is the same bug I fixed earlier today.  I think you
>>>>>>>>> just have a job around from before the code change that fixed it.  If you
>>>>>>>>> can create a new job and run that, see if you get the same issue.
>>>>>>>>>
>>>>>>>>> I'll be able to explore this more thoroughly when I get home
>>>>>>>>> tonight; from here I cannot see your instance due to firewall.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Sep 18, 2013 at 12:01 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Not a regression; a bug I introduced.  Let me look at it - should
>>>>>>>>>> be fixable shortly.
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 18, 2013 at 11:48 AM, Dmitry Goldenberg <
>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>
>>>>>>>>>>> I've just re-tested using the latest. I wonder if there's a
>>>>>>>>>>> regression issue. Just crawling /Shared Documents of the root site, I'm
>>>>>>>>>>> running into what seems like an indefinite loop of retrying to crawl that
>>>>>>>>>>> directory, with the following error showing up time after time:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>>>>>> Checking whether to include document '/Shared
>>>>>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>>>>>> File '/Shared Documents/test-word-doc-1.docx' exactly matched rule path
>>>>>>>>>>> '/Shared Documents/*'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>>>>>
>>>>>>>>>>> FATAL 2013-09-18 11:42:25,004 (Worker thread '0') - Error
>>>>>>>>>>> tossed: String index out of range: -1
>>>>>>>>>>>
>>>>>>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of
>>>>>>>>>>> range: -1
>>>>>>>>>>>
>>>>>>>>>>> at java.lang.String.substring(String.java:1911)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getDocumentVersions(SharePointRepository.java:926)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>>>>>> Checking whether to include document '/Shared
>>>>>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>>>>>> File '/Shared Documents/test-word-doc-1.docx' exactly matched rule path
>>>>>>>>>>> '/Shared Documents/*'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>>>>>
>>>>>>>>>>> FATAL 2013-09-18 11:42:26,840 (Worker thread '2') - Error
>>>>>>>>>>> tossed: String index out of range: -1
>>>>>>>>>>>
>>>>>>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of
>>>>>>>>>>> range: -1
>>>>>>>>>>>
>>>>>>>>>>> at java.lang.String.substring(String.java:1911)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getDocumentVersions(SharePointRepository.java:926)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>>>>>> Checking whether to include document '/Shared
>>>>>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>>>>>> File '/Shared Documents/test-word-doc-1.docx' exactly matched rule path
>>>>>>>>>>> '/Shared Documents/*'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>>>>>
>>>>>>>>>>> FATAL 2013-09-18 11:42:26,865 (Worker thread '1') - Error
>>>>>>>>>>> tossed: String index out of range: -1
>>>>>>>>>>>
>>>>>>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of
>>>>>>>>>>> range: -1
>>>>>>>>>>>
>>>>>>>>>>> at java.lang.String.substring(String.java:1911)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getDocumentVersions(SharePointRepository.java:926)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>>>>>> Checking whether to include document '/Shared
>>>>>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>>>>>> File '/Shared Documents/test-word-doc-1.docx' exactly matched rule path
>>>>>>>>>>> '/Shared Documents/*'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>>>>>
>>>>>>>>>>> FATAL 2013-09-18 11:42:26,895 (Worker thread '3') - Error
>>>>>>>>>>> tossed: String index out of range: -1
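The trace above ends in String.substring, and -1 is the value that indexOf and lastIndexOf return when the search string is absent. The code at SharePointRepository.java line 926 is not shown in this thread, so the following is only a sketch of that general failure mode and of a guard against it. The class and method names (SubstringGuard, prefixBefore, unguardedThrows) are hypothetical and are not part of ManifoldCF.

```java
// Hypothetical illustration; this is not the ManifoldCF source.
final class SubstringGuard {
    // Returns the part of path before marker, or null when marker is absent.
    static String prefixBefore(String path, String marker) {
        int idx = path.indexOf(marker);
        if (idx < 0) {
            return null; // guard: never pass -1 on to substring()
        }
        return path.substring(0, idx);
    }

    // Shows the unguarded pattern: when marker is absent, indexOf returns -1
    // and substring(0, -1) throws StringIndexOutOfBoundsException.
    static boolean unguardedThrows(String path, String marker) {
        try {
            path.substring(0, path.indexOf(marker));
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String path = "/Shared Documents/test-word-doc-1.docx";
        System.out.println(prefixBefore(path, "/test-word-doc-1.docx"));
        System.out.println(prefixBefore(path, "/Forms/"));
        System.out.println(unguardedThrows(path, "/Forms/"));
    }
}
```

Because the worker threads retry the same document, an unguarded exception of this kind would plausibly repeat on every attempt, which is consistent with the loop seen in the log.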
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 18, 2013 at 11:27 AM, Karl Wright <
>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>
>>>>>>>>>>>> It may be worth reviewing with that engineer what steps he took
>>>>>>>>>>>> when he installed the instance.  If he used the standard installer, IIRC
>>>>>>>>>>>> there are a number of ways you can mess this up - the primary way being if
>>>>>>>>>>>> you try to install IIS afterwards and then just try to patch things up.
>>>>>>>>>>>> The canned install usually does best if IIS is installed first.
>>>>>>>>>>>>
>>>>>>>>>>>> At any rate, I think that you have a probable case of "operator
>>>>>>>>>>>> error" here...
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I can think of a few possibilities.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Sep 18, 2013 at 11:16 AM, Dmitry Goldenberg <
>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> SharePoint was not installed by a domain user (the Windows
>>>>>>>>>>>>> instance is not on a domain).
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is not a canned AWS SharePoint installation; an engineer
>>>>>>>>>>>>> on the team installed it, using the standard installer program, I believe.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:34 AM, Will Parkinson <
>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dmitry, do you know if Sharepoint was installed by a domain
>>>>>>>>>>>>>> user?  I have heard of issues with Sharepoint if not installed using a
>>>>>>>>>>>>>> domain user (e.g. DOMAIN\someuser)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 19, 2013 at 12:31 AM, Will Parkinson <
>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No, I didn't have that issue.  The issue I had was the // and
>>>>>>>>>>>>>>> /// references being added in the wrong places in the page URLs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was getting things like
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  /Site Name/Lib///rary/test.aspx
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My first set up was an out of the box set up, the main site
>>>>>>>>>>>>>>> was on port 80, using classic authentication.  With the path modification
>>>>>>>>>>>>>>> in the mcf-sharepoint-connector.jar, it worked very well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I set up active directory on that same server to
>>>>>>>>>>>>>>> authenticate via NTLM
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The second server had the site on https on port 443, had
>>>>>>>>>>>>>>> claims based authentication using ADFS and kerberos.  I had to modify the
>>>>>>>>>>>>>>> mcf-sharepoint-connector.jar and MCPermissions.wsp to get this to work
>>>>>>>>>>>>>>> around the lack of SID's returned from the permissions webservice.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In this case, Active Directory and ADFS were set up on
>>>>>>>>>>>>>>> separate AWS servers
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Sep 19, 2013 at 12:23 AM, Karl Wright <
>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Will,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The path stuff we're already dealing with - see the
>>>>>>>>>>>>>>>> CONNECTORS-772 branch.  But what we are having trouble with is something
>>>>>>>>>>>>>>>> much more fundamental.  On Dmitry's AWS instance, when you talk to the web
>>>>>>>>>>>>>>>> services for a root site, it works fine.  But as soon as you add a subsite
>>>>>>>>>>>>>>>> path into the URL, it *seems* to work fine, but actually behaves as though
>>>>>>>>>>>>>>>> you never specified any subsite at all - it returns root site information
>>>>>>>>>>>>>>>> only.  On this system, this occurs for ALL web services, even Microsoft's.
>>>>>>>>>>>>>>>> The reason is that the value of SPContext.Current.Web never points to the
>>>>>>>>>>>>>>>> subsite you specified.  The result is that you cannot use SharePoint
>>>>>>>>>>>>>>>> subsites with ManifoldCF without causing havoc.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does this sound completely unfamiliar to you?  If you never
>>>>>>>>>>>>>>>> encountered it, then we should compare how these instances were set up,
>>>>>>>>>>>>>>>> unless you have any further ideas.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:12 AM, Will Parkinson <
>>>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey Karl (and Dmitry)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For AWS, I had to modify how the addFile function in the
>>>>>>>>>>>>>>>>> FileStream class (in SharePointRepository.java) calculates the
>>>>>>>>>>>>>>>>> modifiedPath from the relPath.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Essentially, I ensured that the relPath always contains
>>>>>>>>>>>>>>>>> the site as part of the path:
>>>>>>>>>>>>>>>>>
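>>>>>>>>>>>>>>>>>               // Test string contents with isEmpty(); != only compares references.
>>>>>>>>>>>>>>>>>               if (!siteName.isEmpty()) {
>>>>>>>>>>>>>>>>>                   int siteInd = relPath.indexOf(siteName);
>>>>>>>>>>>>>>>>>                   // Prepend the site unless it already occurs at or near the start.
>>>>>>>>>>>>>>>>>                   if (siteInd == -1 || siteInd > 3) {
>>>>>>>>>>>>>>>>>                       relPath = siteName + relPath;
>>>>>>>>>>>>>>>>>                   }
>>>>>>>>>>>>>>>>>               }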
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Which fixed my pathing issue and the index out of bounds
>>>>>>>>>>>>>>>>> errors.
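The check quoted above can be wrapped in a small self-contained method to see how it behaves on sample paths. This is a sketch only: the class and method names (SitePathFix, ensureSitePrefix) are hypothetical, the sample site and library names are made up, and it assumes siteName carries a leading slash, which the thread does not state. The threshold of 3 is taken from the quoted snippet as-is; its rationale is not given in the thread.

```java
// Hypothetical wrapper around the quoted check; names are illustrative only.
final class SitePathFix {
    // Returns relPath with siteName prepended, unless siteName already
    // occurs at or near the start of relPath (index 0 through 3).
    static String ensureSitePrefix(String siteName, String relPath) {
        if (siteName == null || siteName.isEmpty()) {
            return relPath; // no site to add (root site)
        }
        int siteInd = relPath.indexOf(siteName);
        if (siteInd == -1 || siteInd > 3) {
            return siteName + relPath;
        }
        return relPath;
    }

    public static void main(String[] args) {
        System.out.println(ensureSitePrefix("/Site Name", "/Library/test.aspx"));
        System.out.println(ensureSitePrefix("/Site Name", "/Site Name/Library/test.aspx"));
        System.out.println(ensureSitePrefix("", "/Shared Documents/test-word-doc-1.docx"));
    }
}
```

One caveat of this approach: if the site name also happens to appear later in the path (for example as a folder name), the check treats the site as missing and prepends it.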
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have also made many other modifications to cope with AD
>>>>>>>>>>>>>>>>> and claims based auth and compatibility with Sharepoint 2013
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Dmitry, I have uploaded my modified
>>>>>>>>>>>>>>>>> mcf-sharepoint-connector.jar and MCPermissions WSP if you would like to try
>>>>>>>>>>>>>>>>> them out
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://pngnetworks.com/sharepoint-2010-claims.zip
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Just make sure you back up your current ones as this is
>>>>>>>>>>>>>>>>> still very much in development :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also, the logging is very verbose.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Will
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 11:41 PM, Karl Wright <
>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Will,
>>>>>>>>>>>>>>>>>> When you folks set up YOUR AWS instance, did it work with
>>>>>>>>>>>>>>>>>> MCF out of the box?  Or did you need to do something?  And, if so, what did
>>>>>>>>>>>>>>>>>> you do?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 9:28 AM, Will Parkinson <
>>>>>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes that's right, only really interested in the site
>>>>>>>>>>>>>>>>>>> that you are trying to crawl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 11:25 PM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Will,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For SharePoint - 80, the output is
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> NTAuthenticationProviders       : (STRING) "NTLM"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I assume we're not interested in the Default Web Site;
>>>>>>>>>>>>>>>>>>>> for that, the output is simply "The parameter NTAuthenticationProviders is
>>>>>>>>>>>>>>>>>>>> not set at this node."
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 9:16 AM, Will Parkinson <
>>>>>>>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If you open IIS manager and click on sites, it is
>>>>>>>>>>>>>>>>>>>>> displayed in the ID column (see screenshot attached)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:55 PM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Will,
>>>>>>>>>>>>>>>>>>>>>> Sorry, what is the "sharepoint website *number*" in
>>>>>>>>>>>>>>>>>>>>>> that invocation?
>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:53 AM, Will Parkinson <
>>>>>>>>>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Just out of interest, what does the following
>>>>>>>>>>>>>>>>>>>>>>> command output on your system
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> cd to C:\inetpub\adminscripts
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> cscript adsutil.vbs get w3svc/<put your sharepoint website number here>/root/NTAuthenticationProviders
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Will
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:44 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> "This is the second time I'm encountering the issue
>>>>>>>>>>>>>>>>>>>>>>>> which leads me to believe it's a quirk of IIS and/or SharePoint."
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It cannot be just a quirk of SharePoint because
>>>>>>>>>>>>>>>>>>>>>>>> SharePoint's UI etc could not create or work with subsites if that was
>>>>>>>>>>>>>>>>>>>>>>>> true.  It may well be a configuration issue with IIS, which is indeed what
>>>>>>>>>>>>>>>>>>>>>>>> I suspect.  I have pinged all the resources I know of to try and get some
>>>>>>>>>>>>>>>>>>>>>>>> insight as to why this is happening.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> "Perhaps this is something that can be worked into
>>>>>>>>>>>>>>>>>>>>>>>> the 'fabric' of ManifoldCF as a workaround for a known issue."
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Like I said before, this is a huge amount of work,
>>>>>>>>>>>>>>>>>>>>>>>> tantamount to rewriting most of the connector.  If this is what you want to
>>>>>>>>>>>>>>>>>>>>>>>> request, that is your option, but there is no way we'd complete any of this
>>>>>>>>>>>>>>>>>>>>>>>> work before December/January at the earliest.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> "Just to understand this a bit better, the main
>>>>>>>>>>>>>>>>>>>>>>>> breakage here is that the wildcards don't work properly, right? "
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> No, it means that ManifoldCF cannot get at any data
>>>>>>>>>>>>>>>>>>>>>>>> of any kind associated with a SharePoint subsite.  Accessing root data
>>>>>>>>>>>>>>>>>>>>>>>> works fine.  If you try to crawl as things are now, you must disable all
>>>>>>>>>>>>>>>>>>>>>>>> subsites and just crawl the root site, or you will crawl the same things
>>>>>>>>>>>>>>>>>>>>>>>> with longer and longer paths indefinitely.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:38 AM, Dmitry Goldenberg
>>>>>>>>>>>>>>>>>>>>>>>> <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This is the second time I'm encountering the issue
>>>>>>>>>>>>>>>>>>>>>>>>> which leads me to believe it's a quirk of IIS and/or SharePoint. Perhaps
>>>>>>>>>>>>>>>>>>>>>>>>> this is something that can be worked into the 'fabric' of ManifoldCF as a
>>>>>>>>>>>>>>>>>>>>>>>>> workaround for a known issue. I understand that it may have far reaching
>>>>>>>>>>>>>>>>>>>>>>>>> tentacles but I wonder if that's really the only option...
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Just to understand this a bit better, the main
>>>>>>>>>>>>>>>>>>>>>>>>> breakage here is that the wildcards don't work properly, right?  In theory
>>>>>>>>>>>>>>>>>>>>>>>>> if I have a repo connector config which lists specific library and list
>>>>>>>>>>>>>>>>>>>>>>>>> paths, things should work?  It's only when the /* types of wildcards are
>>>>>>>>>>>>>>>>>>>>>>>>> included, we're in trouble?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:07 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Someone else was having a similar problem. See
>>>>>>>>>>>>>>>>>>>>>>>>>> http://social.technet.microsoft.com/Forums/sharepoint/en-US/e4b53c63-b89a-4356-a7b0-6ca7bfd22826/getting-sharepoint-subsite-from-custom-webservice.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Apparently it does depend on how you get to the
>>>>>>>>>>>>>>>>>>>>>>>>>> web service, which does argue that it is an IIS issue.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 5:44 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> As discussed privately I had a look at your
>>>>>>>>>>>>>>>>>>>>>>>>>>> system.  What is happening is that the C# static SPContext.Current.Web is
>>>>>>>>>>>>>>>>>>>>>>>>>>> not reflecting the subsite in any url that contains a subsite.  In other
>>>>>>>>>>>>>>>>>>>>>>>>>>> words, the URL coming in might be "
>>>>>>>>>>>>>>>>>>>>>>>>>>> http://servername/subsite1/_vti_bin/MCPermissions.asmx",
>>>>>>>>>>>>>>>>>>>>>>>>>>> but the MCPermissions.asmx plugin will think it is being executed in the
>>>>>>>>>>>>>>>>>>>>>>>>>>> root context ("http://servername").  That's
>>>>>>>>>>>>>>>>>>>>>>>>>>> pretty broken behavior, so I'm guessing that either IIS or SharePoint
>>>>>>>>>>>>>>>>>>>>>>>>>>> is somehow misconfigured, and that the web services would begin to
>>>>>>>>>>>>>>>>>>>>>>>>>>> work right again once that is corrected.  But I have no idea how this
>>>>>>>>>>>>>>>>>>>>>>>>>>> should actually be fixed.
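The symptom described above (a request to a subsite URL being handled in the root context) can be expressed as a simple comparison between the site a service URL addresses and the site the service says it is running in. The sketch below assumes the reported site URL has already been obtained by some means; how to obtain it is not covered in this thread. All names (SubsiteContextCheck, addressedWebUrl, contextMatches) are hypothetical and are not part of ManifoldCF or SharePoint.

```java
// Hypothetical diagnostic; not part of ManifoldCF or SharePoint.
final class SubsiteContextCheck {
    private static final String VTI_BIN = "/_vti_bin/";

    // Derives the site or subsite URL that a service URL addresses
    // by cutting everything from "/_vti_bin/" onward.
    static String addressedWebUrl(String serviceUrl) {
        int idx = serviceUrl.indexOf(VTI_BIN);
        return idx < 0 ? serviceUrl : serviceUrl.substring(0, idx);
    }

    // True when the site URL reported by the service matches the one addressed,
    // ignoring case and a single trailing slash.
    static boolean contextMatches(String serviceUrl, String reportedWebUrl) {
        return stripTrailingSlash(addressedWebUrl(serviceUrl))
            .equalsIgnoreCase(stripTrailingSlash(reportedWebUrl));
    }

    private static String stripTrailingSlash(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }

    public static void main(String[] args) {
        String url = "http://servername/subsite1/_vti_bin/MCPermissions.asmx";
        System.out.println(addressedWebUrl(url));
        System.out.println(contextMatches(url, "http://servername"));
    }
}
```

With the URLs from the message above, the addressed site is the subsite while the reported context is the root, so the check reports a mismatch.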
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Will Parkinson, one of the subscribers of this
>>>>>>>>>>>>>>>>>>>>>>>>>>> list, may find the symptoms meaningful, since he set up an AWS SharePoint
>>>>>>>>>>>>>>>>>>>>>>>>>>> instance before.  I hope he will respond in a helpful way.  Until then, I
>>>>>>>>>>>>>>>>>>>>>>>>>>> think we are stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 9:49 AM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like I'll be able to get access for
>>>>>>>>>>>>>>>>>>>>>>>>>>>> you to the test system we're using. Would you be interested in working with
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the system directly? I certainly don't mind doing some testing but I
>>>>>>>>>>>>>>>>>>>>>>>>>>>> thought we'd speed things up this way. If so, could you email me from a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> more private account so we can set this up?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 7:38 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Another interesting bit from the log:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library list: '/_catalogs/lt/Forms/AllItems.aspx', 'List
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Template Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library list:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/_catalogs/masterpage/Forms/AllItems.aspx', 'Master Page Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library list: '/Shared Documents/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'Shared Documents'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library list: '/SiteAssets/Forms/AllItems.aspx', 'Site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Assets'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library list: '/SitePages/Forms/AllPages.aspx', 'Site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Pages'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library list:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/_catalogs/solutions/Forms/AllItems.aspx', 'Solution Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library list: '/Style Library/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'Style Library'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library list: '/Test Library 1/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'Test Library 1'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library list: '/_catalogs/theme/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'Theme Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library list: '/_catalogs/wp/Forms/AllItems.aspx', 'Web
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Part Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Documents' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets' exactly matched rule
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages' exactly matched rule
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Library' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '7') - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This time it appears that it is the Lists
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> service that is broken and does not recognize the parent site.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I haven't corrected this problem yet since now
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am beginning to wonder if *any* of the web services under Amazon work at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all for subsites.  We may be better off implementing everything we need in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the MCPermissions service.  I will ponder this as I continue to research
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the logs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's still valuable to check my getSites()
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementation.  I'll be doing another round of work tonight on the plugin.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 8:45 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The augmented plugin can be downloaded from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://people.apache.org/~kwright/MetaCarta.SharePoint.MCPermissionsService.wsp.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The revised connector code is also ready, and should be checked out and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> built from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-772.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once you set it all up, you can see if it is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> doing the right thing by just trying to drill down through subsites in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UI.  You should always see a list of subsites that is appropriate for the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> context you are in; if this does not happen it is not working.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:45 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can see how preloading the list of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> subsites may be less than optimal. The advantage of doing it this way is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that a single call gets you the whole structure in memory, which may be OK
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> unless there are sites with so many subsites that memory becomes a concern.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The disadvantage is having to pass this structure around.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'll certainly help test out your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> changes, just let me know when they're available.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:19 PM, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the code snippet.  I'd prefer,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> though, to not preload the entire site structure in memory.  Probably it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would be better to just add another method to the ManifoldCF SharePoint
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2010 plugin.  More methods are going to be added anyway to support Claim
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Space Authentication, so I guess this would be just one more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We honestly have never seen this problem
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> before - so it's not just flakiness, it has something to do with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> installation, I'm certain.  At any rate, I'll get going right away on a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> workaround - if you are willing to test what I produce.  I'm also certain
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there is at least one other issue, but hopefully that will become clearer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> once this one is resolved.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 6:49 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> subsite discovery is effectively
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> disabled except directly under the root site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes. Come to think of it, I once came
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> across this problem while implementing a SharePoint connector.  I'm not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sure whether it's exactly what's happening with the issue we're discussing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but looks like it.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I started off by using multiple
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> getWebCollection calls to get child subsites of sites and trying to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> navigate down that way. The problem was that getWebCollection was always
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> returning the immediate subsites of the root site no matter whether you're
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at the root or below, so I ended up generating infinite loops.
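A minimal sketch of one way to keep a traversal from looping when the service hands back the same subsites at every level: remember each URL that has already been seen and never queue it twice. The class name, the walk method, and the fetchChildren callback are hypothetical stand-ins for whatever actually calls getWebCollection. This only stops the loop; it does not recover the real hierarchy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of ManifoldCF or of any SharePoint client.
final class SubSiteWalker {
  // Walks breadth-first from root, asking fetchChildren for each site's subsites.
  // The visited set guarantees that every URL is expanded at most once.
  static List<String> walk(String root, Function<String, List<String>> fetchChildren) {
    Set<String> visited = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    visited.add(root);
    queue.add(root);
    while (!queue.isEmpty()) {
      String site = queue.remove();
      for (String child : fetchChildren.apply(site)) {
        // Set.add returns false for a URL that was already seen, so it is not re-queued.
        if (visited.add(child)) {
          queue.add(child);
        }
      }
    }
    return new ArrayList<>(visited);
  }

  public static void main(String[] args) {
    // Simulates the misbehaving service: the same three subsites for every parent.
    List<String> rootSites = Arrays.asList(
        "http://host.example/Abcd",
        "http://host.example/Defghij",
        "http://host.example/Klmnopqr");
    List<String> seen = walk("http://host.example", parent -> rootSites);
    System.out.println(seen.size()); // prints 4: the root plus three subsites
  }
}
```

The guard is keyed on the absolute URL the service returns, so it helps only if the caller does not rebuild paths by appending child names to the parent path.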
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I switched over to using a single
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> getAllSubWebCollection call and caching its results. That call returns the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> full list of all subsites as pairs of Title and Url.  I had a POJO similar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to the one below which held the list of sites and contained logic for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> enumerating the child sites, given the URL of a (parent) site.  From what I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recall, getWebCollection works inconsistently, either across SP versions or
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> across installations, but the logic below should work in any case.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *** public class SubSiteCollection --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> holds a list of CrawledSite pojo's each of which is a { title, url }.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *** SubSiteCollection has the following:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  // Assumed definition; SLASH was not shown in the original snippet.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  private static final String SLASH = "/";
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  // Returns the cached sites that sit exactly one level below siteUrl.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  public List<CrawledSite> getImmediateSubSites(String siteUrl) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   List<CrawledSite> subSites = new ArrayList<CrawledSite>();
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (CrawledSite site : sites) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    if (isChildOf(siteUrl, site.getUrl().toString())) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     subSites.add(site);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   return subSites;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  // True if urlToCheck is a direct child of parentUrl: after normalization,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  // the part past the parent prefix contains exactly one (trailing) slash.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  private static boolean isChildOf(String parentUrl, String urlToCheck) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   final String parent = normalizeUrl(parentUrl);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   final String child = normalizeUrl(urlToCheck);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   boolean ret = false;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   if (child.startsWith(parent)) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    String remainder = child.substring(parent.length());
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    ret = StringUtils.countOccurrencesOf(remainder, SLASH) == 1;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   return ret;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  // Lower-cases the URL and makes sure it ends with a slash.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  private static String normalizeUrl(String url) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   return ((url.endsWith(SLASH)) ? url : url + SLASH).toLowerCase();
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 2:54 PM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Have a look at this sequence also:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Subsite list: '
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Abcd',
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Subsite list: '
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Defghij',
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Subsite list: '
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Klmnopqr',
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Checking whether to include site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd' exactly
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Checking whether to include site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Including site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Checking whether to include site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '8') - SharePoint: Including site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is using the GetSites(String parent)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method with a site name of "/Klmnopqr/Abcd/Abcd/Klmnopqr", and getting back
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> three sites (!!).  The parent path is not correct, obviously, but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> nevertheless this is one way in which paths are getting completely messed up.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It *looks* like the Webs web service is broken in such a way as to ignore
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the URL coming in, except for the base part, which means that subsite
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discovery is effectively disabled except directly under the root site.
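A minimal sketch of a sanity check for the symptom described above, assuming the service returns absolute subsite URLs as in the log excerpt: every URL handed back for a parent path should lie below that parent. The class and method names are hypothetical, not connector code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical check, not part of ManifoldCF.
final class SubsiteResponseCheck {
  // True only if every returned URL lies strictly below serverUrl + parentPath.
  // parentPath is "" for the root site, otherwise something like "/Abcd".
  static boolean allUnderParent(String serverUrl, String parentPath, List<String> returnedUrls) {
    String prefix = (serverUrl + parentPath + "/").toLowerCase(Locale.ROOT);
    for (String url : returnedUrls) {
      if (!url.toLowerCase(Locale.ROOT).startsWith(prefix)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    String server = "http://ec2-99-99-99-99.compute-1.amazonaws.com";
    // The three URLs from the "Subsite list" log lines.
    List<String> returned = Arrays.asList(
        server + "/Abcd", server + "/Defghij", server + "/Klmnopqr");
    // Plausible for the root site:
    System.out.println(allUnderParent(server, "", returned)); // prints true
    // Not plausible for the nested parent that was actually requested:
    System.out.println(allUnderParent(server, "/Klmnopqr/Abcd/Abcd/Klmnopqr", returned)); // prints false
  }
}
```

A false result for a non-root parent would suggest that the response describes the root site rather than the site that was asked about.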
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This might still be OK if it is not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> possible to create subsites of subsites in this version of SharePoint.  Can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you confirm that this is or is not possible?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 2:42 PM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "This is everything that got generated,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from the very beginning"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Well, something isn't right.  What I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expect to see right up front, but don't, are:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A webs "getWebCollection" invocation
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for /_vti_bin/webs.asmx
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Two lists "getListCollection"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> invocations for /_vti_bin/lists.asmx
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Instead the first transactions I see are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from already busted URLs - which makes no sense, since there is no way
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> they could have been queued yet.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So there are a number of possibilities.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> First, maybe the log isn't getting cleared out, and the session in question
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> therefore starts somewhere in the middle of manifoldcf.log.1.  But no:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> C:\logs>grep "POST /_vti_bin/webs"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifoldcf.log.1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> grep: input lines truncated - result
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> questionable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Nevertheless there are some interesting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> points here.  First, note the following response, which I've been able to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> determine is against "Test Library 1":
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '23') - SharePoint: getListItems xml response: '<GetListItems xmlns="
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://schemas.microsoft.com/sharepoint/soap/directory/"><GetListItemsResponse
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlns=""><GetListItemsResult
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FileRef="SitePages/Home.aspx"/></GetListItemsResponse></GetListItems>'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '23') - SharePoint: Checking whether to include document
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/SitePages/Home.aspx'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '23') - SharePoint: File '/SitePages/Home.aspx' exactly matched rule
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '23') - SharePoint: Including file '/SitePages/Home.aspx'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  WARN 2013-09-16 13:02:31,590 (Worker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread '23') - Sharepoint: Unexpected relPath structure; path is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '/SitePages/Home.aspx', but expected <list/library> length of 26
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The FileRef in this case is pointing at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what, exactly?  Is there a SitePages/Home.aspx in the "Test Library 1"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> library?  Or does it mean to refer back to the root site with this URL
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> construction?  And since this is supposedly at the root level, how come the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> combined site + library name comes out to 26??  I get 15, which leaves 11
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> characters unaccounted for.
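The arithmetic above can be sketched as follows, assuming the root site contributes an empty path and the library path is simply "/" plus the library name. How the connector really builds its expected prefix is not shown in this thread, so the class and method here are hypothetical.

```java
// Hypothetical illustration of the length and prefix reasoning; not connector code.
final class RelPathCheck {
  // True if relPath points at something inside the given site + library path.
  static boolean isInside(String sitePath, String libraryName, String relPath) {
    String prefix = sitePath + "/" + libraryName + "/";
    return relPath.startsWith(prefix);
  }

  public static void main(String[] args) {
    String sitePath = "";                   // root site, assumed to be the empty path
    String libraryName = "Test Library 1";
    // Combined site + library name length:
    System.out.println((sitePath + "/" + libraryName).length()); // prints 15
    // The FileRef from the response does not sit under that library path:
    System.out.println(isInside(sitePath, libraryName, "/SitePages/Home.aspx")); // prints false
  }
}
```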
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm still looking at the logs to see if
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can glean key information.  Later, if I could set up a crawl against the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sharepoint instance in question, that would certainly help.  I can readily
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> set up an ssh tunnel if that is what is required.  But I won't be able to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do it until I get home tonight.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:58 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is everything that got generated,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from the very beginning, meaning that I did a fresh build, new database,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new connection definitions, start. The log must have rolled but the .1 log
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is included.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If I were to get you access to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actual test system, would you mind taking a look? It may be more efficient
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> than sending logs..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:48 PM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> These logs are different but have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> exactly the same problem; they start in the middle when the crawl is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already well underway.  I'm wondering if by chance you have more than one
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> agents process running or something?  Or maybe the log is rolling and stuff
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is getting lost?  What's there is not what I would expect to see, at all.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I *did* manage to find two
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> transactions that look like they might be helpful, but because the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *results* of those transactions are required by transactions that take
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> place minutes *before* in the log, I have no confidence that I'm looking at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> anything meaningful.  But I'll get back to you on what I find nonetheless.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you decide to repeat this exercise,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> try watching the log with "tail -f" before starting the job.  You should
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not see any log contents at all until the job is started.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:11 PM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Attached please find logs which start
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at the beginning. I started from a fresh build (clean db etc.), the logs
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> start at server start, then I create the output connection and the repo
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection, then the job, and then I fire off the job. I aborted the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution about a minute into it or so.  That's all that's in the logs with:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org.apache.manifoldcf.connectors=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log4j.logger.httpclient.wire.header=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log4j.logger.org.apache.commons.httpclient=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 12:39 PM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you sure these are the right
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logs?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They start right in the middle of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a crawl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They are already in a broken state
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when they start, e.g. the kinds of things that are being looked up are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already nonsense paths
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I need to see logs from the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> BEGINNING of a fresh crawl to see how the nonsense paths happen.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:52 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've generated logs with details as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we discussed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job was created afresh, as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> before:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Path rules:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* file include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* library include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* list include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* site include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Metadata:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* include true
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The logs are attached.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:20 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Do you think that this issue is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> generic with regard to any Amz instance?"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I presume so, since you apparently
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> didn't do anything special to set one of these up.  Unfortunately, such
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instances are not part of the free tier, so I am still constrained from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> setting one up for myself because of household rules here.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "For now, I assume our only
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> workaround is to list the paths of interest manually"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Depending on what is going wrong,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that may not even work.  It looks like several SharePoint web service calls
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be affected, and not in a cleanly predictable way, for this to happen.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "is identification and extraction
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of attachments supported in the SP connector?"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF in general leaves
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> identification and extraction to the search engine.  Solr, for instance,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses Tika for this, if so configured.  You can configure your Solr output
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection to include or exclude specific mime types or extensions if you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> want to limit what is attempted.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:09 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, Karl. Do you think that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this issue is generic with regard to any Amz instance? I'm just wondering
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how easily reproducible this may be..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For now, I assume our only
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> workaround is to list the paths of interest manually, i.e. add explicit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rules for each library and list.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A related subject - is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> identification and extraction of attachments supported in the SP
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector?  E.g. if I have a Word doc attached to a Task list item, would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that be extracted?  So far, I see that library content gets crawled and I'm
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> getting the list item data but am not sure what happens to the attachments.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 10:48 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the additional
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information.  It does appear that the method that lists subsites is not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working as expected under AWS.  Nor are some number of other methods which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supposedly just list the children of a subsite.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've reopened CONNECTORS-772 to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> work on addressing this issue.  Please stay tuned.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 10:08
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Most of the paths that get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> generated are listed in the attached log, they match what shows up in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> diag report. So I'm not sure where they diverge, most of them just don't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seem right.  There are 3 subsites rooted in the main site: Abcd, Defghij,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Klmnopqr.  It's strange that the connector would try such paths as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /*Klmnopqr*/*Defghij*/*Defghij*/Announcements///
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- there are multiple repetitions of the same subsite on the path and to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> begin with, Defghij is not a subsite of Klmnopqr, so why would it try
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this? the /// at the end doesn't seem correct either, unless I'm missing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> something in how this pathing works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /Test Library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1/Financia/lProjectionsTemplate.xl/Abcd/Announcements -- looks wrong. A
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> docname is mixed into the path, a subsite ends up after a docname?...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /Shared
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Documents/Personal_Fina/ncial_Statement_1_1.xl/Defghij/ -- same types of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issues plus now somehow the docname got split with a forward slash?..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There are also a bunch of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> StringIndexOutOfBoundsExceptions.  Perhaps this logic doesn't fit with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pathing we're seeing on this amz-based installation?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd expect the logic to just
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know that root contains 3 subsites, and work off that. Each subsite has a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specific list of libraries and lists, etc. It seems odd that the connector
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> gets into this matching pattern, and tries what looks like thousands of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> variations (I aborted the execution).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:56 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To clarify, the way you would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to analyze this is to run a crawl with the wildcards as you have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> selected, abort if necessary after a while, and then use the Document
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Status report to list the document identifiers that had been generated.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Find a document identifier that you believe represents a path that is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> illegal, and figure out what SOAP getChild call caused the problem by
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> returning incorrect data.  In other words, find the point in the path where
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the path diverges from what exists into what doesn't exist, and go back in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the ManifoldCF logs to find the particular SOAP request that led to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd expect from your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> description that the problem lies with getting child sites given a site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> path, but that's just a guess at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2013 at 6:40 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't understand what you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mean by "I've tried the set of wildcards as below and I seem to be running
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into a lot of cycles, where various subsite folders are appended to each
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> other and an extraction of data at all of those locations is attempted".
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are seeing cycles it means that document discovery is still failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in some way.  For each folder/library/site/subsite, only the children of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that folder/library/site/subsite should be appended to the path - ever.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you can give a specific
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example, preferably including the soap back-and-forth, that would be very
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2013 at 1:40 PM, Dmitry Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Quick question. Is there an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> easy way to configure an SP repo connection for crawling of all content,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from the root site all the way down?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've tried the set of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wildcards as below and I seem to be running into a lot of cycles, where
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> various subsite folders are appended to each other and an extraction of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data at all of those locations is attempted. Ideally I'd like to avoid
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> having to construct an exact set of paths because the set may change,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> especially with new content being added.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Path rules:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* file include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* library include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* list include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* site include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Metadata:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* include true
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd also like to pull down
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> any files attached to list items. I'm hoping that some type of "/* file
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> include" should do it, once I figure out how to safely include all content.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

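The diagnostic step described in the quoted thread (find the point where a generated path diverges from what actually exists) can be sketched roughly as below. This is only an illustration and not ManifoldCF code: the function name first_divergence and the SITE_TREE dictionary are hypothetical, and the tree contents are an assumption based on the three subsites named in the thread, each assumed here to hold an Announcements list.

```python
# Hypothetical model of the real site hierarchy: the thread names three
# subsites under the root; the "Announcements" children are an assumption.
SITE_TREE = {
    "Abcd": {"Announcements": {}},
    "Defghij": {"Announcements": {}},
    "Klmnopqr": {"Announcements": {}},
}


def first_divergence(path, tree):
    """Walk a generated path against the known tree.

    Returns (valid_prefix, bad_segment). bad_segment is None when every
    segment of the path exists in the tree.
    """
    # Empty segments (from repeated or trailing slashes) are ignored.
    segments = [s for s in path.split("/") if s]
    node = tree
    valid = []
    for seg in segments:
        if seg not in node:
            # This is the step whose "list children" response to inspect.
            return "/" + "/".join(valid), seg
        valid.append(seg)
        node = node[seg]
    return "/" + "/".join(valid), None


if __name__ == "__main__":
    bad = "/Klmnopqr/Defghij/Defghij/Announcements///"
    print(first_divergence(bad, SITE_TREE))
```

For the path quoted in the thread, the walk stops at the first Defghij, because Defghij is not a child of Klmnopqr. The valid prefix (/Klmnopqr) is then the parent whose child-listing SOAP response is worth locating in the ManifoldCF logs.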