manifoldcf-user mailing list archives

From Will Parkinson <parkinson.w...@gmail.com>
Subject Re: Getting a 401 Unauthorized on a SharePoint 2010 crawl request, with MCPermissions.asmx installed
Date Wed, 18 Sep 2013 13:28:01 GMT
Yes, that's right; we're only really interested in the site that you are
trying to crawl.


On Wed, Sep 18, 2013 at 11:25 PM, Dmitry Goldenberg
<dgoldenberg@kmwllc.com>wrote:

> Will,
>
> For SharePoint - 80, the output is
>
> NTAuthenticationProviders       : (STRING) "NTLM"
>
> I assume we're not interested in the Default Web Site; for that, the
> output is simply "The parameter NTAuthenticationProviders is not set at
> this node."
>
> - Dmitry
>
>
> On Wed, Sep 18, 2013 at 9:16 AM, Will Parkinson <parkinson.will@gmail.com>wrote:
>
>> If you open IIS manager and click on sites, it is displayed in the ID
>> column (see screenshot attached)
>>
>>
>> On Wed, Sep 18, 2013 at 10:55 PM, Dmitry Goldenberg <
>> dgoldenberg@kmwllc.com> wrote:
>>
>>> Hi Will,
>>> Sorry, what is the "sharepoint website *number*" in that invocation?
>>> - Dmitry
>>>
>>>
>>> On Wed, Sep 18, 2013 at 8:53 AM, Will Parkinson <
>>> parkinson.will@gmail.com> wrote:
>>>
>>>> Hi Dmitry
>>>>
>>>> Just out of interest, what does the following command output on your
>>>> system
>>>>
>>>> cd to C:\inetpub\adminscripts
>>>>
>>>> cscript adsutil.vbs get w3svc/<put your sharepoint website number
>>>> here>/root/NTAuthenticationProviders
>>>>
>>>> Cheers,
>>>>
>>>> Will
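[Editor's note: if the `get` invocation above shows a provider other than NTLM (or reports that the parameter is not set), the matching `set` invocation is the usual way to change it. This is a hedged sketch, not verified guidance for this particular installation; the site ID `1` is a placeholder for your SharePoint web application's ID as shown in IIS Manager.]

```shell
REM Run from an elevated command prompt on the SharePoint server
cd C:\inetpub\adminscripts
cscript adsutil.vbs set w3svc/1/root/NTAuthenticationProviders "NTLM"
REM Restart IIS so the change takes effect
iisreset
```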
>>>>
>>>>
>>>> On Wed, Sep 18, 2013 at 10:44 PM, Karl Wright <daddywri@gmail.com>wrote:
>>>>
>>>>> "This is the second time I'm encountering the issue which leads me to
>>>>> believe it's a quirk of IIS and/or SharePoint."
>>>>>
>>>>> It cannot be just a quirk of SharePoint, because SharePoint's UI etc.
>>>>> could not create or work with subsites if that were true.  It may well be a
>>>>> configuration issue with IIS, which is indeed what I suspect.  I have
>>>>> pinged all the resources I know of to try and get some insight as to why
>>>>> this is happening.
>>>>>
>>>>>
>>>>> "Perhaps this is something that can be worked into the 'fabric' of
>>>>> ManifoldCF as a workaround for a known issue."
>>>>>
>>>>> Like I said before, this is a huge amount of work, tantamount to
>>>>> rewriting most of the connector.  If this is what you want to request, that
>>>>> is your option, but there is no way we'd complete any of this work before
>>>>> December/January at the earliest.
>>>>>
>>>>>
>>>>> "Just to understand this a bit better, the main breakage here is that
>>>>> the wildcards don't work properly, right? "
>>>>>
>>>>> No, it means that ManifoldCF cannot get at any data of any kind
>>>>> associated with a SharePoint subsite.  Accessing root data works fine.  If
>>>>> you try to crawl as things are now, you must disable all subsites and just
>>>>> crawl the root site, or you will crawl the same things with longer and
>>>>> longer paths indefinitely.
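[Editor's note: the failure mode Karl describes can be illustrated with a small sketch. All names below are hypothetical; the fake `listSubsites` stands in for the broken Webs `getWebCollection` call, which on the affected installation ignores the parent it is asked about and always returns the root-level subsites, so a naive discovery loop produces ever-longer nonsense paths.]

```java
import java.util.Arrays;
import java.util.List;

/** Simulates a subsite service that ignores the parent path it is given. */
public class BrokenWebsDemo {
  // The broken service: whatever parent you ask about, you get the
  // root-level subsites back, as observed in the crawl logs.
  static List<String> listSubsites(String parentPath) {
    return Arrays.asList("Abcd", "Defghij", "Klmnopqr");
  }

  // A naive discovery loop never terminates, because every "child" it
  // appends yields the same three names again; the depth cap here only
  // exists to show the ever-growing paths.
  static void crawl(String path, int depth, StringBuilder log) {
    if (depth == 0) return;
    for (String child : listSubsites(path)) {
      String next = path + "/" + child;
      log.append(next).append('\n');
      crawl(next, depth - 1, log);
    }
  }

  public static void main(String[] args) {
    StringBuilder log = new StringBuilder();
    crawl("", 3, log);
    // Paths like /Abcd/Abcd/Abcd keep growing with each level
    System.out.print(log);
  }
}
```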
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Sep 18, 2013 at 8:38 AM, Dmitry Goldenberg <
>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>
>>>>>> Karl,
>>>>>>
>>>>>> This is the second time I'm encountering the issue which leads me to
>>>>>> believe it's a quirk of IIS and/or SharePoint. Perhaps this is something
>>>>>> that can be worked into the 'fabric' of ManifoldCF as a workaround for a
>>>>>> known issue. I understand that it may have far-reaching tentacles, but I
>>>>>> wonder if that's really the only option...
>>>>>>
>>>>>> Just to understand this a bit better, the main breakage here is that
>>>>>> the wildcards don't work properly, right?  In theory if I have a repo
>>>>>> connector config which lists specific library and list paths, things should
>>>>>> work?  It's only when the /* types of wildcards are included that we're
>>>>>> in trouble?
>>>>>>
>>>>>> - Dmitry
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 18, 2013 at 8:07 AM, Karl Wright <daddywri@gmail.com>wrote:
>>>>>>
>>>>>>> Hi Dmitry,
>>>>>>>
>>>>>>> Someone else was having a similar problem. See
>>>>>>> http://social.technet.microsoft.com/Forums/sharepoint/en-US/e4b53c63-b89a-4356-a7b0-6ca7bfd22826/getting-sharepoint-subsite-from-custom-webservice.
>>>>>>>
>>>>>>> Apparently it does depend on how you get to the web service, which
>>>>>>> does argue that it is an IIS issue.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 17, 2013 at 5:44 PM, Karl Wright <daddywri@gmail.com>wrote:
>>>>>>>
>>>>>>>> Hi Dmitry,
>>>>>>>>
>>>>>>>> As discussed privately I had a look at your system.  What is
>>>>>>>> happening is that the C# static SPContext.Current.Web is not reflecting the
>>>>>>>> subsite in any url that contains a subsite.  In other words, the URL coming
>>>>>>>> in might be "http://servername/subsite1/_vti_bin/MCPermissions.asmx",
>>>>>>>> but the MCPermissions.asmx plugin will think it is being executed in the
>>>>>>>> root context ("http://servername").  That's pretty broken
>>>>>>>> behavior, so I'm guessing that either IIS or SharePoint is somehow
>>>>>>>> misconfigured; if that were corrected, the web services would begin to
>>>>>>>> work right again.  But I have no idea how this should actually be fixed.
>>>>>>>>
>>>>>>>> Will Parkinson, one of the subscribers of this list, may find the
>>>>>>>> symptoms meaningful, since he set up an AWS SharePoint instance before.  I
>>>>>>>> hope he will respond in a helpful way.  Until then, I think we are stuck.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 17, 2013 at 9:49 AM, Dmitry Goldenberg <
>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Karl,
>>>>>>>>>
>>>>>>>>> It looks like I'll be able to get access for you to the test
>>>>>>>>> system we're using. Would you be interested in working with the system
>>>>>>>>> directly? I certainly don't mind doing some testing but I thought we'd
>>>>>>>>> speed things up this way. If so, could you email me from a more private
>>>>>>>>> account so we can set this up?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> - Dmitry
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 17, 2013 at 7:38 AM, Karl Wright <daddywri@gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>
>>>>>>>>>> Another interesting bit from the log:
>>>>>>>>>>
>>>>>>>>>> >>>>>>
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library list: '/_catalogs/lt/Forms/AllItems.aspx', 'List Template Gallery'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library list: '/_catalogs/masterpage/Forms/AllItems.aspx', 'Master Page
>>>>>>>>>> Gallery'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library list: '/Shared Documents/Forms/AllItems.aspx', 'Shared Documents'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library list: '/SiteAssets/Forms/AllItems.aspx', 'Site Assets'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library list: '/SitePages/Forms/AllPages.aspx', 'Site Pages'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library list: '/_catalogs/solutions/Forms/AllItems.aspx', 'Solution Gallery'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library list: '/Style Library/Forms/AllItems.aspx', 'Style Library'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library list: '/Test Library 1/Forms/AllItems.aspx', 'Test Library 1'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library list: '/_catalogs/theme/Forms/AllItems.aspx', 'Theme Gallery'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library list: '/_catalogs/wp/Forms/AllItems.aspx', 'Web Part Gallery'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Checking whether to include library
>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents' exactly
>>>>>>>>>> matched rule path '/*'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Including library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Checking whether to include library
>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets' exactly
>>>>>>>>>> matched rule path '/*'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Including library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Checking whether to include library
>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages' exactly matched
>>>>>>>>>> rule path '/*'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Including library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Checking whether to include library
>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library' exactly
>>>>>>>>>> matched rule path '/*'
>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') - SharePoint:
>>>>>>>>>> Including library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>> <<<<<<
>>>>>>>>>>
>>>>>>>>>> This time it appears that it is the Lists service that is broken
>>>>>>>>>> and does not recognize the parent site.
>>>>>>>>>>
>>>>>>>>>> I haven't corrected this problem yet since now I am beginning to
>>>>>>>>>> wonder if *any* of the web services under Amazon work at all for subsites.
>>>>>>>>>> We may be better off implementing everything we need in the MCPermissions
>>>>>>>>>> service.  I will ponder this as I continue to research the logs.
>>>>>>>>>>
>>>>>>>>>> It's still valuable to check my getSites() implementation.  I'll
>>>>>>>>>> be doing another round of work tonight on the plugin.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 16, 2013 at 8:45 PM, Karl Wright <daddywri@gmail.com>wrote:
>>>>>>>>>>
>>>>>>>>>>> The augmented plugin can be downloaded from
>>>>>>>>>>> http://people.apache.org/~kwright/MetaCarta.SharePoint.MCPermissionsService.wsp.  The revised connector code is also ready, and should be checked out and
>>>>>>>>>>> built from
>>>>>>>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-772.
>>>>>>>>>>>
>>>>>>>>>>> Once you set it all up, you can see if it is doing the right
>>>>>>>>>>> thing by just trying to drill down through subsites in the UI.  You should
>>>>>>>>>>> always see a list of subsites that is appropriate for the context you are
>>>>>>>>>>> in; if this does not happen it is not working.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:45 PM, Dmitry Goldenberg <
>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Karl,
>>>>>>>>>>>>
>>>>>>>>>>>> I can see how preloading the list of subsites may be less
>>>>>>>>>>>> optimal. The advantage of doing it this way is that one call gets you the
>>>>>>>>>>>> structure in memory, which may be OK unless there are sites with a ton of
>>>>>>>>>>>> subsites, which could stress memory. The disadvantage is having to throw
>>>>>>>>>>>> this structure around.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I'll certainly help test out your changes, just let me
>>>>>>>>>>>> know when they're available.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:19 PM, Karl Wright <
>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the code snippet.  I'd prefer, though, to not
>>>>>>>>>>>>> preload the entire site structure in memory.  Probably it would be better
>>>>>>>>>>>>> to just add another method to the ManifoldCF SharePoint 2010 plugin.  More
>>>>>>>>>>>>> methods are going to be added anyway to support Claim Space Authentication,
>>>>>>>>>>>>> so I guess this would be just one more.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We honestly have never seen this problem before - so it's not
>>>>>>>>>>>>> just flakiness, it has something to do with the installation, I'm certain.
>>>>>>>>>>>>> At any rate, I'll get going right away on a workaround - if you are willing
>>>>>>>>>>>>> to test what I produce.  I'm also certain there is at least one other
>>>>>>>>>>>>> issue, but hopefully that will become clearer once this one is resolved.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 6:49 PM, Dmitry Goldenberg <
>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> >> subsite discovery is effectively disabled except directly
>>>>>>>>>>>>>> under the root site
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes. Come to think of it, I once came across this problem
>>>>>>>>>>>>>> while implementing a SharePoint connector.  I'm not sure whether it's
>>>>>>>>>>>>>> exactly what's happening with the issue we're discussing, but it looks like it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I started off by using multiple getWebCollection calls to get
>>>>>>>>>>>>>> child subsites of sites and trying to navigate down that way. The problem
>>>>>>>>>>>>>> was that getWebCollection was always returning the immediate subsites of
>>>>>>>>>>>>>> the root site no matter whether you're at the root or below, so I ended up
>>>>>>>>>>>>>> generating infinite loops.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I switched over to using a single getAllSubWebCollection call
>>>>>>>>>>>>>> and caching its results. That call returns the full list of all subsites as
>>>>>>>>>>>>>> pairs of Title and Url.  I had a POJO similar to the one below which held
>>>>>>>>>>>>>> the list of sites and contained logic for enumerating the child sites,
>>>>>>>>>>>>>> given the URL of a (parent) site.  From what I recall, getWebCollection
>>>>>>>>>>>>>> works inconsistently, either across SP versions or across installations,
>>>>>>>>>>>>>> but the logic below should work in any case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *** public class SubSiteCollection -- holds a list of
>>>>>>>>>>>>>> CrawledSite pojos, each of which is a { title, url }.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *** SubSiteCollection has the following:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   public List<CrawledSite> getImmediateSubSites(String siteUrl) {
>>>>>>>>>>>>>>     List<CrawledSite> subSites = new ArrayList<CrawledSite>();
>>>>>>>>>>>>>>     for (CrawledSite site : sites) {
>>>>>>>>>>>>>>       if (isChildOf(siteUrl, site.getUrl().toString())) {
>>>>>>>>>>>>>>         subSites.add(site);
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>     return subSites;
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   private static boolean isChildOf(String parentUrl, String urlToCheck) {
>>>>>>>>>>>>>>     final String parent = normalizeUrl(parentUrl);
>>>>>>>>>>>>>>     final String child = normalizeUrl(urlToCheck);
>>>>>>>>>>>>>>     boolean ret = false;
>>>>>>>>>>>>>>     if (child.startsWith(parent)) {
>>>>>>>>>>>>>>       String remainder = child.substring(parent.length());
>>>>>>>>>>>>>>       ret = StringUtils.countOccurrencesOf(remainder, SLASH) == 1;
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>     return ret;
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   private static String normalizeUrl(String url) {
>>>>>>>>>>>>>>     return ((url.endsWith(SLASH)) ? url : url + SLASH).toLowerCase();
>>>>>>>>>>>>>>   }
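[Editor's note: for readers following along, here is a minimal self-contained sketch of the same slash-counting check. Names like `SubSiteDemo`, `immediateSubSites`, and the local `countOccurrences` are hypothetical stand-ins; the latter replaces Spring's `StringUtils.countOccurrencesOf` so the sketch has no external dependencies, and plain URL strings replace the `CrawledSite` pojo.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal sketch of the subsite-filtering logic in the snippet above. */
public class SubSiteDemo {
  private static final String SLASH = "/";

  static String normalizeUrl(String url) {
    return (url.endsWith(SLASH) ? url : url + SLASH).toLowerCase();
  }

  // Dependency-free replacement for StringUtils.countOccurrencesOf
  static int countOccurrences(String s, String sub) {
    int count = 0, idx = 0;
    while ((idx = s.indexOf(sub, idx)) != -1) {
      count++;
      idx += sub.length();
    }
    return count;
  }

  /** True when urlToCheck is exactly one path segment below parentUrl. */
  static boolean isChildOf(String parentUrl, String urlToCheck) {
    String parent = normalizeUrl(parentUrl);
    String child = normalizeUrl(urlToCheck);
    if (!child.startsWith(parent)) return false;
    String remainder = child.substring(parent.length());
    return countOccurrences(remainder, SLASH) == 1;
  }

  /** Filters a flat list of site URLs down to the direct children of siteUrl. */
  static List<String> immediateSubSites(String siteUrl, List<String> all) {
    List<String> out = new ArrayList<>();
    for (String url : all) {
      if (isChildOf(siteUrl, url)) out.add(url);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> all = Arrays.asList(
        "http://server/Abcd",
        "http://server/Abcd/Sub1",
        "http://server/Abcd/Sub1/Deep",
        "http://server/Defghij");
    // Only the direct child /Abcd/Sub1 qualifies; the grandchild is skipped
    System.out.println(immediateSubSites("http://server/Abcd", all));
  }
}
```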
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 2:54 PM, Karl Wright <
>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Have a look at this sequence also:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Subsite list: '
>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Abcd', 'Abcd'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Subsite list: '
>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Defghij',
>>>>>>>>>>>>>>> 'Defghij'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Subsite list: '
>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Klmnopqr',
>>>>>>>>>>>>>>> 'Klmnopqr'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Checking whether to include site
>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd' exactly matched rule
>>>>>>>>>>>>>>> path '/*'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Checking whether to include site
>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij' exactly matched
>>>>>>>>>>>>>>> rule path '/*'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Checking whether to include site
>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr' exactly matched
>>>>>>>>>>>>>>> rule path '/*'
>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') -
>>>>>>>>>>>>>>> SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is using the GetSites(String parent) method with a site
>>>>>>>>>>>>>>> name of "/Klmnopqr/Abcd/Abcd/Klmnopqr", and getting back three sites (!!).
>>>>>>>>>>>>>>> The parent path is not correct, obviously, but nevertheless this is one way in
>>>>>>>>>>>>>>> which paths are getting completely messed up.  It *looks* like the Webs web
>>>>>>>>>>>>>>> service is broken in such a way as to ignore the URL coming in, except for
>>>>>>>>>>>>>>> the base part, which means that subsite discovery is effectively disabled
>>>>>>>>>>>>>>> except directly under the root site.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This might still be OK if it is not possible to create
>>>>>>>>>>>>>>> subsites of subsites in this version of SharePoint.  Can you confirm that
>>>>>>>>>>>>>>> this is or is not possible?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 2:42 PM, Karl Wright <
>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "This is everything that got generated, from the very
>>>>>>>>>>>>>>>> beginning"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Well, something isn't right.  What I expect to see right up
>>>>>>>>>>>>>>>> front, but don't, are:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - A webs "getWebCollection" invocation for
>>>>>>>>>>>>>>>> /_vti_bin/webs.asmx
>>>>>>>>>>>>>>>> - Two lists "getListCollection" invocations for
>>>>>>>>>>>>>>>> /_vti_bin/lists.asmx
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Instead the first transactions I see are from already
>>>>>>>>>>>>>>>> busted URLs - which make no sense since there would be no way they should
>>>>>>>>>>>>>>>> have been able to get queued yet.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So there are a number of possibilities.  First, maybe the
>>>>>>>>>>>>>>>> log isn't getting cleared out, and the session in question therefore starts
>>>>>>>>>>>>>>>> somewhere in the middle of manifoldcf.log.1.  But no:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> C:\logs>grep "POST /_vti_bin/webs" manifoldcf.log.1
>>>>>>>>>>>>>>>> grep: input lines truncated - result questionable
>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nevertheless there are some interesting points here.
>>>>>>>>>>>>>>>> First, note the following response, which I've been able to determine is
>>>>>>>>>>>>>>>> against "Test Library 1":
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread '23') -
>>>>>>>>>>>>>>>> SharePoint: getListItems xml response: '<GetListItems xmlns="
>>>>>>>>>>>>>>>> http://schemas.microsoft.com/sharepoint/soap/directory/"><GetListItemsResponse
>>>>>>>>>>>>>>>> xmlns=""><GetListItemsResult
>>>>>>>>>>>>>>>> FileRef="SitePages/Home.aspx"/></GetListItemsResponse></GetListItems>'
>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread '23') -
>>>>>>>>>>>>>>>> SharePoint: Checking whether to include document '/SitePages/Home.aspx'
>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread '23') -
>>>>>>>>>>>>>>>> SharePoint: File '/SitePages/Home.aspx' exactly matched rule path '/*'
>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread '23') -
>>>>>>>>>>>>>>>> SharePoint: Including file '/SitePages/Home.aspx'
>>>>>>>>>>>>>>>>  WARN 2013-09-16 13:02:31,590 (Worker thread '23') -
>>>>>>>>>>>>>>>> Sharepoint: Unexpected relPath structure; path is '/SitePages/Home.aspx',
>>>>>>>>>>>>>>>> but expected <list/library> length of 26
>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The FileRef in this case is pointing at what, exactly?  Is
>>>>>>>>>>>>>>>> there a SitePages/Home.aspx in the "Test Library 1" library?  Or does it
>>>>>>>>>>>>>>>> mean to refer back to the root site with this URL construction?  And since
>>>>>>>>>>>>>>>> this is supposedly at the root level, how come the combined site + library
>>>>>>>>>>>>>>>> name comes out to 26??  I get 15, which leaves 11 characters unaccounted
>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm still looking at the logs to see if I can glean key
>>>>>>>>>>>>>>>> information.  Later, if I could set up a crawl against the sharepoint
>>>>>>>>>>>>>>>> instance in question, that would certainly help.  I can readily set up an
>>>>>>>>>>>>>>>> ssh tunnel if that is what is required.  But I won't be able to do it until
>>>>>>>>>>>>>>>> I get home tonight.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:58 PM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is everything that got generated, from the very
>>>>>>>>>>>>>>>>> beginning, meaning that I did a fresh build, new database, new connection
>>>>>>>>>>>>>>>>> definitions, start. The log must have rolled but the .1 log is included.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If I were to get you access to the actual test system,
>>>>>>>>>>>>>>>>> would you mind taking a look? It may be more efficient than sending logs..
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:48 PM, Karl Wright <
>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> These logs are different but have exactly the same
>>>>>>>>>>>>>>>>>> problem; they start in the middle when the crawl is already well underway.
>>>>>>>>>>>>>>>>>> I'm wondering if by chance you have more than one agents process running or
>>>>>>>>>>>>>>>>>> something?  Or maybe the log is rolling and stuff is getting lost?  What's
>>>>>>>>>>>>>>>>>> there is not what I would expect to see, at all.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I *did* manage to find two transactions that look like
>>>>>>>>>>>>>>>>>> they might be helpful, but because the *results* of those transactions are
>>>>>>>>>>>>>>>>>> required by transactions that take place minutes *before* in the log, I
>>>>>>>>>>>>>>>>>> have no confidence that I'm looking at anything meaningful.  But I'll get
>>>>>>>>>>>>>>>>>> back to you on what I find nonetheless.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you decide to repeat this exercise, try watching the log
>>>>>>>>>>>>>>>>>> with "tail -f" before starting the job.  You should not see any log
>>>>>>>>>>>>>>>>>> contents at all until the job is started.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:11 PM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Attached please find logs which start at the beginning.
>>>>>>>>>>>>>>>>>>> I started from a fresh build (clean db etc.), the logs start at server
>>>>>>>>>>>>>>>>>>> start, then I create the output connection and the repo connection, then
>>>>>>>>>>>>>>>>>>> the job, and then I fire off the job. I aborted the execution about a
>>>>>>>>>>>>>>>>>>> minute into it or so.  That's all that's in the logs with:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> org.apache.manifoldcf.connectors=DEBUG
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> log4j.logger.httpclient.wire.header=DEBUG
>>>>>>>>>>>>>>>>>>> log4j.logger.org.apache.commons.httpclient=DEBUG
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 12:39 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Are you sure these are the right logs?
>>>>>>>>>>>>>>>>>>>> - They start right in the middle of a crawl
>>>>>>>>>>>>>>>>>>>> - They are already in a broken state when they start,
>>>>>>>>>>>>>>>>>>>> e.g. the kinds of things that are being looked up are already nonsense paths
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I need to see logs from the BEGINNING of a fresh crawl
>>>>>>>>>>>>>>>>>>>> to see how the nonsense paths happen.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:52 AM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've generated logs with details as we discussed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The job was created afresh, as before:
>>>>>>>>>>>>>>>>>>>>> Path rules:
>>>>>>>>>>>>>>>>>>>>> /* file include
>>>>>>>>>>>>>>>>>>>>> /* library include
>>>>>>>>>>>>>>>>>>>>> /* list include
>>>>>>>>>>>>>>>>>>>>> /* site include
>>>>>>>>>>>>>>>>>>>>> Metadata:
>>>>>>>>>>>>>>>>>>>>> /* include true
>>>>>>>>>>>>>>>>>>>>> The logs are attached.
>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:20 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> "Do you think that this issue is generic with regard
>>>>>>>>>>>>>>>>>>>>>> to any Amz instance?"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I presume so, since you didn't apparently do anything
>>>>>>>>>>>>>>>>>>>>>> special to set one of these up.  Unfortunately, such instances are not part
>>>>>>>>>>>>>>>>>>>>>> of the free tier, so I am still constrained from setting one up for myself
>>>>>>>>>>>>>>>>>>>>>> because of household rules here.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> "For now, I assume our only workaround is to list the
>>>>>>>>>>>>>>>>>>>>>> paths of interest manually"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Depending on what is going wrong, that may not even
>>>>>>>>>>>>>>>>>>>>>> work.  For this to happen, it looks like several SharePoint web service
>>>>>>>>>>>>>>>>>>>>>> calls would have to be affected, and not in a cleanly predictable way.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> "is identification and extraction of attachments
>>>>>>>>>>>>>>>>>>>>>> supported in the SP connector?"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ManifoldCF in general leaves identification and
>>>>>>>>>>>>>>>>>>>>>> extraction to the search engine.  Solr, for instance, uses Tika for this, if
>>>>>>>>>>>>>>>>>>>>>> so configured.  You can configure your Solr output connection to include or
>>>>>>>>>>>>>>>>>>>>>> exclude specific mime types or extensions if you want to limit what is
>>>>>>>>>>>>>>>>>>>>>> attempted.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:09 AM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks, Karl. Do you think that this issue is
>>>>>>>>>>>>>>>>>>>>>>> generic with regard to any Amz instance? I'm just wondering how easily
>>>>>>>>>>>>>>>>>>>>>>> reproducible this may be..
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> For now, I assume our only workaround is to list the
>>>>>>>>>>>>>>>>>>>>>>> paths of interest manually, i.e. add explicit rules for each library and
>>>>>>>>>>>>>>>>>>>>>>> list.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> A related subject - is identification and extraction
>>>>>>>>>>>>>>>>>>>>>>> of attachments supported in the SP connector?  E.g. if I have a Word doc
>>>>>>>>>>>>>>>>>>>>>>> attached to a Task list item, would that be extracted?  So far, I see that
>>>>>>>>>>>>>>>>>>>>>>> library content gets crawled and I'm getting the list item data but am not
>>>>>>>>>>>>>>>>>>>>>>> sure what happens to the attachments.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 10:48 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the additional information.  It does
>>>>>>>>>>>>>>>>>>>>>>>> appear like the method that lists subsites is not working as expected under
>>>>>>>>>>>>>>>>>>>>>>>> AWS.  Nor are some number of other methods which supposedly just list the
>>>>>>>>>>>>>>>>>>>>>>>> children of a subsite.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I've reopened CONNECTORS-772 to work on addressing
>>>>>>>>>>>>>>>>>>>>>>>> this issue.  Please stay tuned.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 10:08 AM, Dmitry Goldenberg
>>>>>>>>>>>>>>>>>>>>>>>> <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Most of the paths that get generated are listed in
>>>>>>>>>>>>>>>>>>>>>>>>> the attached log, they match what shows up in the diag report. So I'm not
>>>>>>>>>>>>>>>>>>>>>>>>> sure where they diverge, most of them just don't seem right.  There are 3
>>>>>>>>>>>>>>>>>>>>>>>>> subsites rooted in the main site: Abcd, Defghij, Klmnopqr.  It's strange
>>>>>>>>>>>>>>>>>>>>>>>>> that the connector would try such paths as:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> /*Klmnopqr*/*Defghij*/*Defghij*/Announcements///
>>>>>>>>>>>>>>>>>>>>>>>>> -- the same subsite is repeated multiple times on the path, and to
>>>>>>>>>>>>>>>>>>>>>>>>> begin with, Defghij is not a subsite of Klmnopqr, so why would it try
>>>>>>>>>>>>>>>>>>>>>>>>> this? The /// at the end doesn't seem correct either, unless I'm missing
>>>>>>>>>>>>>>>>>>>>>>>>> something in how this pathing works.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> /Test Library
>>>>>>>>>>>>>>>>>>>>>>>>> 1/Financia/lProjectionsTemplate.xl/Abcd/Announcements -- looks wrong. A
>>>>>>>>>>>>>>>>>>>>>>>>> docname is mixed into the path, and a subsite ends up after the docname?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> /Shared
>>>>>>>>>>>>>>>>>>>>>>>>> Documents/Personal_Fina/ncial_Statement_1_1.xl/Defghij/ -- the same types
>>>>>>>>>>>>>>>>>>>>>>>>> of issues, plus now the docname has somehow been split by a forward slash?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> There are also a bunch of
>>>>>>>>>>>>>>>>>>>>>>>>> StringIndexOutOfBoundsExceptions.  Perhaps this logic doesn't fit with the
>>>>>>>>>>>>>>>>>>>>>>>>> pathing we're seeing on this AWS-based installation?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I'd expect the logic to just know that root
>>>>>>>>>>>>>>>>>>>>>>>>> contains 3 subsites, and work off that. Each subsite has a specific list of
>>>>>>>>>>>>>>>>>>>>>>>>> libraries and lists, etc. It seems odd that the connector gets into this
>>>>>>>>>>>>>>>>>>>>>>>>> matching pattern, and tries what looks like thousands of variations (I
>>>>>>>>>>>>>>>>>>>>>>>>> aborted the execution).
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:56 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> To clarify, the way you would need to analyze
>>>>>>>>>>>>>>>>>>>>>>>>>> this is to run a crawl with the wildcards as you have selected, abort if
>>>>>>>>>>>>>>>>>>>>>>>>>> necessary after a while, and then use the Document Status report to list
>>>>>>>>>>>>>>>>>>>>>>>>>> the document identifiers that had been generated.  Find a document
>>>>>>>>>>>>>>>>>>>>>>>>>> identifier that you believe represents a path that is illegal, and figure
>>>>>>>>>>>>>>>>>>>>>>>>>> out what SOAP getChild call caused the problem by returning incorrect
>>>>>>>>>>>>>>>>>>>>>>>>>> data.  In other words, find the point in the path where the path diverges
>>>>>>>>>>>>>>>>>>>>>>>>>> from what exists into what doesn't exist, and go back in the ManifoldCF
>>>>>>>>>>>>>>>>>>>>>>>>>> logs to find the particular SOAP request that led to the issue.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I'd expect from your description that the problem
>>>>>>>>>>>>>>>>>>>>>>>>>> lies with getting child sites given a site path, but that's just a guess at
>>>>>>>>>>>>>>>>>>>>>>>>>> this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2013 at 6:40 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't understand what you mean by "I've tried
>>>>>>>>>>>>>>>>>>>>>>>>>>> the set of wildcards as below and I seem to be running into a lot of
>>>>>>>>>>>>>>>>>>>>>>>>>>> cycles, where various subsite folders are appended to each other and an
>>>>>>>>>>>>>>>>>>>>>>>>>>> extraction of data at all of those locations is attempted".   If you are
>>>>>>>>>>>>>>>>>>>>>>>>>>> seeing cycles it means that document discovery is still failing in some
>>>>>>>>>>>>>>>>>>>>>>>>>>> way.  For each folder/library/site/subsite, only the children of that
>>>>>>>>>>>>>>>>>>>>>>>>>>> folder/library/site/subsite should be appended to the path - ever.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> If you can give a specific example, preferably
>>>>>>>>>>>>>>>>>>>>>>>>>>> including the soap back-and-forth, that would be very helpful.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2013 at 1:40 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Quick question. Is there an easy way to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> configure an SP repo connection for crawling of all content, from the root
>>>>>>>>>>>>>>>>>>>>>>>>>>>> site all the way down?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've tried the set of wildcards as below and I
>>>>>>>>>>>>>>>>>>>>>>>>>>>> seem to be running into a lot of cycles, where various subsite folders are
>>>>>>>>>>>>>>>>>>>>>>>>>>>> appended to each other and an extraction of data at all of those locations
>>>>>>>>>>>>>>>>>>>>>>>>>>>> is attempted. Ideally I'd like to avoid having to construct an exact set of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> paths because the set may change, especially with new content being added.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Path rules:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* file include
>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* library include
>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* list include
>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* site include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Metadata:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* include true
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd also like to pull down any files attached
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to list items. I'm hoping that some type of "/* file include" should do it,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> once I figure out how to safely include all content.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
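[Editor's note] Karl's debugging suggestion above -- walk a generated document identifier and find the point where it diverges from what actually exists -- can be sketched in a few lines. The site tree below is a hypothetical stand-in built from the anonymized subsite names in the thread, not anything the connector produces:

```python
# Hypothetical site tree standing in for what actually exists in SharePoint;
# the subsite names are the anonymized ones quoted in the thread.
SITE_TREE = {
    "": {"Abcd", "Defghij", "Klmnopqr"},
    "/Abcd": {"Announcements"},
    "/Defghij": set(),
    "/Klmnopqr": set(),
}

def divergence_point(doc_id):
    """Walk a generated path segment by segment and return the first prefix
    whose last segment is not a real child of its parent, or None if the
    whole path exists."""
    parent = ""
    for seg in filter(None, doc_id.split("/")):
        if seg not in SITE_TREE.get(parent, set()):
            return parent + "/" + seg
        parent += "/" + seg
    return None

# The first bogus prefix identifies which SOAP getChild response to hunt for
# in the ManifoldCF logs.
print(divergence_point("/Klmnopqr/Defghij/Defghij/Announcements///"))
# -> /Klmnopqr/Defghij
```

For the first bad path quoted in the thread, the walk stops at /Klmnopqr/Defghij, since Defghij is not a child of Klmnopqr; that is the getChild call to trace.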

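[Editor's note] The path rules quoted in the thread ("/* file include" and friends) behave roughly like an ordered, first-match-wins include/exclude list. The sketch below is a simplified illustration of that reading using glob-style matching; it is not ManifoldCF's actual rule-evaluation code, and the `*` semantics are an approximation of the connector's wildcard handling:

```python
import fnmatch

# The four "/* ... include" rules from the job configuration quoted above.
# First matching rule of the matching kind wins; no match means excluded.
RULES = [
    ("/*", "file", "include"),
    ("/*", "library", "include"),
    ("/*", "list", "include"),
    ("/*", "site", "include"),
]

def included(path, kind):
    """Return True if the first rule matching this path and kind says include."""
    for pattern, rule_kind, action in RULES:
        if rule_kind == kind and fnmatch.fnmatch(path, pattern):
            return action == "include"
    return False  # no matching rule: excluded by default

print(included("/Abcd", "site"))                 # -> True
print(included("/Shared Documents", "library"))  # -> True
```

Under this reading, the all-wildcard rule set includes everything of every kind, which is why the misbehaving subsite discovery (rather than the rules themselves) produces the runaway path variations.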