manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Crawling SharePoint Lists
Date Wed, 16 Oct 2013 07:06:32 GMT
CONNECTORS-787.

Karl


On Wed, Oct 16, 2013 at 3:01 AM, Karl Wright <daddywri@gmail.com> wrote:

> Confirmed: the Items member is risky to use in large lists because there
> is no paging (so it can cause the SharePoint instance to run out of memory):
>
> "The Items property returns all the files in a document library,
> including files in subfolders, but not the folders themselves. In a
> document library, folders are not considered items.
>
> When you call the Items property, it returns an instance of an
> SPListItemCollection
> <http://msdn.microsoft.com/en-us/library/sharepoint/microsoft.sharepoint.splistitemcollection%28v=office.14%29.aspx>
> object that does not contain any data, but on first access to an item from
> the collection, the entire collection object is filled with data.
> Consequently, to improve performance it is recommended that you assign the
> items returned by Items to an SPListItemCollection
> <http://msdn.microsoft.com/en-us/library/sharepoint/microsoft.sharepoint.splistitemcollection%28v=office.14%29.aspx>
> object if you must iterate the entire collection, as seen in the example.
> It is best practice to use one of the GetItem* methods of SPList
> <http://msdn.microsoft.com/en-us/library/sharepoint/microsoft.sharepoint.splist%28v=office.14%29.aspx>
> to return a filtered collection of items."
>
>
> So that's why we haven't been doing it that way.  We need the proper CAML
> expression that returns the full contents of the discussion board.
>
> Nevertheless I'll open a ticket for this functionality; no idea how to
> complete it though.
>
> Karl
>
>
>
>
> On Wed, Oct 16, 2013 at 2:54 AM, Karl Wright <daddywri@gmail.com> wrote:
>
>> For discussion boards, then, the SharePoint C# API must not be working
>> properly, or we are using it incorrectly.  SharePoint API bugs are way
>> beyond my pay grade to fix.  If you think we are using it improperly, you
>> may have other resources than I have, which is basically just the web page
>> here:
>>
>>
>> http://msdn.microsoft.com/en-us/library/Microsoft.SharePoint.SPList.GetItems%28v=office.14%29.aspx
>>
>> ... and the one describing SPQuery objects here:
>>
>>
>> http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.spquery.query%28v=office.14%29.aspx
>>
>> Specifically I'm missing a description of the schema of Discussion
>> Boards, and how you'd construct a CAML query to get the missing rows.  All
>> of this stuff is pretty mysterious because if it is documented at all it is
>> documented in obscure places.  More full-time Microsoft coders seem to have
>> a similar problem, see:
>>
>>
>> http://social.msdn.microsoft.com/Forums/sharepoint/en-US/afe07483-6aec-424a-9434-c8e8b963e55c/how-to-get-all-the-items-from-a-discussion-board?forum=sharepointdevelopmentlegacy
>>
>> ... where they didn't figure out how to do it either, other than the
>> advice "don't do it that way" or just use the "Items" field, which I'm not
>> sure works in cases where the number of items in the list is large (I'll
>> look into this though).  Maybe you can experiment with the API directly
>> under SharePoint, and recommend C# code changes that will return the
>> missing rows, and if so I am happy to implement it and release it.
>>
>> Thanks,
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Oct 15, 2013 at 8:53 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>
>>> Pretty sure. Screenshot attached.
>>>
>>>
>>>
>>> On Tue, Oct 15, 2013 at 3:42 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Are you sure you haven't deleted two of these rows?  Because the method
>>>> call on the server side is pretty generic:
>>>>
>>>> SPListItemCollection collListItems = oList.GetItems(listQuery);
>>>>
>>>> ... where listQuery is this:
>>>>
>>>> SPQuery listQuery = new SPQuery();
>>>> listQuery.Query =
>>>>     "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
>>>> listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
>>>> listQuery.ViewAttributes = "Scope=\"Recursive\"";
>>>> listQuery.ViewFields = "<FieldRef Name='FileRef' />";
>>>> listQuery.RowLimit = 1000;
>>>>
>>>> It's the same code that is used for all other lists as well, and those
>>>> do not suffer any lost rows - I tested that just now against Dmitry's
>>>> SharePoint instance.
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Tue, Oct 15, 2013 at 6:23 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> Thanks for the quick attention. It's better, but not fixed?
>>>>>
>>>>> I am now getting metadata for the one list row we were choking on
>>>>> before, but it doesn't see the other two rows at all. I think the relevant
>>>>> part of the log is this:
>>>>>
>>>>>
>>>>> SharePoint: Document identifier is a list: '/DiscussStuff'
>>>>>
>>>>> SharePoint: getListItems xml response: '<GetListItems
>>>>> xmlns="http://schemas.microsoft.com/sharepoint/soap/directory/"><GetListItemsResponse
>>>>> xmlns=""><GetListItemsResult
>>>>> FileRef="Lists/DiscussStuff/Giants/3_.000"/></GetListItemsResponse></GetListItems>'
>>>>>
>>>>> There should be a 1_.000 and a 2_.000 as well.
>>>>>
>>>>> Maybe the problem is in the webapp on the SharePoint server?
>>>>>
>>>>> Thanks again for all the help.
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 15, 2013 at 2:31 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Just resolved this ticket, on trunk.
>>>>>>
>>>>>> Please synch up and try again.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 15, 2013 at 4:59 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>
>>>>>>> CONNECTORS-786.
>>>>>>>
>>>>>>> I've prioritized this as very high because this is functionality
>>>>>>> that used to work but is now broken because I added attachment
>>>>>>> support. With luck I will be able to look at it later tonight.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 15, 2013 at 4:36 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>
>>>>>>>> This is the problem:
>>>>>>>>
>>>>>>>>
>>>>>>>> SharePoint: Can't get version of '/DiscussStuff///Giants/3_.000'
>>>>>>>> because modified date or attachment url not found
>>>>>>>>
>>>>>>>> It looks like it decided that the list item was in fact an
>>>>>>>> attachment, which makes sense because it was a compound list id.
>>>>>>>>
>>>>>>>> I'll open a ticket for this.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 15, 2013 at 3:40 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 15, 2013 at 11:50 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Do you see any of these in the log?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There are 3 rows in my discussion group -- two topics, one post in
>>>>>>>>> each, one with a reply. In the logs I'm only seeing one of them (the
>>>>>>>>> chronologically last to be put into SharePoint).
>>>>>>>>>
>>>>>>>>> The log looks like this -- maybe that last message means it's
>>>>>>>>> choking on this list and giving up on processing it further?
>>>>>>>>>
>>>>>>>>> Thanks...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> SharePoint: Document identifier is a list: '/DiscussStuff'
>>>>>>>>>
>>>>>>>>> SharePoint: getListItems xml response: '<GetListItems
>>>>>>>>> xmlns="http://schemas.microsoft.com/sharepoint/soap/directory/"><GetListItemsResponse
>>>>>>>>> xmlns=""><GetListItemsResult
>>>>>>>>> FileRef="Lists/DiscussStuff/Giants/3_.000"/></GetListItemsResponse></GetListItems>'
>>>>>>>>>
>>>>>>>>> SharePoint: Checking whether to include list item
>>>>>>>>> '/DiscussStuff/Giants/3_.000'
>>>>>>>>>
>>>>>>>>> SharePoint: Getting version of '/DiscussStuff///Giants/3_.000'
>>>>>>>>>
>>>>>>>>> SharePoint: Checking whether to include list item attachment
>>>>>>>>> '/DiscussStuff/Giants/3_.000'
>>>>>>>>>
>>>>>>>>> SharePoint: Can't get version of '/DiscussStuff///Giants/3_.000'
>>>>>>>>> because modified date or attachment url not found
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
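
For anyone reproducing the missing-row symptom discussed in this thread, here is a minimal sketch of a way to check which rows a logged getListItems response actually contains. It is a standalone diagnostic script and not part of ManifoldCF or the SharePoint plugin; the helper names (returned_file_refs, missing_rows) and the "/N_.000" suffix convention for expected rows are assumptions made for illustration. The response string is the one from the log quoted above, with the mail-wrapped namespace URL rejoined.

```python
import xml.etree.ElementTree as ET

# Response body copied from the log quoted in this thread (line wraps removed).
RESPONSE = (
    '<GetListItems xmlns="http://schemas.microsoft.com/sharepoint/soap/directory/">'
    '<GetListItemsResponse xmlns="">'
    '<GetListItemsResult FileRef="Lists/DiscussStuff/Giants/3_.000"/>'
    '</GetListItemsResponse></GetListItems>'
)


def returned_file_refs(xml_text):
    """Return the FileRef attribute of every GetListItemsResult element."""
    root = ET.fromstring(xml_text)
    refs = []
    for elem in root.iter():
        # ElementTree prefixes namespaced tags with "{uri}"; keep the local name.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "GetListItemsResult" and "FileRef" in elem.attrib:
            refs.append(elem.attrib["FileRef"])
    return refs


def missing_rows(xml_text, expected_suffixes):
    """Return the expected path suffixes that no returned FileRef ends with."""
    refs = returned_file_refs(xml_text)
    return [s for s in expected_suffixes
            if not any(ref.endswith(s) for ref in refs)]


if __name__ == "__main__":
    # Hypothetical expectation: three rows, matched by trailing path segment
    # only, because the topic folders of the other two rows are not known.
    print(missing_rows(RESPONSE, ["/1_.000", "/2_.000", "/3_.000"]))
    # prints ['/1_.000', '/2_.000']
```

Matching on the trailing path segment keeps the check independent of which topic folder each row lives in, which the log does not show for the two missing rows.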
