manifoldcf-user mailing list archives

From Paul Farrell <pfarr...@funnelback.com>
Subject Re: Alfresco WebScript Connector - Testing Question
Date Wed, 28 Oct 2015 10:46:54 GMT
I see. That makes sense. 

No problem. Thanks for the feedback Rafa. Much appreciated. 



Paul Farrell
Senior Search Consultant
 
109-123 Clifton Street, London EC2A 4LD
T +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>


> On 28 Oct 2015, at 10:45, Rafa Haro <rharoapache@gmail.com> wrote:
> 
> Hi Paul, 
> 
> Before contributing the Alfresco connector, we performed several tests similar to yours
> using an Alfresco 4.x version. My initial guess, therefore, is that the WebScript is not
> behaving correctly on Alfresco 5 instances. I’m including Maurizio Pillitu (main developer
> of Alfresco Indexer) in this thread. He may be able to provide some feedback or simply
> confirm my suspicions.
> 
> Cheers,
> Rafa
> 
> 
> 
> 
> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <pfarrell@funnelback.com> wrote:
> 
> Hi all,
> 
> Following up on my recent email (below), I thought I would share my findings with the
> ‘Alfresco Indexer’ connector (https://github.com/maoo/alfresco-indexer)
> in case someone is able to advise on its usage.
> 
> I turned to this connector because neither of the packaged Manifold Alfresco connectors
> (AtomPub or WebService) detects changes. I need the crawl to run each night and pick up
> every change made to the documents in the previous 24 hours, which is a common scenario.
> 
> Unfortunately, I have yet to achieve this.
> 
> Although I have built and installed both the AMP and JAR files needed for the new connector,
> changes are still not coming through. In fact, I have two observations so far:
> 
> 1. Changes to document content or properties do not cause the document to be picked
> up by the Alfresco connector on the next run
> 2. Adding ‘Filter Configuration’ seems to do very little to change what is picked
> up
> 
> IN DETAIL
> 1. Failing to pick up modified content
> 
> Looking at the log files (which are set to debug), I can see that, on the first crawl
> of Alfresco, Manifold sends the following requests:
> 
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"
> 
> This picks up all of the content, e.g. documents.
> 
> Running a second crawl, with no other actions taken in between, results in the following
> requests:
> 
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"
> 
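The percent-encoded `indexingFilters` parameter in the request above is hard to read in the log. A minimal Python sketch (not part of the connector; the helper name `decode_changes_request` is made up for illustration) that decodes the logged request so the filter JSON actually sent can be inspected:

```python
import json
from urllib.parse import parse_qs, urlsplit

# Request target copied from the debug log above.
REQUEST = (
    "/alfresco/service/node/changes/workspace/SpacesStore"
    "?lastTxnId=333&lastAclChangesetId=13"
    "&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D"
    "%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D"
    "%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D"
)


def decode_changes_request(request_target):
    """Hypothetical helper: split the query string and parse the filter JSON."""
    # parse_qs percent-decodes each value and returns lists of strings.
    query = parse_qs(urlsplit(request_target).query)
    return {
        "lastTxnId": int(query["lastTxnId"][0]),
        "lastAclChangesetId": int(query["lastAclChangesetId"][0]),
        "indexingFilters": json.loads(query["indexingFilters"][0]),
    }


decoded = decode_changes_request(REQUEST)
print(json.dumps(decoded, indent=2))
```

Decoded, the request asks for changes after transaction 333 and ACL changeset 13, with `siteFilters` set to `["Finance"]` and every other filter empty.
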
> So I can see that, in the first instance, we are targeting content directly while, in
> the second, we are asking for changes. The problem is that no changes are returned from the
> second set of requests. The response from these calls is:
> 
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "totalNodes" : "0", [\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "elapsedTime" : "8",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" : [[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  ],[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_txn_id" : "352",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_acl_changeset_id" : "13",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_id" : "SpacesStore",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_protocol" : "workspace"[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
> 
> Regardless of what changes I make to a document that I have been using for testing, the
> document is not updated. The response from the calls for changes (totalNodes) is always ‘0’.
> 
> 
> 2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up
> 
> Within my test Alfresco environment I have one site set up (Finance). Within the Finance
> doc library I have three test docs. No other changes have been made to the Alfresco instance.
> 
> Running a crawl with no filter configurations set returns 81 items. This is via the URL
> in a browser.
> If I then set the Site Filter configuration to ‘Finance’ and apply, I still get 81
> items when I re-run the crawl.
> I can see that the term ‘Finance’ is being added to the URL but this does not seem
> to change the behaviour.
> 
> 
> I am happy to spend time diagnosing this if there is anyone available to assist.
> 
> Thanks
> 
> Paul
> 
> 
> 
>> On 27 Oct 2015, at 18:14, pfarrell@funnelback.com wrote:
>> 
>> Hi all,
>> 
>> This is a question regarding the relatively new Alfresco Webscript connector. 
>> 
>> SETUP
>> I have a vanilla Alfresco Community 5.0 installation
>> One site has been created called 'Finance'
>> A handful of documents have been created in 'Finance' Doc Library.
>> I have cloned and packaged up the 'alfresco-indexer' (https://github.com/maoo/alfresco-indexer)
>> and have applied the AMP and CLIENT packages to their respective environments.
>> 
>> 
>> ISSUE
>> The issue is that the default API call used by Manifold is returning nothing. The
>> full API call used by Manifold, based on my config, is:
>> 
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>> 
>> 
>> TESTS
>> I have identified two streamlined URLs. The first one returns the documents that
>> exist in the doc library of the 'Finance' site. This URL is:
>> 
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>> 
>> The second URL simply adds the site restriction. This URL returns nothing:
>> 
>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>> 
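To repeat the two tests without hand-encoding the JSON, the request paths can be generated. This is a rough sketch only: `build_changes_path` is a hypothetical helper rather than anything shipped with Manifold or alfresco-indexer, and it assumes the WebScript expects compact JSON (no spaces), as seen in the URLs above.

```python
import json
from urllib.parse import quote, urlencode

# Path taken from the URLs quoted in this message.
CHANGES_PATH = "/alfresco/service/node/changes/workspace/SpacesStore"


def build_changes_path(filters, last_txn_id=0, last_acl_changeset_id=0):
    """Hypothetical helper: build the changes request path for a filter dict."""
    # Compact separators reproduce the space-free JSON seen in the logs.
    filter_json = json.dumps(filters, separators=(",", ":"))
    query = urlencode(
        {
            "lastTxnId": last_txn_id,
            "lastAclChangesetId": last_acl_changeset_id,
            "indexingFilters": filter_json,
        },
        quote_via=quote,  # percent-encode everything, including { } [ ] " :
    )
    return f"{CHANGES_PATH}?{query}"


unfiltered = build_changes_path({})
site_only = build_changes_path({"siteFilters": ["Finance"]})
print(unfiltered)
print(site_only)
```

Prefixing each path with the Alfresco host and port and fetching both with the same credentials gives a side-by-side comparison of the filtered and unfiltered responses.
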
>> 
>> 
>> Can anyone explain why the documents are not returned when only the containing site
>> is named in the API URL?
>> 
>> Cheers
>> 
>> Paul
>> 
>> 
> 
> 

