manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Alfresco WebScript Connector - Testing Question
Date Wed, 28 Oct 2015 16:15:16 GMT
Hi Deanna,

For the CMIS connector, I created CONNECTORS-1248 to cover the version info
issue you describe.

Karl


On Wed, Oct 28, 2015 at 8:08 AM, Delapasse, Deanna <
ddelapasse@oceaneering.com> wrote:

> Hi Paul,
>
> I haven't read the entire thread, so I apologize if this is way off base...
>
> When I worked with the CMIS connector I had to modify the logic to append
> document.getLastModificationDate().getTimeInMillis() to the versionString
> for it to pick up changes.  The Alfresco document version won't update when
> you modify metadata.  My memory is terrible, but I believe that even
> modifying content may not do it unless you have the proper 'versioning'
> aspect applied.
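Below is a minimal Java sketch of the change described above. It assumes the existing version string is derived from the document's version label. Appending the last-modification time in epoch milliseconds means a metadata-only edit, which leaves the label unchanged, still produces a different version string. In the CMIS connector the two inputs would come from OpenCMIS (`getVersionLabel()` and `getLastModificationDate()`, the latter returning a `GregorianCalendar`). The sketch takes them as plain parameters so it runs without that library. The class name, method name and `|` separator are hypothetical.

```java
import java.util.GregorianCalendar;
import java.util.TimeZone;

class CmisVersionStringSketch {
    // Hypothetical helper: version label plus last-modification time (epoch millis),
    // so that metadata-only edits still change the resulting version string.
    static String buildVersionString(String versionLabel, GregorianCalendar lastModified) {
        StringBuilder sb = new StringBuilder();
        sb.append(versionLabel == null ? "" : versionLabel);
        sb.append('|');
        if (lastModified != null) {
            sb.append(lastModified.getTimeInMillis());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        GregorianCalendar before = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        before.setTimeInMillis(1446019200000L);
        GregorianCalendar after = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        after.setTimeInMillis(1446019260000L); // one minute later, same version label

        String v1 = buildVersionString("1.0", before);
        String v2 = buildVersionString("1.0", after);
        // The label is identical, but the strings differ, so a crawler comparing
        // version strings would treat the document as changed.
        System.out.println(v1);
        System.out.println(v2);
        System.out.println("changed: " + !v1.equals(v2));
    }
}
```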
>
> Check inside Alfresco and see if your "version" is actually incrementing
> as you expect. I was using an older Alfresco version and was not able to
> run with the Alfresco connector, but the CMIS connector worked great for us!
>
> Good luck!
> Deanna
>
>
>
>
> On Wed, Oct 28, 2015 at 6:07 AM, Paul Farrell <pfarrell@funnelback.com>
> wrote:
>
>> The Alfresco log snippet doesn’t really shed any more light. It simply
>> doesn’t think that the document content has changed.
>>
>> 09:56:42,059 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
>> [http-apr-8080-exec-5] [getNodesByTransactionId] On Store
>> workspace://SpacesStore
>> 09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
>> [http-apr-8080-exec-5] [getLastTransactionID]
>> 09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
>> [http-apr-8080-exec-5] [getNodesByAclChangesetId] On Store
>> workspace://SpacesStore
>> 09:56:42,070 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
>> [http-apr-8080-exec-5] [getLastAclChangeSetID]
>> 09:56:42,070 DEBUG
>> [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
>> [http-apr-8080-exec-5] Attaching 0 nodes to the WebScript template
>> 09:56:42,079 DEBUG
>> [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
>> [http-apr-8080-exec-9] Invoking Changes Webscript, using the following
>> params
>> lastTxnId: 352
>> lastAclChangesetId: 13
>> storeId: SpacesStore
>> storeProtocol: workspace
>> indexingFilters:
>> {"aspectFilters":[],"metadataFilters":{},"mimetypeFilters":[],"siteFilters":["Finance"],"typeFilters":[]}
>>
>> 09:56:42,079 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
>> [http-apr-8080-exec-9] [getNodesByTransactionId] On Store
>> workspace://SpacesStore
>> 09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
>> [http-apr-8080-exec-9] [getLastTransactionID]
>> 09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
>> [http-apr-8080-exec-9] [getNodesByAclChangesetId] On Store
>> workspace://SpacesStore
>> 09:56:42,087 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
>> [http-apr-8080-exec-9] [getLastAclChangeSetID]
>> 09:56:42,087 DEBUG
>> [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
>> [http-apr-8080-exec-9] Attaching 0 nodes to the WebScript template
>>
>> *Paul Farrell*
>> Senior Search Consultant
>>
>> 109-123 Clifton Street, London EC2A 4LD
>> *T* +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>
>>
>> *UNITED KINGDOM* | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES
>>
>> Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> -
>> Twitter <https://twitter.com/funnelback>
>>
>> Funnelback UK Ltd is a limited liability company registered in England &
>> Wales. Registered address: Zetland House 109-123, Clifton Street, London.
>> EC2A 4LD. Company registration number: 07004264.
>>
>> On 28 Oct 2015, at 10:50, Rafa Haro <rharoapache@gmail.com> wrote:
>>
>> You’re welcome, Paul. Just in case, could you check the Alfresco logs to
>> see if there is anything informative there?
>>
>> Cheers,
>> Rafa
>>
>>
>>
>>
>> On Wed, Oct 28, 2015 at 11:47 AM, Paul Farrell <pfarrell@funnelback.com>
>> wrote:
>>
>>> I see. That makes sense.
>>>
>>> No problem. Thanks for the feedback, Rafa. Much appreciated.
>>>
>>> On 28 Oct 2015, at 10:45, Rafa Haro <rharoapache@gmail.com> wrote:
>>>
>>> Hi Paul,
>>>
>>> Before contributing the Alfresco connector, we performed several tests
>>> similar to yours using an Alfresco 4.x version, so my initial guess is
>>> that the WebScript is not behaving correctly with Alfresco 5 instances.
>>> I’m including Maurizio Pillitu (the main developer of Alfresco Indexer)
>>> in the email thread. He may be able to provide some feedback on this, or
>>> confirm my suspicions.
>>>
>>> Cheers,
>>> Rafa
>>>
>>>
>>>
>>>
>>> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <pfarrell@funnelback.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Following up on my recent email (below), I thought I would share my
>>>> findings with the ‘Alfresco Indexer’ connector
>>>> (https://github.com/maoo/alfresco-indexer) in case someone is able to
>>>> advise on its usage.
>>>>
>>>> I turned to this connector because neither of the packaged Manifold
>>>> Alfresco connectors (AtomPub or WebService) detects changes. I need the
>>>> crawl to run each night and pick up any and all changes made to
>>>> documents in the previous 24 hours, which is a common scenario.
>>>>
>>>> Unfortunately, I have yet to achieve this.
>>>>
>>>> I have built and installed both the AMP and JAR files needed for the
>>>> new connector, but changes are still not coming through. So far I have
>>>> two observations:
>>>>
>>>> 1. Changes to document content or properties do not cause the
>>>> document to be picked up again by the Alfresco connector on the next run
>>>> 2. Adding a ‘Filter Configuration’ seems to do very little to change what
>>>> is picked up
>>>>
>>>> *IN DETAIL*
>>>> *1. Failing to pick up modified content*
>>>>
>>>> Looking at the log files (which are set to debug) I can see that, upon
>>>> the first crawl of Alfresco, Manifold sends the following requests:
>>>>
>>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request
>>>> GET
>>>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>>> HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239
>>>> >> GET
>>>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>>> HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239
>>>> >> "GET
>>>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>>> HTTP/1.1[\r][\n]"
>>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request
>>>> GET
>>>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>>> HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240
>>>> >> GET
>>>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>>> HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240
>>>> >> "GET
>>>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>>> HTTP/1.1[\r][\n]"
>>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request
>>>> GET
>>>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>>>> HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241
>>>> >> GET
>>>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>>>> HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241
>>>> >> "GET
>>>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>>>> HTTP/1.1[\r][\n]"
>>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request
>>>> GET
>>>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>>>> HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242
>>>> >> GET
>>>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>>>> HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242
>>>> >> "GET
>>>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>>>> HTTP/1.1[\r][\n]"
>>>>
>>>> This picks up all of the content, e.g. documents.
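The first-crawl log above shows three GET requests per node, each built from the store protocol, the store id and the node UUID. The sketch below reproduces those three paths exactly as they appear in the log. It only builds strings and does not call Alfresco. The class and method names are hypothetical.

```java
import java.util.List;

class NodeUrlSketch {
    // Paths copied from the first-crawl log: actions, details and content for one node.
    static List<String> firstCrawlPaths(String protocol, String storeId, String uuid) {
        String nodeRef = protocol + "/" + storeId + "/" + uuid;
        return List.of(
            "/alfresco/service/node/actions/" + nodeRef,
            "/alfresco/service/node/details/" + nodeRef,
            "/alfresco/service/api/node/" + nodeRef + "/content");
    }

    public static void main(String[] args) {
        for (String path : firstCrawlPaths("workspace", "SpacesStore",
                "267839b2-f466-42c5-9a35-cb3e41281bb9")) {
            System.out.println("GET " + path);
        }
    }
}
```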
>>>>
>>>> Running a second crawl, without any other actions being done, results
>>>> in the following requests:
>>>>
>>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET
>>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>>> HTTP/1.1
>>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >>
>>>> GET
>>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>>> HTTP/1.1
>>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >>
>>>> "GET
>>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>>> HTTP/1.1[\r][\n]"
>>>>
>>>> So I can see that, in the first instance, we are targeting content
>>>> directly while, in the second, we are asking for changes. The problem is
>>>> that no changes are returned from the second set of requests. The response
>>>> from these calls is:
>>>>
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "totalNodes" : "0", [\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "elapsedTime" : "8",[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "docs" : [[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " ],[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "   "last_txn_id" : "352",[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "   "last_acl_changeset_id" : "13",[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_id" : "SpacesStore",[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_protocol" : "workspace"[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
>>>>
>>>> Regardless of what changes I make to the document I have been using for
>>>> testing, it is not updated: the totalNodes value returned by the changes
>>>> call is always ‘0’.
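The response carries `last_txn_id` and `last_acl_changeset_id`. The Alfresco log near the top of this thread shows a later changes call invoked with `lastTxnId: 352`, the value returned here. The minimal sketch below illustrates that checkpoint step, assuming the client simply feeds the returned IDs into its next request. The response body is a flattened version of the one in the wire log. Field extraction uses a regex only to keep the sketch dependency-free; a real client should use a JSON parser. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChangesCheckpointSketch {
    // Extract a string-valued field such as "last_txn_id" : "352"; returns null if absent.
    static String field(String json, String name) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Build the next changes path from the checkpoint values in the previous response.
    static String nextChangesPath(String json, String encodedFilters) {
        return "/alfresco/service/node/changes/" + field(json, "store_protocol") + "/"
            + field(json, "store_id")
            + "?lastTxnId=" + field(json, "last_txn_id")
            + "&lastAclChangesetId=" + field(json, "last_acl_changeset_id")
            + "&indexingFilters=" + encodedFilters;
    }

    public static void main(String[] args) {
        String body = "{ \"totalNodes\" : \"0\", \"elapsedTime\" : \"8\", \"docs\" : [ ],"
            + " \"last_txn_id\" : \"352\", \"last_acl_changeset_id\" : \"13\","
            + " \"store_id\" : \"SpacesStore\", \"store_protocol\" : \"workspace\" }";
        System.out.println("totalNodes = " + field(body, "totalNodes"));
        System.out.println(nextChangesPath(body, "%7B%7D"));
    }
}
```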
>>>>
>>>>
>>>> *2. Adding ‘Filter Configuration’ seems to do very little to change
>>>> what is picked up*
>>>>
>>>> Within my test Alfresco environment I have one site set up (Finance).
>>>> Within the Finance doc library I have three test docs. No other changes
>>>> have been made to the Alfresco instance.
>>>> Running a crawl with no filter configuration set returns 81 items
>>>> (checked by requesting the URL in a browser).
>>>> If I then set the Site Filter configuration to ‘Finance’ and apply it, I
>>>> still get 81 items when I re-run the crawl.
>>>> I can see that the term ‘Finance’ is being added to the URL, but this
>>>> does not seem to change the behaviour.
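One way to confirm which filter the WebScript actually receives is to decode the `indexingFilters` query parameter. The small sketch below uses only the JDK (`java.net.URI` and `java.net.URLDecoder`; the `Charset` overload needs Java 10 or later). The example path is the changes request from the crawl log above. The class name is hypothetical.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FilterDecodeSketch {
    // Split the raw query string on '&' and '=' and URL-decode each key and value.
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String url = "/alfresco/service/node/changes/workspace/SpacesStore"
            + "?lastTxnId=333&lastAclChangesetId=13&indexingFilters="
            + "%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D"
            + "%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D"
            + "%2C%22metadataFilters%22%3A%7B%7D%7D";
        // Print the filter JSON that the WebScript should be receiving.
        System.out.println(queryParams(url).get("indexingFilters"));
    }
}
```

Decoding the two URLs in the TESTS section of the original message the same way shows whether the `siteFilters` entry is the only difference between them.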
>>>>
>>>>
>>>> I am happy to spend time diagnosing this if there is anyone available
>>>> to assist.
>>>>
>>>> Thanks
>>>>
>>>> Paul
>>>>
>>>>
>>>>
>>>> On 27 Oct 2015, at 18:14, pfarrell@funnelback.com wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This is a question regarding the relatively new Alfresco Webscript
>>>> connector.
>>>>
>>>> SETUP
>>>> I have a vanilla Alfresco Community 5.0 installation
>>>> One site has been created called 'Finance'
>>>> A handful of documents have been created in 'Finance' Doc Library.
>>>> I have cloned and packaged up the 'alfresco-indexer' (
>>>> https://github.com/maoo/alfresco-indexer) and have applied the AMP and
>>>> CLIENT packages to their respective environments.
>>>>
>>>>
>>>> ISSUE
>>>> The issue is that the default API call used by Manifold returns
>>>> nothing. The full API call used by Manifold, based on my config, is:
>>>>
>>>>
>>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>>>
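The `indexingFilters` parameter in that call is URL-encoded JSON. The sketch below shows one way to assemble such a path with the JDK's `URLEncoder`, assuming the filter JSON is written by hand with its keys in the same order as the logged request. The class and method names are hypothetical, and the connector itself may build the URL differently.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ChangesUrlSketch {
    // Assemble a changes path: checkpoint IDs plus the URL-encoded filter JSON.
    static String changesPath(String protocol, String storeId, long lastTxnId,
                              long lastAclChangesetId, String filtersJson) {
        return "/alfresco/service/node/changes/" + protocol + "/" + storeId
            + "?lastTxnId=" + lastTxnId
            + "&lastAclChangesetId=" + lastAclChangesetId
            + "&indexingFilters=" + URLEncoder.encode(filtersJson, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hand-written filter JSON, keys ordered as in the logged request.
        String filters = "{\"siteFilters\":[\"Finance\"],\"typeFilters\":[],"
            + "\"mimetypeFilters\":[],\"aspectFilters\":[],\"metadataFilters\":{}}";
        System.out.println(changesPath("workspace", "SpacesStore", 0, 0, filters));
    }
}
```

`URLEncoder` applies form encoding, so a space in the JSON would become `+` rather than `%20`. The filter JSON here contains no spaces, and the encoded value matches the logged parameter.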
>>>>
>>>> TESTS
>>>> I have identified two streamlined URLs. The first one returns the
>>>> documents that exist in the doc library of the 'Finance' site. This URL is:
>>>>
>>>>
>>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>>>>
>>>> The second URL simply adds the site restriction. This URL returns
>>>> nothing:
>>>>
>>>>
>>>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>>>>
>>>>
>>>>
>>>> Can anyone explain why the documents are not returned when only the
>>>> containing site is named in the API URL?
>>>>
>>>> Cheers
>>>>
>>>> Paul
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
