manifoldcf-user mailing list archives

From Maurizio Pillitu <m...@apache.org>
Subject Re: Manifold/Alfresco seeding and security
Date Tue, 20 Oct 2015 15:50:56 GMT
Hi Paul,

It looks like you're hitting
https://github.com/maoo/alfresco-indexer/issues/3. Which version of
alfresco-indexer are you using? Can you try release 0.7.1 of the AMP,
http://search.maven.org/#artifactdetails%7Ccom.github.maoo.indexer%7Calfresco-indexer-webscripts%7C0.7.1%7Camp
or the pre-built WAR file,
http://search.maven.org/#artifactdetails%7Ccom.github.maoo.indexer%7Calfresco-indexer-webscripts-war%7C0.7.1%7Cwar
?

HTH
  mao
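The search.maven.org links in Maurizio's reply carry the Maven coordinates in the URL fragment, with each "|" percent-encoded as "%7C". For anyone new to Maven, here is a minimal Python sketch that decodes such a link into groupId, artifactId, version and packaging. The helper name maven_coordinates is hypothetical, and the fragment layout it expects is inferred only from the two links quoted here.

```python
from urllib.parse import unquote, urlsplit


def maven_coordinates(search_url: str) -> dict:
    """Decode a search.maven.org 'artifactdetails' link into Maven coordinates.

    Assumption: the fragment looks like
    artifactdetails|<groupId>|<artifactId>|<version>|<packaging>
    with every '|' percent-encoded as %7C, as in the links quoted above.
    """
    # urlsplit leaves the fragment percent-encoded; unquote turns %7C back into '|'.
    fragment = unquote(urlsplit(search_url).fragment)
    kind, group_id, artifact_id, version, packaging = fragment.split("|")
    if kind != "artifactdetails":
        raise ValueError(f"unexpected link type: {kind!r}")
    return {
        "groupId": group_id,
        "artifactId": artifact_id,
        "version": version,
        "packaging": packaging,
    }


if __name__ == "__main__":
    print(maven_coordinates(
        "http://search.maven.org/#artifactdetails%7Ccom.github.maoo.indexer"
        "%7Calfresco-indexer-webscripts%7C0.7.1%7Camp"
    ))
```

Applied to the two links, it yields groupId com.github.maoo.indexer, version 0.7.1, and the artifacts alfresco-indexer-webscripts (amp) and alfresco-indexer-webscripts-war (war).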

On Tue, Oct 20, 2015 at 5:36 PM Paul Farrell <pfarrell@funnelback.com>
wrote:

> Hi,
>
> Having had to go back to basics and re-install my Alfresco instance, I can
> confirm that the AMP file for the alfresco indexer web scripts *does*
> actually install without error. There must have been an issue with my
> previous Alfresco instance.
>
> Having said that, the Alfresco WebScript connector fails. The failure is
> down to the ‘Context’ setting (see below):
>
>
> When you attempt to save the configuration of the WebScript connector,
> Manifold clearly tries to check the connection. It seems to do this by
> making an API call (/auth/resolve/admin). The issue is with what Manifold
> prepends to the start of that path.
> If I leave the setting as above, Manifold reports:
>
> The Web Script /alfresco/service/api/node/auth/resolve/admin has responded
> with a status of 404 - Not Found.
>
> In other words, it builds the full path as
> “/alfresco/service/api/node/auth/resolve/admin”.
>
> For my Alfresco Community 5.0 instance, I get to the same web script via
> the URL “/alfresco/service/auth/resolve/admin”, i.e. without the ‘/api/node’.
>
> Somewhere, Manifold assumes that ‘/api/node’ belongs in the path, so there
> is nothing I can put into that box to prevent it from being added.
>
> Paul
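Paul's observation can be reproduced with a small model. This Python sketch assumes, purely for illustration, that the connection check joins the configured Context, a fixed /api/node segment and the check path /auth/resolve/admin. The function build_check_url and its fixed_prefix parameter are hypothetical and are not taken from the connector's source.

```python
def build_check_url(context: str, check_path: str, fixed_prefix: str = "/api/node") -> str:
    """Hypothetical model of the connection-check URL: context + fixed_prefix + check_path.

    Leading/trailing slashes on each segment are normalised, and empty
    segments are dropped, so '/alfresco/service' and '/alfresco/service/'
    produce the same result.
    """
    segments = (context, fixed_prefix, check_path)
    cleaned = [s.strip("/") for s in segments if s.strip("/")]
    return "/" + "/".join(cleaned)


# With the fixed prefix: the path that answered 404 in Paul's report.
print(build_check_url("/alfresco/service", "/auth/resolve/admin"))
# Without it: the path that works on Alfresco Community 5.0.
print(build_check_url("/alfresco/service", "/auth/resolve/admin", fixed_prefix=""))
```

In this model the prefix is always inserted after the Context, so no Context value can remove it, which matches what Paul saw. Maurizio's reply at the top of the thread points to the alfresco-indexer side instead (issue #3, and trying release 0.7.1).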
>
> On 20 Oct 2015, at 12:56, Karl Wright <daddywri@gmail.com> wrote:
>
> Hmm.  What file was missing?  Maurizio maintains the indexer plugin; I
> feel certain he'd want to know.
>
> Karl
>
>
> On Tue, Oct 20, 2015 at 7:53 AM, Paul Farrell <pfarrell@funnelback.com>
> wrote:
>
>> Hi guys,
>>
>> Just to let you know what’s going on - for informational purposes more
>> than anything.
>>
>> I initially took the AMP file provided in the MCF plugins directory
>> (0.7.0) and tried to install it into Alfresco, but got a message saying a
>> file was missing.
>>
>> Instead, I cloned the alfresco-indexer repository from GitHub and built it
>> on my local machine, which generated an AMP file (0.7.2).
>>
>> I was able to successfully install the AMP file onto my Alfresco
>> instance.
>>
>> As it happens, I now cannot log into Alfresco Share ('bad credentials or
>> server not available' message), but that is something I can work on.
>> Apparently, installing some AMP files has been known to cause this issue.
>>
>> So, progress to a point!
>>
>> *Paul Farrell*
>> Senior Search Consultant
>>
>> 109-123 Clifton Street, London EC2A 4LD
>> *T* +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>
>>
>> *UNITED KINGDOM* | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES
>>
>> Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> -
>> Twitter <https://twitter.com/funnelback>
>>
>> Funnelback UK Ltd is a limited liability company registered in England &
>> Wales. Registered address: Zetland House 109-123, Clifton Street, London.
>> EC2A 4LD. Company registration number: 07004264.
>>
>> On 20 Oct 2015, at 12:36, Rafa Haro <rharoapache@gmail.com> wrote:
>>
>> Hi,
>>
>> On the Alfresco side, I hope this helps:
>>
>> http://docs.alfresco.com/4.1/tasks/amp-install.html
>>
>> Cheers
>>
>>
>>
>>
>>
>> On Tue, Oct 20, 2015 at 1:13 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> The AMP file is actually shipped as part of the binary MCF
>>> distribution.  You can find it under "plugins".
>>>
>>> Karl
>>>
>>>
>>> On Tue, Oct 20, 2015 at 6:42 AM, Paul Farrell <pfarrell@funnelback.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Hopefully this will be my only request for information today.
>>>> I’m afraid this is a bit of a newbie question, but I have now managed to
>>>> get the Manifold UI to show ‘Alfresco Webscripts’ as a connector. The
>>>> only bit I am missing now is installing the AMP file in Alfresco.
>>>>
>>>> I realise that this is slightly outside of the Manifold remit, but I
>>>> wondered if anyone can advise how I build the AMP file from the
>>>> repository (https://github.com/maoo/alfresco-indexer)? I have cloned the
>>>> repository to my local drive but, having never worked with Maven, am at a
>>>> loss as to how to generate the AMP file that I then need to install into
>>>> Alfresco.
>>>>
>>>> Many thanks,
>>>>
>>>> On 19 Oct 2015, at 17:36, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>> The only way you can have such a reduced list of connectors is if
>>>> somebody commented out many connectors in your connectors.xml, or removed
>>>> them by hand from the database table where they are registered.
>>>>
>>>> Karl
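One way to check Karl's first explanation is to list the repository connectors that are actually active in connectors.xml. This Python sketch assumes the file registers connectors as repositoryconnector elements with name and class attributes, like the line Paul quotes below; any entry inside an XML comment is dropped by the parser, which is exactly the "commented out" case. The sample file and the class name example.OtherConnector are made up for illustration, and the sketch does not inspect the database table Karl also mentions.

```python
import xml.etree.ElementTree as ET
from typing import List


def active_repository_connectors(connectors_xml: str) -> List[str]:
    """Return the names of <repositoryconnector> entries that are not commented out."""
    root = ET.fromstring(connectors_xml)
    # ElementTree's default parser discards XML comments, so a commented-out
    # <repositoryconnector .../> never appears in root.iter().
    return [el.get("name") for el in root.iter("repositoryconnector")]


# Illustrative sample only; 'example.OtherConnector' is a placeholder class name.
SAMPLE = """<connectors>
  <repositoryconnector name="CMIS" class="example.OtherConnector"/>
  <!-- <repositoryconnector name="Alfresco Webscript"
       class="org.apache.manifoldcf.crawler.connectors.alfrescowebscript.AlfrescoConnector"/> -->
</connectors>"""

if __name__ == "__main__":
    names = active_repository_connectors(SAMPLE)
    print(names)
    if "Alfresco Webscript" not in names:
        print("Alfresco Webscript is not registered (missing or commented out)")
```

If the Alfresco Webscript entry is present and uncommented but the UI still does not offer it, Karl's second explanation (the registration table) is the remaining candidate.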
>>>>
>>>>
>>>> On Mon, Oct 19, 2015 at 12:33 PM, Paul Farrell <pfarrell@funnelback.com>
>>>> wrote:
>>>>
>>>>> After a good deal of time clicking around I came to the same
>>>>> conclusion - that there is no way of telling from the UI!!
>>>>>
>>>>> Having dug a bit deeper, I believe I may actually have the Alfresco
>>>>> WebScript connector installed, at least the 0.7.0 version. I notice that
>>>>> the ‘lib’ directory contains ‘alfresco-indexer-webscripts-0.7.0.amp’.
>>>>>
>>>>> Looking in the ‘connectors.xml’ file, I can also see the line:
>>>>>
>>>>> <repositoryconnector name="Alfresco Webscript"
>>>>> class="org.apache.manifoldcf.crawler.connectors.alfrescowebscript.AlfrescoConnector"/>
>>>>>
>>>>> You can imagine my excitement!
>>>>>
>>>>> The only thing I am missing is the option in the UI. When I click to
>>>>> create a new repo connection I get:  CMIS, Dropbox, Generic, GoogleDrive,
>>>>> HDFS, Jira, Meridio, RSS, Sharepoint.
>>>>>
>>>>> Perhaps it is too much to hope that a simple change will enable this
>>>>> repo connection?
>>>>>
>>>>> Thanks for all the help everyone
>>>>>
>>>>>
>>>>>
>>>>> On 19 Oct 2015, at 17:26, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>> Hah; there's not a way to inquire in the UI, if that's what you mean.
>>>>> But if you see "Alfresco webscript" in the list of repository connection
>>>>> types, you've got a version that supports that connector.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Oct 19, 2015 at 12:17 PM, Paul Farrell <
>>>>> pfarrell@funnelback.com> wrote:
>>>>>
>>>>>> Thanks Rafa.
>>>>>>
>>>>>> As an aside, is there an easy way to identify which version of
>>>>>> ManifoldCF you are on?
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On 19 Oct 2015, at 16:54, Rafa Haro <rharo@apache.org> wrote:
>>>>>>
>>>>>> Hi Paul,
>>>>>>
>>>>>> All you need to do is to install this webscript
>>>>>> <https://github.com/maoo/alfresco-indexer> within your Alfresco
>>>>>> instance. The connector itself is already part of the most recent
>>>>>> versions of ManifoldCF.
>>>>>>
>>>>>> Cheers,
>>>>>> Rafa
>>>>>>
>>>>>> On Mon, Oct 19, 2015 at 5:29 PM, Paul Farrell <
>>>>>> pfarrell@funnelback.com> wrote:
>>>>>>
>>>>>>> Ok, thanks again guys.
>>>>>>>
>>>>>>> The Webscript connector it is.
>>>>>>>
>>>>>>> I realise I am asking a lot here, but are there any easy-to-follow
>>>>>>> guidelines on how to get this Webscript connector installed? I see
>>>>>>> there is a GitHub page
>>>>>>> (https://github.com/maoo/alfresco-webscript-manifold-connector) which
>>>>>>> discusses it (although it directs you to a repository of files).
>>>>>>>
>>>>>>> I am just keen to make sure that any steps I follow to get this
>>>>>>> Webscript connector installed and working are up-to-date, reliable
>>>>>>> steps. I would hate to waste time with out-of-date information.
>>>>>>>
>>>>>>> Thanks all
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 19 Oct 2015, at 16:23, Muhammed Olgun <mh.olgun@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> I suggest that you use the Alfresco Webscript connector, as Karl
>>>>>>> mentioned. Web Services is very slow compared to the alternatives, and
>>>>>>> I've also checked that Alfresco's CMIS Web Services binding does not
>>>>>>> return a change token (maybe there is something that I don't know).
>>>>>>>
>>>>>>> By the way, the current version of the CMIS connector is not aware of
>>>>>>> the change token. I would write a patch for you if Alfresco supported
>>>>>>> the change token property.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Muhammed
>>>>>>> On Mon, 19 Oct 2015 at 18:11, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Paul,
>>>>>>>>
>>>>>>>> The Alfresco Webscript connector is a wholly different connector
>>>>>>>> that has no relation to the CMIS connector. It does require an
>>>>>>>> Alfresco webscript plugin to be installed on your Alfresco server in
>>>>>>>> order to work, though.
>>>>>>>>
>>>>>>>> Hope that helps.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 19, 2015 at 10:32 AM, Paul Farrell <
>>>>>>>> pfarrell@funnelback.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Muhammed/Karl,
>>>>>>>>>
>>>>>>>>> Firstly, thank you so much for taking the time to reply. It is very
>>>>>>>>> much appreciated.
>>>>>>>>>
>>>>>>>>> Currently I am using the AtomPub binding for my CMIS repository
>>>>>>>>> connection. I have just read something which may shed a little light
>>>>>>>>> on this. The post said that change tokens are not passed via AtomPub
>>>>>>>>> connections
>>>>>>>>> (https://forums.alfresco.com/forum/developer-discussions/alfresco-api/cmis-change-log-token-problem-using-opencmis-03282011-1758).
>>>>>>>>> If true, this would explain why ManifoldCF may be unable to detect a
>>>>>>>>> change in Alfresco.
>>>>>>>>>
>>>>>>>>> It looks like I have two possible options left open to me (correct
>>>>>>>>> me if I’m wrong):
>>>>>>>>>
>>>>>>>>> 1. I use ‘Web Services’ instead of ‘AtomPub’ as the connection
>>>>>>>>>    mechanism.
>>>>>>>>> 2. I upgrade ManifoldCF so that I can use the ‘Web Scripts’
>>>>>>>>>    connector (or is this the same as the ‘Web Services’ connection
>>>>>>>>>    mentioned above?)
>>>>>>>>>
>>>>>>>>> Thanks again,
>>>>>>>>>
>>>>>>>>> Paul
>>>>>>>>>
>>>>>>>>> On 19 Oct 2015, at 15:12, Muhammed Olgun <mh.olgun@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Paul,
>>>>>>>>>
>>>>>>>>> Repositories should tell ManifoldCF when documents are updated. The
>>>>>>>>> current CMIS connector reindexes a document when its latest version
>>>>>>>>> has changed, not when it has merely been updated.
>>>>>>>>>
>>>>>>>>> The CMIS specification defines a change token property that should
>>>>>>>>> change whenever a document is updated, so that ManifoldCF can tell
>>>>>>>>> the document has been updated, but implementing the change token
>>>>>>>>> property is optional. I've checked Alfresco's CMIS web site and seen
>>>>>>>>> that they don't set the change token.
>>>>>>>>>
>>>>>>>>> I think there is nothing we can do at this point.
>>>>>>>>>
>>>>>>>>> On Mon, 19 Oct 2015 at 15:59, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Paul,
>>>>>>>>>>
>>>>>>>>>> This looks like a bug in the CMIS connector to me; usually the
>>>>>>>>>> document version string the connector constructs should be adequate
>>>>>>>>>> to detect all changes. Can you create a ticket at
>>>>>>>>>> https://issues.apache.org/jira, project ManifoldCF? Please include
>>>>>>>>>> the version of MCF you are using. FWIW, this may in fact be a bug in
>>>>>>>>>> the Alfresco CMIS implementation, but we'll have to have some back
>>>>>>>>>> and forth before I can determine that for sure.
>>>>>>>>>>
>>>>>>>>>> In the meantime, have you considered using the Alfresco Webscript
>>>>>>>>>> connector? It's the preferred way to do Alfresco indexing, although
>>>>>>>>>> there have been issues reported having to do with running it on
>>>>>>>>>> some configurations of Alfresco. I'm not entirely sure what the
>>>>>>>>>> problem is there; maybe a version dependency of some kind.
>>>>>>>>>>
>>>>>>>>>> Karl
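Karl's point is that a connector describes each document with a version string, and a document is only refreshed when that string differs from the one recorded on the previous crawl. The Python sketch below is a hypothetical model of that comparison, not the CMIS connector's code: the functions version_string and needs_reindex, the made-up timestamp, and the assumption that removing a Consumer changes neither the modification date nor a change token (which, per Muhammed, Alfresco does not set) are all illustrative. It shows why a version string that leaves out the ACL cannot notice Paul's security change.

```python
from typing import Iterable, Optional


def version_string(last_modified: str, change_token: Optional[str],
                   acl: Iterable[str], include_acl: bool) -> str:
    """Build a hypothetical per-document version string.

    Any difference in this string between two crawls means 'reindex'.
    """
    parts = [last_modified, change_token or ""]
    if include_acl:
        # Sort so the same set of principals always yields the same string.
        parts.append(",".join(sorted(acl)))
    return "|".join(parts)


def needs_reindex(previous: str, current: str) -> bool:
    return previous != current


# TestDoc1 before and after removing User A. Assumed: same (made-up) date, no change token.
before = dict(last_modified="2015-10-19T07:00:00Z", change_token=None, acl={"userA", "userB"})
after = dict(last_modified="2015-10-19T07:00:00Z", change_token=None, acl={"userB"})

if __name__ == "__main__":
    for include_acl in (False, True):
        changed = needs_reindex(version_string(**before, include_acl=include_acl),
                                version_string(**after, include_acl=include_acl))
        print(f"include_acl={include_acl}: reindex={changed}")
```

In this model the ACL-free string stays identical, so the document is skipped; once the ACL is part of the string, the removal of User A forces a reindex. It is also consistent with Paul's workaround: after "Remove all associated documents" there is no recorded version to match, so the document is sent again.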
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2015 at 7:43 AM, Paul Farrell <
>>>>>>>>>> pfarrell@funnelback.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>
>>>>>>>>>>> Hoping someone may be able to advise.
>>>>>>>>>>>
>>>>>>>>>>> I am currently using Manifold, together with a CMIS connector, to
>>>>>>>>>>> retrieve and index content from an Alfresco repository.
>>>>>>>>>>>
>>>>>>>>>>> All is going well apart from what I would call the ‘incremental
>>>>>>>>>>> crawl’.
>>>>>>>>>>>
>>>>>>>>>>> The main issue I am having is that a change to a document’s
>>>>>>>>>>> security settings in Alfresco is not picked up by the next Manifold
>>>>>>>>>>> crawl. As an example, I have a document ‘TestDoc1’ which has users
>>>>>>>>>>> A and B as Consumers. I run a crawl in Manifold and it picks up the
>>>>>>>>>>> documents fine. The security is set as expected. I then remove
>>>>>>>>>>> ‘User A’ from the security of that document and re-run the Manifold
>>>>>>>>>>> crawl. User A can still see the document in the local search engine.
>>>>>>>>>>>
>>>>>>>>>>> It is as if Manifold is not treating the security update as a
>>>>>>>>>>> ‘modification’ and is therefore not refreshing the document. Note
>>>>>>>>>>> that if I go into the Output Connections, edit and save the relevant
>>>>>>>>>>> output connection and then click ‘Remove all associated documents’,
>>>>>>>>>>> the changes are picked up the next time I crawl. It is clear that
>>>>>>>>>>> Manifold is just not updating whatever internal record it has for
>>>>>>>>>>> this item.
>>>>>>>>>>>
>>>>>>>>>>> Any ideas?
>>>>>>>>>>>
>>>>>>>>>>> Many thanks.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
