manifoldcf-user mailing list archives

From Shashank Raj <shashank.raj2...@gmail.com>
Subject Re: ManifoldCF two server setup
Date Fri, 23 Mar 2018 12:24:07 GMT
Hi Karl,
We followed your documentation and built a multi-node setup, both with
file-based synchronisation and with ZooKeeper-based synchronisation. With the
ZooKeeper-based setup, we found that if we run two jobs in two separate Tomcat
processes, only one job picks up and posts records. The other job begins to
work only when we pause the first one. Is this how the multiprocess model is
meant to work? In our case, both Tomcat processes should crawl and send
documents in parallel.
We also found that file-based synchronisation did not perform as well as the
ZooKeeper-based setup.

Thanks and regards,
Shashank

On 13-Mar-2018 12:48 PM, "Karl Wright" <daddywri@gmail.com> wrote:

> Hi Raj,
>
> First, I'd start by running the multiprocess example on ONE machine with
> multiple processes.  That's what the multiprocess-file-example
> demonstrates, although it can be easily generalized to multiple machines,
> PROVIDED there is a shared file system available, like NFS.  If not, you
> must use the Zookeeper deployment model if there are multiple machines.
> File-based synchronization has been deprecated, and you will likely find it
> quite hard to work with in a multi-machine environment.
>
> The basic way to work with the examples is to use them on a single
> machine, get them working initially, and then port one change at a time.
> Use the provided scripts to start the database instance, initialize the
> database, and start the various processes.  THEN, when you are satisfied
> with how that works, you can start making changes.  The changes are, in
> order:
>
> - Using PostgreSQL rather than HSQLDB
> - Using Tomcat rather than Jetty
> - Using multiple machines, rather than one
>
> To answer your specific questions:
>
> (1) The files described are shared by all the examples, and live one
> level above where you are looking.  From the example directories, you can
> find them under ../web (or ../web-proprietary).
> (2) Yes, once you set up your connection to PostgreSQL in properties.xml,
> you DO need to run initialize-database, or the schema will not be created.
> (3) When you start different agents processes, even on different machines,
> each one must have its own ID.  The start scripts demonstrate how you do
> that.
>
> Karl
>
>
> On Tue, Mar 13, 2018 at 2:22 AM, Shashank Raj <shashank.raj2009@gmail.com>
> wrote:
>
>> Hi Karl,
>> In the documentation for "Simplified multiprocess model using file based
>> synchronisation", it says the war files should be taken from the "web"
>> folder of multiprocess-file-example, but there is no such folder or file.
>> Where should we take the war files from in this case?
>>
>> Regarding the database: the steps ask us to run the start-database and
>> initialize-database scripts, but we have deployed with PostgreSQL, and for
>> now the database is created and initialized automatically with the single
>> process file example. Now that we are switching to the multiprocess model,
>> do we still need to run those scripts?
>>
>> And should we run start-agent on one server and start-agent2 on another
>> server?
>>
>>
>> On 20-Feb-2018 9:21 PM, "Karl Wright" <daddywri@gmail.com> wrote:
>>
>>> Hi Shashank,
>>>
>>> You can have multiple servers running against the same database, BUT if
>>> you do so, each must be individually configured with its own ID, and
>>> they must share locks and, by extension, must use the same ZooKeeper.
>>> See multiprocess-zk-example in the binary distribution.
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>>
>>> On Tue, Feb 20, 2018 at 6:58 AM, Shashank Raj <
>>> shashank.raj2009@gmail.com> wrote:
>>>
>>>> Hi Karl,
>>>> I have set up ManifoldCF using Tomcat on two servers with a
>>>> load balancer in front of them. Both instances of ManifoldCF connect to
>>>> the same database. The goal is to have a backup server running all the
>>>> time. Is this setup correct, or does ManifoldCF support only a
>>>> single-server setup?
>>>>
>>>> Also, I am getting an error: Duplicate key value violates unique
>>>> constraint "repohistory_pkey". Detail: Key(id)=(1519119640499) already
>>>> exists.
>>>> This error appears when running jobs with different repositories.
>>>>
>>>> Our ManifoldCF job setup is as follows: File System > Tika Content
>>>> Extractor > Solr Output Connection.
>>>>
>>>> Thanks and regards.
>>>>
>>>>
>>>>
>>>>
>>>
>
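
The duplicate-key error in the first message and Karl's reply that servers sharing one database must also share locks can be illustrated with a small model. The key in the error, 1519119640499, has the shape of a millisecond Unix timestamp (it decodes to 2018-02-20 09:40:40.499 UTC, the day of the report). The sketch below is a hypothetical Python model, not ManifoldCF's actual ID code: it assumes IDs are derived from the current time in milliseconds, and `NaiveIdGenerator`, `FileLockedIdGenerator`, `frozen_clock`, and the state-file name are invented for illustration. Two generators that remember the last issued ID only in their own memory (like two servers with no shared lock manager) can hand out the same value; two that read and update one exclusively locked state file cannot. `fcntl.flock` is Unix-only, and whether it behaves correctly on a network file system such as NFS is an assumption to verify.

```python
import fcntl  # Unix-only advisory file locking
import os
import tempfile
from typing import Callable


class NaiveIdGenerator:
    """Hypothetical: issues the current millisecond as an ID, remembering the
    last value only in this process's own memory."""

    def __init__(self, clock_ms: Callable[[], int]) -> None:
        self._clock_ms = clock_ms
        self._last = 0

    def next_id(self) -> int:
        # Guarantees uniqueness within this object only.
        candidate = max(self._clock_ms(), self._last + 1)
        self._last = candidate
        return candidate


class FileLockedIdGenerator:
    """Hypothetical: keeps the last issued ID in a file every process can see,
    holding an exclusive lock while reading and updating it."""

    def __init__(self, state_path: str, clock_ms: Callable[[], int]) -> None:
        self._path = state_path
        self._clock_ms = clock_ms

    def next_id(self) -> int:
        fd = os.open(self._path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)  # wait until no other holder remains
            raw = os.read(fd, 64)
            last = int(raw) if raw.strip() else 0  # empty file: nothing issued yet
            candidate = max(self._clock_ms(), last + 1)
            # Overwrite the stored value with the newly issued ID.
            os.lseek(fd, 0, os.SEEK_SET)
            os.ftruncate(fd, 0)
            os.write(fd, str(candidate).encode("ascii"))
            return candidate
        finally:
            os.close(fd)  # closing the only descriptor also releases the flock


def frozen_clock() -> int:
    # Every caller reads the same millisecond, as in the reported error.
    return 1519119640499


if __name__ == "__main__":
    # Two generators with private memory stand in for two uncoordinated servers.
    a, b = NaiveIdGenerator(frozen_clock), NaiveIdGenerator(frozen_clock)
    print(a.next_id(), b.next_id())  # prints: 1519119640499 1519119640499

    # Two generators sharing one locked state file never repeat a value.
    state = os.path.join(tempfile.mkdtemp(), "id-state")
    c = FileLockedIdGenerator(state, frozen_clock)
    d = FileLockedIdGenerator(state, frozen_clock)
    print(c.next_id(), d.next_id())  # prints: 1519119640499 1519119640500
```

In the thread's terms, the locked state file plays the role that the shared lock (a common ZooKeeper, per Karl's reply) plays for several ManifoldCF servers on one database; the model only shows why a per-process guard is not enough.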
