manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject RE: Zookeeper in Apache ManifoldCF
Date Thu, 03 Jul 2014 21:19:23 GMT
Hi lalit,

When data is pushed into the database that MCF uses, but the MCF instance
itself is not doing the pushing, caches everywhere will not be properly
invalidated.  It may be more appropriate to have only one cluster with two
members of each type (agents process, MCF UI, etc.), if that would be
acceptable.
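
[Editor's note: as a concrete illustration of the single-cluster approach, both MCF nodes would point their properties.xml at the same ZooKeeper connect string. A minimal sketch follows; the hostnames zk1/zk2/zk3 are placeholders, and the property names should be verified against the multiprocess-zk-example shipped with your MCF version.]

```xml
<!-- Sketch only: both MCF nodes share one ZooKeeper ensemble.
     Check these property names against your MCF version's
     multiprocess-zk-example before use. -->
<property name="org.apache.manifoldcf.lockmanagerclass"
          value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
<property name="org.apache.manifoldcf.zookeeper.connectstring"
          value="zk1:2181,zk2:2181,zk3:2181"/>
<property name="org.apache.manifoldcf.zookeeper.sessiontimeout"
          value="300000"/>
```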

Karl

Sent from my Windows Phone
------------------------------
From: lalit jangra
Sent: 7/3/2014 1:23 PM
To: Karl Wright
Subject: Re: Zookeeper in Apache ManifoldCF

Hello Karl,

I have a set of two MCF servers, each with its own Tomcat server but
pointing to the same Postgres DB.

I have also configured a set of three zookeeper servers on each node of the
cluster, started them, and configured properties.xml & properties-global.xml
on both nodes. Finally I started the example's start-agents.sh on both nodes.

While running ./zkCli.sh -server localhost:2181 on both machines, I am
getting different outputs. Is this normal, or am I missing something?

Node1.

[zk: localhost:2181(CONNECTED) 2] ls /

[org.apache.manifoldcf.service-AGENT,
org.apache.manifoldcf.servicelock-AGENT,
org.apache.manifoldcf.configuration,
org.apache.manifoldcf.serviceactive-AGENT-A, zookeeper]


Node2.

[zk: localhost:2181(CONNECTED) 1] ls /

[org.apache.manifoldcf.locks-statslock-reindex-jobqueue,
org.apache.manifoldcf.locks-_Cache_OUTPUTCONNECTION_Solr,
org.apache.manifoldcf.service-AGENT,
org.apache.manifoldcf.service-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
org.apache.manifoldcf.resources-stats-reindex-jobqueue,
org.apache.manifoldcf.serviceanon-_OUTPUTCONNECTORPOOL_Solr,
org.apache.manifoldcf.locks-_Cache_JOBSTATUSES,
org.apache.manifoldcf.locks-statslock-analyze-jobqueue,
org.apache.manifoldcf.servicelock-AGENT,
org.apache.manifoldcf.locks-_REPR_TRACKER_LOCK_,
org.apache.manifoldcf.configuration,
org.apache.manifoldcf.servicelock-_OUTPUTCONNECTORPOOL_Solr,
org.apache.manifoldcf.locks-_STUFFERTHREAD_LOCK,
org.apache.manifoldcf.service-_OUTPUTCONNECTORPOOL_Solr,
org.apache.manifoldcf.resources-_REPR_MINDEPTH_,
org.apache.manifoldcf.resources-_STUFFERTHREAD_LASTTIME,
org.apache.manifoldcf.resources-stats-analyze-jobqueue,
org.apache.manifoldcf.locks-_IDFACTORY_,
org.apache.manifoldcf.locks-_JOBRESET_,
org.apache.manifoldcf.servicelock-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
org.apache.manifoldcf.resources-cache-JOBSTATUSES,
org.apache.manifoldcf.locks-_JOBSTOP_,
org.apache.manifoldcf.locks-_POOLTARGET__OUTPUTCONNECTORPOOL_Solr,
zookeeper, org.apache.manifoldcf.resources-_IDFACTORY_,
org.apache.manifoldcf.locks-_Cache_JOB_1404323519962,
org.apache.manifoldcf.locks-_Cache_DB-mcfdb-TBL-outputconnectors,
org.apache.manifoldcf.locks-_JOBRESUME_]
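
[Editor's note: every member of a healthy ZooKeeper ensemble serves the same znode namespace, so two listings like the ones above should agree once sorted; a persistent difference usually means the two servers are not actually members of one ensemble (e.g. each is running standalone). A small sketch of such a comparison, with sample listings inlined in place of real `zkCli.sh ... ls /` output:]

```shell
# Normalize a zkCli.sh 'ls /' listing (one bracketed, comma-separated line)
# into sorted znode names, one per line, so two listings can be compared.
normalize() { tr -d '[]' | tr ',' '\n' | sed 's/^ *//' | grep -v '^$' | sort; }

# Sample data standing in for the real output of:
#   zkCli.sh -server node1:2181 ls /   and   zkCli.sh -server node2:2181 ls /
# (node1/node2 are placeholder hostnames)
echo "[org.apache.manifoldcf.configuration, zookeeper]" | normalize > /tmp/node1.znodes
echo "[zookeeper, org.apache.manifoldcf.configuration]" | normalize > /tmp/node2.znodes

if diff -q /tmp/node1.znodes /tmp/node2.znodes > /dev/null; then
  echo "ensemble consistent"
else
  echo "namespaces differ: servers are probably not in one ensemble"
fi
```

[Feeding the real listings from each server through `normalize` the same way would show whether the difference is just ordering, or whether the servers genuinely hold different data.]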


Also, in the clustered setup I noticed one strange behavior.

If I create a job on, say, MCF1, it is created but not replicated to the
MCF2 node; I need to restart the MCF2 node to get it replicated there. Is
that OK?

Please suggest.

Regards.


On Wed, Jul 2, 2014 at 10:49 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi lalit,
>
> Each agents process in a cluster needs its own ID. Please look carefully
> at the multiprocess zookeeper example for details on how to do that.  If you
> didn't intend for there to be multiple agents processes in one cluster, you
> did something wrong, because that is what you have.
>
>
> Karl
>
> Sent from my Windows Phone
> ------------------------------
> From: lalit jangra
> Sent: 7/2/2014 2:11 PM
> To: Karl Wright
> Cc: user@manifoldcf.apache.org
>
> Subject: Re: Zookeeper in Apache ManifoldCF
>
>  Hello,
>
> I have configured 3 zookeeper instances on ports 2181, 2182, and 2183 on my
> server, and in mcf/dist/multiprocess-zk-example I have configured all three
> servers as a comma-separated list.
>
> Now I have started all three zookeeper instances and I can see all three
> running. Next I tried a crawl job, but in manifoldcf.log I can see the
> error below.
>
> ERROR 2014-07-02 19:07:15,716 (Agents thread) - Exception tossed: Service
> '' of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is
> already active
>
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service '' of
> type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already
> active
>
>         at
> org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:156)
>
>         at
> org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:120)
>
>         at
> org.apache.manifoldcf.core.lockmanager.LockManager.registerServiceBeginServiceActivity(LockManager.java:69)
>
>         at
> org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>
>         at
> org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>
>
> How can I validate whether or not these errors are related to zookeeper?
> Also, how do I know if MCF is integrated with zookeeper?
>
>
> Regards.
>
>
>
> On Tue, Jul 1, 2014 at 3:19 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Lalit,
>>
>> I presumed in my recommendation that your "active" and "passive"
>> manifoldcf instances were using the same PostgreSQL server, but were using
>> different database instances within it.  That is the only way it could
>> reasonably work.
>>
>> Any time you have a Zookeeper cluster, they recommend you have three
>> instances.  Effectively you are setting up two ManifoldCF clusters: an
>> "active" one, and a "passive" one.  Each one has its own database instance
>> within PostgreSQL, and each one (if it is multiprocess) should have 3
>> zookeeper instances.
>>
>> I hope this is clear.
>>
>> Karl
>>
>>
>>
>> On Tue, Jul 1, 2014 at 9:54 AM, lalit jangra <lalit.j.jangra@gmail.com>
>> wrote:
>>
>>> Thanks Karl,
>>>
>>> I have a little variation here: both MCF nodes are active/active,
>>> pointing to the same DB, so is Zookeeper still required?
>>>
>>> Also, by "two sets of three zookeeper machines", do you mean I need to
>>> set up three zookeepers on each node, so six zookeeper nodes in total
>>> working across both machines in the same ensemble?
>>>
>>> Regards.
>>>
>>>
>>> On Mon, Jun 30, 2014 at 6:50 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Lalit,
>>>>
>>>> You can keep things really simple by having both active and passive mcf
>>>> instances run each as a single process, either under jetty or using the
>>>> combined war under tomcat.  If that is not acceptable, you would need two
>>>> sets of three zookeeper machines, one set for each instance.
>>>>
>>>> Karl
>>>>
>>>> Sent from my Windows Phone
>>>> ------------------------------
>>>> From: lalit jangra
>>>> Sent: 6/30/2014 12:19 PM
>>>> To: user@manifoldcf.apache.org
>>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>>
>>>>  Thanks Karl & Graeme,
>>>>
>>>> Let me elaborate on my scenario and what I am trying to achieve.
>>>>
>>>> I have two servers, each running MCF 1.5.1 individually, but both of
>>>> them are backed by the same PostgreSQL DB, so both MCF applications
>>>> point to the same DB at any point in time, without having their own
>>>> dedicated DBs. The primary/active DB instance is maintained with
>>>> periodic backups from the active to the passive instance.
>>>>
>>>> Only one DB instance will be active at any time, with the other DB
>>>> instance acting as an active standby. In case of a breakdown of the
>>>> primary/active instance, the passive/secondary will take over and become
>>>> the primary/active instance handling all DB transactions, making the old
>>>> primary the new secondary DB instance.
>>>>
>>>> Similarly, I have two Solr 4.6 instances which act in active/passive
>>>> mode, with periodic backup from active/primary to passive/secondary,
>>>> active standby, and failover.
>>>>
>>>> So my intention with clustering is high availability of the system with
>>>> failover, but I will not use both MCF instances in parallel or
>>>> simultaneously.
>>>>
>>>> Finally, I am limited to two instances only, but as mentioned earlier,
>>>> we need at least three Zookeeper instances for a proper Zookeeper
>>>> cluster.
>>>>
>>>> Is it still worthwhile to use Zookeeper, or can I do simple clustering
>>>> where each MCF node uses the same DB? Please suggest.
>>>>
>>>> Thanks for the help.
>>>>
>>>> Regards.
>>>>
>>>>
>>>> On Fri, Jun 27, 2014 at 11:15 AM, Graeme Seaton <lists@graemes.com>
>>>> wrote:
>>>>
>>>>>  Hi Lalit,
>>>>>
>>>>> For production use, you will want to spin up your own ZK cluster using
>>>>> the instructions on the zookeeper site (as pointed out earlier, at least
>>>>> 3 is recommended).
>>>>>
>>>>> You then need to modify the properties.xml file in
>>>>> multiprocess-zk-example to point to the list of Zookeeper servers.  You
>>>>> also need to modify properties-global.xml with the appropriate global
>>>>> settings i.e. logging levels, Postgresql database etc. and then run
>>>>> setglobalproperties.sh to register the settings in ZK.
>>>>>
>>>>> To test that it is working, set up a crawl and then tail the
>>>>> manifoldcf.log file on each of your nodes to check that they are all
>>>>> crawling in parallel.
>>>>>
>>>>> HTH,
>>>>>
>>>>> Graeme
>>>>>
>>>>>
>>>>> On 25/06/14 12:19, Karl Wright wrote:
>>>>>
>>>>>  Hi Lalit,
>>>>>
>>>>> Zookeeper does not use a database; it keeps its stuff in the local
>>>>> file system.  Each Zookeeper node has its own local data, and everything
>>>>> else is socket communication between them.
>>>>>
>>>>>  As for information: http://zookeeper.apache.org/
>>>>>
>>>>>  Karl
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 25, 2014 at 6:56 AM, lalit jangra <
>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>
>>>>>>  Thanks Karl,
>>>>>>
>>>>>> Apologies, as I am not very familiar with Zookeeper and am trying to
>>>>>> figure it out.
>>>>>>
>>>>>> Is there any more documentation, or are there pointers available for
>>>>>> the same? That would be very helpful.
>>>>>>
>>>>>>  Also, I have 2 Tomcat servers in a cluster, each having MCF 1.5.1
>>>>>> set up and configured to point to the same PostgreSQL DB, and the DB is
>>>>>> backed up for failover. From your inputs, it seems that we need to
>>>>>> configure a separate standalone Zookeeper server which will act as
>>>>>> master, and both nodes in the cluster will need to work as slaves and
>>>>>> talk to the standalone Zookeeper master.
>>>>>>
>>>>>>  Also, will the Zookeeper server have its own DB? Can we host it
>>>>>> separately, or can we use the same Postgres DB?
>>>>>>
>>>>>>  Regards.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 25, 2014 at 11:33 AM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>   Hi Lalit,
>>>>>>>
>>>>>>>  1. Zookeeper is already spun into MCF; in fact, you start a
>>>>>>> zookeeper instance when you run the MCF zookeeper example.  They
>>>>>>> recommend, though, that for failover you have 3 instances, etc.
>>>>>>>  2. Looks like the documentation is out of date and something old is
>>>>>>> left in there.
>>>>>>>  3. Zookeeper is a client/server kind of arrangement.  You need at
>>>>>>> least ONE zookeeper server, and each cluster member includes a zookeeper
>>>>>>> client, which is configured to talk with ALL the zookeeper server
>>>>>>> instances you have.
>>>>>>>  4. There is ONE database instance; the instance may be supported by
>>>>>>> failover and redundant PostgreSQL, but it appears as one instance.  To
>>>>>>> get failover from Postgres you need the Enterprise Edition, which costs
>>>>>>> money.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 25, 2014 at 4:47 AM, lalit jangra <
>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>
>>>>>>>>     Thanks Karl,
>>>>>>>>
>>>>>>>>  That was helpful.
>>>>>>>>
>>>>>>>>  I am setting up a clustered deployment on Tomcat, following the
>>>>>>>> instructions at
>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#Simplified+multi-process+model+using+ZooKeeper-based+synchronization
>>>>>>>> and I need some suggestions here.
>>>>>>>>
>>>>>>>>  1. Do we need to download zookeeper and put it in the
>>>>>>>> multiprocess-zk-example folder, or is it already spun into MCF and we
>>>>>>>> are good to go?
>>>>>>>>  2. It says all jars under *processes* should be put on the
>>>>>>>> classpath, but I cannot see any *processes* folder under MCF?
>>>>>>>>  3. Do we need to set up Zookeeper on both nodes or only on one
>>>>>>>> node? I assume we need to do it on both nodes?
>>>>>>>>  4. Do we also need to set up databases separately on both nodes
>>>>>>>> again? Also, can we set up the Zookeeper DB using the same PostgreSQL,
>>>>>>>> or will it use its own HSQL DB?
>>>>>>>>
>>>>>>>>  Finally, how can I test that my Zookeeper is set up and ready to
>>>>>>>> roll?
>>>>>>>>
>>>>>>>>  Thanks for your help.
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>>
>>>>>>>>  On Tue, Jun 24, 2014 at 1:56 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>  Hi Lalit,
>>>>>>>>>  ZooKeeper is standard for cluster deployments these days.  See
>>>>>>>>> the multiprocess-zookeeper example for ideas about how to deploy it.
>>>>>>>>> It's also important to read the how-to-build-and-deploy page to
>>>>>>>>> understand the example.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 24, 2014 at 8:04 AM, lalit jangra <
>>>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>  Hi,
>>>>>>>>>>
>>>>>>>>>>  I am planning to use MCF in cluster mode. For the same, I want
>>>>>>>>>> to know if Zookeeper is of any help here?
>>>>>>>>>>
>>>>>>>>>>  If yes, how can it be leveraged across distributed MCF servers?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Lalit Jangra.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  --
>>>>>>>> Regards,
>>>>>>>> Lalit Jangra.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Regards,
>>>>>> Lalit Jangra.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Lalit Jangra.
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Lalit Jangra.
>>>
>>
>>
>
>
> --
> Regards,
> Lalit Jangra.
>



-- 
Regards,
Lalit Jangra.
