manifoldcf-dev mailing list archives

From lalit jangra <lalit.j.jan...@gmail.com>
Subject Re: Zookeeper in Apache ManifoldCF
Date Thu, 03 Jul 2014 21:48:10 GMT
Thanks Karl,

I am having one cluster with two MCF instances pointing to one single DB.

Can you please elaborate a bit more?

regards.




On Thu, Jul 3, 2014 at 10:19 PM, Karl Wright <daddywri@gmail.com> wrote:

>
> Hi lalit,
>
> When data is pushed into the database that mcf uses but the mcf instance
> is not doing the pushing, then caches everywhere will not be properly
> invalidated.  It may be more appropriate to have only one cluster with two
> members of each type (agents process, mcf UI, etc), if that would be
> acceptable.
>
>
> Karl
>
> Sent from my Windows Phone
> ------------------------------
> From: lalit jangra
> Sent: 7/3/2014 1:23 PM
> To: Karl Wright
>
> Subject: Re: Zookeeper in Apache ManifoldCF
>
> Hello Karl,
>
> I have a set of two MCF servers each having its own tomcat server but
> pointing to same Postgres DB.
>
> I have also configured set of three zookeeper servers on each node of
> cluster, started them, configured properties.xml & properties-global.xml on
> both nodes. Finally i started zookeeper's start-agents.sh on both nodes.
>
> While trying to run ./zkCli.sh -server localhost:2181 on both machines, i
> am getting different outputs. Is it normal, or am i missing something?
>
> Node1.
>
> [zk: localhost:2181(CONNECTED) 2] ls /
>
> [org.apache.manifoldcf.service-AGENT,
> org.apache.manifoldcf.servicelock-AGENT,
> org.apache.manifoldcf.configuration,
> org.apache.manifoldcf.serviceactive-AGENT-A, zookeeper]
>
>
> Node2.
>
> [zk: localhost:2181(CONNECTED) 1] ls /
>
> [org.apache.manifoldcf.locks-statslock-reindex-jobqueue,
> org.apache.manifoldcf.locks-_Cache_OUTPUTCONNECTION_Solr,
> org.apache.manifoldcf.service-AGENT,
> org.apache.manifoldcf.service-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
> org.apache.manifoldcf.resources-stats-reindex-jobqueue,
> org.apache.manifoldcf.serviceanon-_OUTPUTCONNECTORPOOL_Solr,
> org.apache.manifoldcf.locks-_Cache_JOBSTATUSES,
> org.apache.manifoldcf.locks-statslock-analyze-jobqueue,
> org.apache.manifoldcf.servicelock-AGENT,
> org.apache.manifoldcf.locks-_REPR_TRACKER_LOCK_,
> org.apache.manifoldcf.configuration,
> org.apache.manifoldcf.servicelock-_OUTPUTCONNECTORPOOL_Solr,
> org.apache.manifoldcf.locks-_STUFFERTHREAD_LOCK,
> org.apache.manifoldcf.service-_OUTPUTCONNECTORPOOL_Solr,
> org.apache.manifoldcf.resources-_REPR_MINDEPTH_,
> org.apache.manifoldcf.resources-_STUFFERTHREAD_LASTTIME,
> org.apache.manifoldcf.resources-stats-analyze-jobqueue,
> org.apache.manifoldcf.locks-_IDFACTORY_,
> org.apache.manifoldcf.locks-_JOBRESET_,
> org.apache.manifoldcf.servicelock-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
> org.apache.manifoldcf.resources-cache-JOBSTATUSES,
> org.apache.manifoldcf.locks-_JOBSTOP_,
> org.apache.manifoldcf.locks-_POOLTARGET__OUTPUTCONNECTORPOOL_Solr,
> zookeeper, org.apache.manifoldcf.resources-_IDFACTORY_,
> org.apache.manifoldcf.locks-_Cache_JOB_1404323519962,
> org.apache.manifoldcf.locks-_Cache_DB-mcfdb-TBL-outputconnectors,
> org.apache.manifoldcf.locks-_JOBRESUME_]
>
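
One rough way to see how far apart the two listings are is to diff them as sets. The Python sketch below is only an illustration: `parse_ls` is a hypothetical helper, and the Node2 input is abbreviated to a handful of the entries quoted above. As far as I know, every server in a single ZooKeeper ensemble serves the same znode tree, so a difference this large would normally suggest the two servers are not members of the same ensemble.

```python
def parse_ls(output):
    """Parse the bracketed list printed by zkCli's `ls /` into a set of znode names."""
    inner = output.strip().strip("[]")
    return {name.strip() for name in inner.split(",") if name.strip()}


# Node1 listing, copied from the message above.
node1 = parse_ls("""[org.apache.manifoldcf.service-AGENT,
org.apache.manifoldcf.servicelock-AGENT,
org.apache.manifoldcf.configuration,
org.apache.manifoldcf.serviceactive-AGENT-A, zookeeper]""")

# Node2 listing, abbreviated here to a few of the quoted entries.
node2_sample = parse_ls("""[org.apache.manifoldcf.service-AGENT,
org.apache.manifoldcf.servicelock-AGENT,
org.apache.manifoldcf.configuration,
org.apache.manifoldcf.locks-_JOBSTOP_, zookeeper]""")

only_on_node1 = sorted(node1 - node2_sample)
only_on_node2 = sorted(node2_sample - node1)

print("only on node1:", only_on_node1)
print("only on node2:", only_on_node2)
```

If both lists come back empty for the full listings, the two servers are at least showing the same top-level tree.
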
>
> Also in clustered setup, i noticed one strange behavior.
>
> If i created a job on say MCF1 in clustered setup, it is created but not
> replicated to MCF2 node. I need to restart MCF2 node to get it replicated
> there. Is it OK?
>
> Please suggest.
>
> Regards.
>
>
> On Wed, Jul 2, 2014 at 10:49 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi lalit,
>>
>> Each agents process in a cluster needs its own Id. Please look carefully
>> at the multiprocess zookeeper example for details how to do that.  If you
>> didn't intend for there to be multiple agents processes in one cluster, you
>> did something wrong, because that is what you have.
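
The empty service name in the error (`Service ''`) is consistent with no process ID being set. A minimal Python sketch of a check for that, under two assumptions that are not stated in this thread: that the property is named `org.apache.manifoldcf.processid` (as I recall from the build-and-deploy page), and that properties.xml uses `<property name=... value=.../>` elements. The sample XML strings and node names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed property name; verify against the how-to-build-and-deploy page.
PROCESS_ID = "org.apache.manifoldcf.processid"


def process_id(properties_xml):
    """Return the process ID found in a properties.xml string, or '' if absent."""
    root = ET.fromstring(properties_xml)
    for prop in root.iter("property"):
        if prop.get("name") == PROCESS_ID:
            return prop.get("value", "")
    return ""


def check_unique(ids):
    """Return a list of problems: missing or duplicated process IDs."""
    problems = []
    seen = set()
    for node, pid in ids.items():
        if not pid:
            problems.append(f"{node}: no process ID set")
        elif pid in seen:
            problems.append(f"{node}: duplicate process ID {pid!r}")
        seen.add(pid)
    return problems


# Hypothetical sample configurations for two nodes.
node1_xml = ('<configuration><property name="org.apache.manifoldcf.processid" '
             'value="A"/></configuration>')
node2_xml = "<configuration></configuration>"

problems = check_unique({"node1": process_id(node1_xml),
                         "node2": process_id(node2_xml)})
print(problems)
```
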
>>
>>
>> Karl
>>
>> Sent from my Windows Phone
>> ------------------------------
>> From: lalit jangra
>> Sent: 7/2/2014 2:11 PM
>> To: Karl Wright
>> Cc: user@manifoldcf.apache.org
>>
>> Subject: Re: Zookeeper in Apache ManifoldCF
>>
>>  Hello,
>>
>> I have configured 3 zookeeper instances on port 2181, 2182, 2183 on my
>> server and in mcf/dist/multiprocess-zk-example i have configured all three
>> servers as comma separated list.
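
For reference, a ZooKeeper connect string is a comma-separated list of host:port pairs. A minimal Python sketch that builds one for the three ports mentioned above, assuming all three instances run on localhost (the host name is an assumption; the message only gives the ports):

```python
# Ports taken from the message; the host is assumed for illustration.
host = "localhost"
ports = [2181, 2182, 2183]

connect_string = ",".join(f"{host}:{port}" for port in ports)
print(connect_string)
```
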
>>
>> Now i have started all three zookeeper instances and i could see all
>> three running. Next i tried with a crawl job but in manifoldcf.logs, i can
>> see below error.
>>
>> ERROR 2014-07-02 19:07:15,716 (Agents thread) - Exception tossed: Service
>> '' of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is
>> already active
>>
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service '' of
>> type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already
>> active
>>
>>         at
>> org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:156)
>>
>>         at
>> org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:120)
>>
>>         at
>> org.apache.manifoldcf.core.lockmanager.LockManager.registerServiceBeginServiceActivity(LockManager.java:69)
>>
>>         at
>> org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>>
>>         at
>> org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>>
>>
>> How can i validate whether these errors are related to zookeeper or not?
>> Also how to know if MCF is integrated with zookeeper?
>>
>>
>> Regards.
>>
>>
>>
>> On Tue, Jul 1, 2014 at 3:19 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Lalit,
>>>
>>> I presumed in my recommendation that your "active" and "passive"
>>> manifoldcf instances were using the same PostgreSQL server, but were using
>>> different database instances within it.  That is the only way it could
>>> reasonably work.
>>>
>>> Any time you have a Zookeeper cluster, they recommend you have three
>>> instances.  Effectively you are setting up two ManifoldCF clusters: an
>>> "active" one, and a "passive" one.  Each one has its own database instance
>>> within PostgreSQL, and each one (if it is multiprocess) should have 3
>>> zookeeper instances.
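
The reason three is the usual recommendation comes down to majority quorum: an ensemble stays available only while more than half of its servers are up. A small Python sketch of that arithmetic (the function names are just illustrative):

```python
def quorum_size(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can fail while a majority is still reachable."""
    return n - quorum_size(n)


for n in (1, 2, 3, 5, 6):
    print(n, "servers: quorum", quorum_size(n),
          "tolerates", tolerated_failures(n), "failure(s)")
```

Two servers tolerate no failures, which is no better than one, and six tolerate two, which is no better than five. That is why odd-sized ensembles, starting at three, are the norm.
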
>>>
>>> I hope this is clear.
>>>
>>> Karl
>>>
>>>
>>>
>>> On Tue, Jul 1, 2014 at 9:54 AM, lalit jangra <lalit.j.jangra@gmail.com>
>>> wrote:
>>>
>>>> Thanks Karl,
>>>>
>>>> I have a little variation here and this is about having both MCF nodes
>>>> in Active/Active nodes pointing to same DB, so still Zookeeper is required?
>>>>
>>>> Also, by "two sets of three zookeeper machines", does it mean i need
>>>> to set up three zookeepers on each node, so 6 zookeeper nodes in total
>>>> working on both machines in the same ensemble?
>>>>
>>>> Regards.
>>>>
>>>>
>>>> On Mon, Jun 30, 2014 at 6:50 PM, Karl Wright <daddywri@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Lalit,
>>>>>
>>>>> You can keep things really simple by having both active and passive
>>>>> mcf instances run each as a single process, either under jetty or using the
>>>>> combined war under tomcat.  If that is not acceptable, you would need two
>>>>> sets of three zookeeper machines, one set for each instance.
>>>>>
>>>>> Karl
>>>>>
>>>>> Sent from my Windows Phone
>>>>> ------------------------------
>>>>> From: lalit jangra
>>>>> Sent: 6/30/2014 12:19 PM
>>>>> To: user@manifoldcf.apache.org
>>>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>>>
>>>>>  Thanks Karl & Graeme,
>>>>>
>>>>> Let me elaborate my scenario and what i am trying to achieve.
>>>>>
>>>>> I have two servers each running MCF 1.5.1 individually. But both of
>>>>> them are backed by same PostGreSQL DB so both of MCF applications are
>>>>> pointing to same DB at any point of time, without having their own
>>>>> dedicated DBs. Next, primary/active DB instance is  backed up with
>>>>> periodical backups from active to passive instance.
>>>>>
>>>>> Only one DB instance will be active at any time, with other DB
>>>>> instance acting as active standby. In case of breakdown of primary/active
>>>>> instance, passive/secondary will take over and becomes primary/active
>>>>> instance handling all DB transactions, thus making primary as new secondary
>>>>> DB instance.
>>>>>
>>>>> Similarly i have two solr 4.6 instances which act in active/passive
>>>>> mode with periodic backup of active/primary to passive/secondary with
>>>>> active standby and failover.
>>>>>
>>>>> So my intention of clustering is high availability of system with
>>>>> failover but i will not use both of MCF instances in parallel or
>>>>> simultaneously.
>>>>>
>>>>> Finally i am limited to having two instances only but as mentioned
>>>>> earlier, we need at least three Zookeeper instances for a proper Zookeeper
>>>>> clustering.
>>>>>
>>>>> Is it still worthy to go and use Zookeeper or i can do simple
>>>>> clustering where each of MCF node is clustered using same DB. Please
>>>>> suggest.
>>>>>
>>>>> Thanks for help.
>>>>>
>>>>> Regards.
>>>>>
>>>>>
>>>>> On Fri, Jun 27, 2014 at 11:15 AM, Graeme Seaton <lists@graemes.com>
>>>>> wrote:
>>>>>
>>>>>>  Hi Lalit,
>>>>>>
>>>>>> For production use, you will want to spin up your own ZK cluster
>>>>>> using the instructions on the zookeeper site (as pointed out earlier at
>>>>>> least 3 is recommended)....
>>>>>>
>>>>>> You then need to modify the properties.xml file in
>>>>>> multiprocess-zk-example to point to the list of Zookeeper servers.  You
>>>>>> also need to modify properties-global.xml with the appropriate global
>>>>>> settings i.e. logging levels, Postgresql database etc. and then run
>>>>>> setglobalproperties.sh to register the settings in ZK.
>>>>>>
>>>>>> To test that is working, set up a crawl and then tail the
>>>>>> manifoldcf.log file on each of your nodes to check that they are all
>>>>>> crawling in parallel.
>>>>>>
>>>>>> HTH,
>>>>>>
>>>>>> Graeme
>>>>>>
>>>>>>
>>>>>> On 25/06/14 12:19, Karl Wright wrote:
>>>>>>
>>>>>>  Hi Lalit,
>>>>>>
>>>>>> Zookeeper does not use a database; it keeps its stuff in the local
>>>>>> file system.  Each Zookeeper node has its own local data, and everything
>>>>>> else is socket communication between them.
>>>>>>
>>>>>>  As for information: http://zookeeper.apache.org/
>>>>>>
>>>>>>  Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 25, 2014 at 6:56 AM, lalit jangra <
>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>
>>>>>>>  Thanks Karl,
>>>>>>>
>>>>>>> Apologies as i am not very familiar with Zookeeper and trying to
>>>>>>> figure out on same.
>>>>>>>
>>>>>>> Is there any more documentation/pointers available for same as that
>>>>>>> would be more helpful.
>>>>>>>
>>>>>>>  Also i have 2 tomcat servers in cluster, each having MCF 1.5.1
>>>>>>> setup and configured to point to same PostGreSQL DB & DB is backed up for
>>>>>>> failover. From your inputs, it seems that we need to configure a separate
>>>>>>> standalone Zookeeper server which will act as Master and both nodes in
>>>>>>> cluster will need to work as slaves and talk to standalone Zookeeper master.
>>>>>>>
>>>>>>>  Also the Zookeeper server will have its own DB so either we can
>>>>>>> host it separately or we can use same Postgres DB?
>>>>>>>
>>>>>>>  Regards.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 25, 2014 at 11:33 AM, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>   Hi Lalit,
>>>>>>>>
>>>>>>>>  1. zookeeper is already spun into MCF.  in fact you start a
>>>>>>>> zookeeper instance when you run the mcf zookeeper example.  They recommend,
>>>>>>>> though, that for failover you have 3 instances, etc.
>>>>>>>>  2. Looks like the documentation is out of date and something old
>>>>>>>> is left in there.
>>>>>>>>  3. Zookeeper is a client/server kind of arrangement.  You need at
>>>>>>>> least ONE zookeeper server, and each cluster member includes a zookeeper
>>>>>>>> client, which is configured to talk with ALL the zookeeper server instances
>>>>>>>> you have.
>>>>>>>>  4.  There is ONE database instance; the instance may be supported
>>>>>>>> by failover and redundant Postgresql, but it appears as one instance.  TO
>>>>>>>> get failover from Postgres you need the Enterprise Edition, which costs
>>>>>>>> money.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 25, 2014 at 4:47 AM, lalit jangra <
>>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>     Thanks Karl,
>>>>>>>>>
>>>>>>>>>  That was helpful.
>>>>>>>>>
>>>>>>>>>  I am setting clustered setup on Tomcats as i was following
>>>>>>>>> instructions @
>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#Simplified+multi-process+model+using+ZooKeeper-based+synchronization
>>>>>>>>> and i need some suggestions here.
>>>>>>>>>
>>>>>>>>>  1. Do we need to download zookeeper and put it in
>>>>>>>>> multiprocess-zk-example folder or it is already spun into MCF and we are
>>>>>>>>> good to go?
>>>>>>>>>  2. It says all jars under *processes* should be put into
>>>>>>>>> classpath but i can not see any *processes* folder under MCF?
>>>>>>>>>  3. Do we need to setup Zookeeper on both nodes or only at one
>>>>>>>>> node, i assume we need to do on both nodes ?
>>>>>>>>>  4. Do we also need to setup databases separately on both nodes
>>>>>>>>> again. Also can we setup Zookeeper DB using same PostGreSQL or it will use
>>>>>>>>> its own HSQL DB?
>>>>>>>>>
>>>>>>>>>  Finally how can i test that my Zookeeper is set up and ready to
>>>>>>>>> roll?
>>>>>>>>>
>>>>>>>>>  Thanks for your help.
>>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  On Tue, Jun 24, 2014 at 1:56 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>  Hi Lalit,
>>>>>>>>>>  ZooKeeper is standard for cluster deployments these days.  See
>>>>>>>>>> the multiprocess-zookeeper example for ideas about how to deploy it.  It's
>>>>>>>>>> also important to read the how-to-build-and-deploy page to understand the
>>>>>>>>>> example.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 24, 2014 at 8:04 AM, lalit jangra <
>>>>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>  Hi,
>>>>>>>>>>>
>>>>>>>>>>>  I am planning to use MCF in cluster mode. For same, i want to
>>>>>>>>>>> know if Zookeeper is of any help here?
>>>>>>>>>>>
>>>>>>>>>>>  If yes, how can it be leveraged in distributed MCF servers?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Lalit Jangra.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>> Regards,
>>>>>>>>> Lalit Jangra.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>> Regards,
>>>>>>> Lalit Jangra.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Lalit Jangra.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Lalit Jangra.
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Lalit Jangra.
>>
>
>
>
> --
> Regards,
> Lalit Jangra.
>



-- 
Regards,
Lalit Jangra.
