manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Zookeeper in Apache ManifoldCF
Date Fri, 04 Jul 2014 13:37:32 GMT
Hi Lalit,

If you are already running your own zookeeper cluster, you do not need to
start zookeeper via the runzookeeper script.

Karl

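One way to confirm that an existing ensemble is reachable before skipping runzookeeper is ZooKeeper's four-letter `ruok` command, which a healthy server answers with `imok`. The Python sketch below is illustrative only: the function names are made up for this example, and newer ZooKeeper releases (3.5.3 and later) may need `ruok` enabled through `4lw.commands.whitelist` before it answers.

```python
import socket


def zk_ruok(host, port, timeout=2.0):
    """Return True if the server at host:port answers 'imok' to 'ruok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # The server sends its short reply and then closes the connection.
            while True:
                chunk = sock.recv(16)
                if not chunk:
                    break
                reply += chunk
            return reply == b"imok"
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ensemble(servers, timeout=2.0):
    """Probe every (host, port) pair and map 'host:port' to True/False."""
    return {f"{host}:{port}": zk_ruok(host, port, timeout) for host, port in servers}
```

For a three-server ensemble on one machine, `check_ensemble([("localhost", 2181), ("localhost", 2182), ("localhost", 2183)])` would report which of the placeholder addresses respond.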


On Fri, Jul 4, 2014 at 9:04 AM, lalit jangra <lalit.j.jangra@gmail.com>
wrote:

> Hello Karl,
>
> I have got my cluster working, with each node in the cluster running three
> zookeeper nodes, so six nodes in total. I have connected both MCF1 & MCF2 to
> this cluster by using the steps below.
>
>
>    1. Start ZooKeeper (using the *runzookeeper[.sh|.bat]* script)
>    2. Initialize the ManifoldCF shared configuration data (using
>    *setglobalproperties[.sh|.bat]*)
>    3. Start the database (using *start-database[.sh|.bat]*)
>    4. Initialize the database (using *initialize[.sh|.bat]*)
>    5. Start the agents process (using *start-agents[.sh|.bat]*, and
>    optionally *start-agents-2[.sh|.bat]*)
>    6. Modify the Tomcat startup script, or use the Tomcat service
>    administration client, to set a Java "-Dorg.apache.manifoldcf.configfile"
>    switch to point to the example's *properties.xml* file.
>    7. Start Tomcat.
>    8. Deploy and start the mcf-crawler-ui, mcf-authority-service, and
>    mcf-api-service web applications, preferably using the Tomcat
>    administration client.
>
> I just want to ask whether we need to start zookeeper as in step 1, given
> that we already have a cluster of zookeeper servers up and running?
>
> Also, I want to confirm whether we need to update tomcat as in step 6. I did
> not do that, but I am not getting any error as such. Are there any
> implications of not doing this?
>
> Finally, I would ask for a little help: I first did my setup using
> multiprocess-file-example, initializing the DB, but then for zookeeper I
> moved to multiprocess-zk-example.
>
> Is it safe to use in production?
>
> Regards.
>
>
>
>
> On Thu, Jul 3, 2014 at 10:48 PM, lalit jangra <lalit.j.jangra@gmail.com>
> wrote:
>
>> Thanks Karl,
>>
>> I am having one cluster with two MCF instances pointing to one single DB.
>>
>> Can you please elaborate a bit more?
>>
>> regards.
>>
>>
>>
>>
>> On Thu, Jul 3, 2014 at 10:19 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>>
>>> Hi lalit,
>>>
>>> When data is pushed into the database that mcf uses but the mcf instance
>>> is not doing the pushing, then caches everywhere will not be properly
>>> invalidated.  It may be more appropriate to have only one cluster with two
>>> members of each type (agents process, mcf UI, etc), if that would be
>>> acceptable.
>>>
>>>
>>> Karl
>>>
>>> Sent from my Windows Phone
>>> ------------------------------
>>> From: lalit jangra
>>> Sent: 7/3/2014 1:23 PM
>>> To: Karl Wright
>>>
>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>
>>>  Hello Karl,
>>>
>>> I have a set of two MCF servers each having its own tomcat server but
>>> pointing to same Postgres DB.
>>>
>>> I have also configured a set of three zookeeper servers on each node of
>>> the cluster, started them, and configured properties.xml &
>>> properties-global.xml on both nodes. Finally I started the start-agents.sh
>>> script on both nodes.
>>>
>>> While trying to run ./zkCli.sh -server localhost:2181 on both machines,
>>> I am getting different outputs. Is that normal, or am I missing something?
>>>
>>> Node1.
>>>
>>> [zk: localhost:2181(CONNECTED) 2] ls /
>>>
>>> [org.apache.manifoldcf.service-AGENT,
>>> org.apache.manifoldcf.servicelock-AGENT,
>>> org.apache.manifoldcf.configuration,
>>> org.apache.manifoldcf.serviceactive-AGENT-A, zookeeper]
>>>
>>>
>>> Node2.
>>>
>>> [zk: localhost:2181(CONNECTED) 1] ls /
>>>
>>> [org.apache.manifoldcf.locks-statslock-reindex-jobqueue,
>>> org.apache.manifoldcf.locks-_Cache_OUTPUTCONNECTION_Solr,
>>> org.apache.manifoldcf.service-AGENT,
>>> org.apache.manifoldcf.service-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
>>> org.apache.manifoldcf.resources-stats-reindex-jobqueue,
>>> org.apache.manifoldcf.serviceanon-_OUTPUTCONNECTORPOOL_Solr,
>>> org.apache.manifoldcf.locks-_Cache_JOBSTATUSES,
>>> org.apache.manifoldcf.locks-statslock-analyze-jobqueue,
>>> org.apache.manifoldcf.servicelock-AGENT,
>>> org.apache.manifoldcf.locks-_REPR_TRACKER_LOCK_,
>>> org.apache.manifoldcf.configuration,
>>> org.apache.manifoldcf.servicelock-_OUTPUTCONNECTORPOOL_Solr,
>>> org.apache.manifoldcf.locks-_STUFFERTHREAD_LOCK,
>>> org.apache.manifoldcf.service-_OUTPUTCONNECTORPOOL_Solr,
>>> org.apache.manifoldcf.resources-_REPR_MINDEPTH_,
>>> org.apache.manifoldcf.resources-_STUFFERTHREAD_LASTTIME,
>>> org.apache.manifoldcf.resources-stats-analyze-jobqueue,
>>> org.apache.manifoldcf.locks-_IDFACTORY_,
>>> org.apache.manifoldcf.locks-_JOBRESET_,
>>> org.apache.manifoldcf.servicelock-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
>>> org.apache.manifoldcf.resources-cache-JOBSTATUSES,
>>> org.apache.manifoldcf.locks-_JOBSTOP_,
>>> org.apache.manifoldcf.locks-_POOLTARGET__OUTPUTCONNECTORPOOL_Solr,
>>> zookeeper, org.apache.manifoldcf.resources-_IDFACTORY_,
>>> org.apache.manifoldcf.locks-_Cache_JOB_1404323519962,
>>> org.apache.manifoldcf.locks-_Cache_DB-mcfdb-TBL-outputconnectors,
>>> org.apache.manifoldcf.locks-_JOBRESUME_]
>>>
>>>
>>> Also in clustered setup, i noticed one strange behavior.
>>>
>>> If i created a job on say MCF1 in clustered setup, it is created but not
>>> replicated to MCF2 node. I need to restart MCF2 node to get it replicated
>>> there. Is it OK?
>>>
>>> Please suggest.
>>>
>>> Regards.
>>>
>>>
>>> On Wed, Jul 2, 2014 at 10:49 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi lalit,
>>>>
>>>> Each agents process in a cluster needs its own Id. Please look
>>>> carefully at the multiprocess zookeeper example for details on how to do
>>>> that.  If you didn't intend for there to be multiple agents processes in
>>>> one cluster, you did something wrong, because that is what you have.
>>>>
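The "Service '' ... is already active" error reported in this thread shows an empty service name, which is consistent with agents processes starting without distinct IDs. As a rough sketch, the Python below scans properties.xml contents for a process ID and flags missing or duplicate values. The property name `org.apache.manifoldcf.processid` is an assumption here (the example scripts may pass the ID as a `-D` switch instead), so verify it against your release; the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed property name; check the documentation for your ManifoldCF release.
PROCESS_ID_PROPERTY = "org.apache.manifoldcf.processid"


def read_process_id(properties_xml):
    """Return the process ID found in a properties.xml string, or '' if absent."""
    root = ET.fromstring(properties_xml)
    for prop in root.iter("property"):
        if prop.get("name") == PROCESS_ID_PROPERTY:
            return prop.get("value", "")
    return ""


def find_id_problems(configs):
    """configs maps a node label to properties.xml text; returns problem strings."""
    ids = {label: read_process_id(text) for label, text in configs.items()}
    problems = [f"{label}: no process ID set" for label, pid in ids.items() if not pid]
    counts = Counter(pid for pid in ids.values() if pid)
    for pid, count in counts.items():
        if count > 1:
            users = sorted(label for label, p in ids.items() if p == pid)
            problems.append(f"process ID {pid!r} shared by {', '.join(users)}")
    return problems
```

An empty result means every process in the cluster declared its own ID; anything else points at the process to fix.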
>>>>
>>>> Karl
>>>>
>>>> Sent from my Windows Phone
>>>> ------------------------------
>>>> From: lalit jangra
>>>> Sent: 7/2/2014 2:11 PM
>>>> To: Karl Wright
>>>> Cc: user@manifoldcf.apache.org
>>>>
>>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>>
>>>>  Hello,
>>>>
>>>> I have configured 3 zookeeper instances on ports 2181, 2182, 2183 on my
>>>> server, and in mcf/dist/multiprocess-zk-example I have configured all three
>>>> servers as a comma-separated list.
>>>>
>>>> Now I have started all three zookeeper instances and I can see all
>>>> three running. Next I tried a crawl job, but in manifoldcf.log I can
>>>> see the error below.
>>>>
>>>> ERROR 2014-07-02 19:07:15,716 (Agents thread) - Exception tossed: Service '' of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already active
>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service '' of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already active
>>>>         at org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:156)
>>>>         at org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:120)
>>>>         at org.apache.manifoldcf.core.lockmanager.LockManager.registerServiceBeginServiceActivity(LockManager.java:69)
>>>>         at org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>>>>         at org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>>>>
>>>>
>>>> How can I tell whether these errors are related to zookeeper or
>>>> not? Also, how can I know whether MCF is integrated with zookeeper?
>>>>
>>>>
>>>> Regards.
>>>>
>>>>
>>>>
>>>> On Tue, Jul 1, 2014 at 3:19 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Hi Lalit,
>>>>>
>>>>> I presumed in my recommendation that your "active" and "passive"
>>>>> manifoldcf instances were using the same PostgreSQL server, but were using
>>>>> different database instances within it.  That is the only way it could
>>>>> reasonably work.
>>>>>
>>>>> Any time you have a Zookeeper cluster, they recommend you have three
>>>>> instances.  Effectively you are setting up two ManifoldCF clusters: an
>>>>> "active" one, and a "passive" one.  Each one has its own database instance
>>>>> within PostgreSQL, and each one (if it is multiprocess) should have 3
>>>>> zookeeper instances.
>>>>>
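The three-instance recommendation follows from ZooKeeper's majority quorum: an ensemble of n servers stays available only while more than half of them are up. A small Python sketch of that arithmetic (the function names are illustrative):

```python
def quorum_size(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can fail while a majority is still running."""
    return n - quorum_size(n)


# Print ensemble size, quorum size and tolerated failures side by side.
for n in (1, 2, 3, 5, 6):
    print(n, quorum_size(n), tolerated_failures(n))
```

Two servers tolerate no failures at all, three tolerate one, and six tolerate no more than five do, which is why odd ensemble sizes starting at three are the usual choice.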
>>>>> I hope this is clear.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jul 1, 2014 at 9:54 AM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
>>>>>
>>>>>> Thanks Karl,
>>>>>>
>>>>>> I have a slight variation here: both MCF nodes run in Active/Active
>>>>>> mode, pointing to the same DB. Is Zookeeper still required?
>>>>>>
>>>>>> Also, by "two sets of three zookeeper machines", does it mean I need
>>>>>> to set up three zookeepers on each node, so six zookeeper nodes in total,
>>>>>> working across both machines in the same ensemble?
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 30, 2014 at 6:50 PM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Lalit,
>>>>>>>
>>>>>>> You can keep things really simple by having both active and passive
>>>>>>> mcf instances run each as a single process, either under jetty or using
>>>>>>> the combined war under tomcat.  If that is not acceptable, you would need
>>>>>>> two sets of three zookeeper machines, one set for each instance.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> Sent from my Windows Phone
>>>>>>> ------------------------------
>>>>>>> From: lalit jangra
>>>>>>> Sent: 6/30/2014 12:19 PM
>>>>>>> To: user@manifoldcf.apache.org
>>>>>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>>>>>
>>>>>>>  Thanks Karl & Graeme,
>>>>>>>
>>>>>>> Let me elaborate my scenario and what i am trying to achieve.
>>>>>>>
>>>>>>> I have two servers, each running MCF 1.5.1 individually. Both of
>>>>>>> them are backed by the same PostgreSQL DB, so both MCF applications are
>>>>>>> pointing to the same DB at any point in time, without having their own
>>>>>>> dedicated DBs. Next, the primary/active DB instance is backed up with
>>>>>>> periodic backups from the active to the passive instance.
>>>>>>>
>>>>>>> Only one DB instance will be active at any time, with the other DB
>>>>>>> instance acting as active standby. If the primary/active instance breaks
>>>>>>> down, the passive/secondary instance takes over and becomes the
>>>>>>> primary/active instance handling all DB transactions, thus making the
>>>>>>> old primary the new secondary DB instance.
>>>>>>>
>>>>>>> Similarly, I have two solr 4.6 instances which act in active/passive
>>>>>>> mode, with periodic backup of active/primary to passive/secondary, with
>>>>>>> active standby and failover.
>>>>>>>
>>>>>>> So my intention in clustering is high availability of the system with
>>>>>>> failover, but I will not use both MCF instances in parallel or
>>>>>>> simultaneously.
>>>>>>>
>>>>>>> Finally, I am limited to having two instances only, but as mentioned
>>>>>>> earlier, we need at least three Zookeeper instances for proper Zookeeper
>>>>>>> clustering.
>>>>>>>
>>>>>>> Is it still worthwhile to use Zookeeper, or can I do simple
>>>>>>> clustering where each MCF node is clustered using the same DB? Please
>>>>>>> suggest.
>>>>>>>
>>>>>>> Thanks for help.
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 27, 2014 at 11:15 AM, Graeme Seaton <lists@graemes.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>  Hi Lalit,
>>>>>>>>
>>>>>>>> For production use, you will want to spin up your own ZK cluster
>>>>>>>> using the instructions on the zookeeper site (as pointed out earlier, at
>>>>>>>> least 3 is recommended)....
>>>>>>>>
>>>>>>>> You then need to modify the properties.xml file in
>>>>>>>> multiprocess-zk-example to point to the list of Zookeeper servers.  You
>>>>>>>> also need to modify properties-global.xml with the appropriate global
>>>>>>>> settings, i.e. logging levels, Postgresql database etc., and then run
>>>>>>>> setglobalproperties.sh to register the settings in ZK.
>>>>>>>>
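As a sketch of the properties.xml step, the Python below pulls the ZooKeeper connect string out of a properties.xml document and validates each `host:port` entry before anything is started. The property name `org.apache.manifoldcf.zookeeper.connectstring` is assumed here, so check it against the properties.xml shipped in multiprocess-zk-example; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed property name; confirm it in the example's properties.xml.
CONNECT_PROPERTY = "org.apache.manifoldcf.zookeeper.connectstring"


def parse_connect_string(value):
    """Split 'host1:2181,host2:2182' into [(host, port), ...], rejecting bad entries."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"bad ZooKeeper server entry: {entry!r}")
        servers.append((host, int(port)))
    return servers


def servers_from_properties(properties_xml):
    """Find the connect string property in properties.xml text and parse it."""
    root = ET.fromstring(properties_xml)
    for prop in root.iter("property"):
        if prop.get("name") == CONNECT_PROPERTY:
            return parse_connect_string(prop.get("value", ""))
    raise KeyError(CONNECT_PROPERTY)
```

Running the same check on every node's properties.xml is a cheap way to confirm that all cluster members list the same set of ZooKeeper servers.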
>>>>>>>> To test that it is working, set up a crawl and then tail the
>>>>>>>> manifoldcf.log file on each of your nodes to check that they are all
>>>>>>>> crawling in parallel.
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>>
>>>>>>>> Graeme
>>>>>>>>
>>>>>>>>
>>>>>>>> On 25/06/14 12:19, Karl Wright wrote:
>>>>>>>>
>>>>>>>>  Hi Lalit,
>>>>>>>>
>>>>>>>> Zookeeper does not use a database; it keeps its stuff in the local
>>>>>>>> file system.  Each Zookeeper node has its own local data, and everything
>>>>>>>> else is socket communication between them.
>>>>>>>>
>>>>>>>>  As for information: http://zookeeper.apache.org/
>>>>>>>>
>>>>>>>>  Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 25, 2014 at 6:56 AM, lalit jangra <
>>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>  Thanks Karl,
>>>>>>>>>
>>>>>>>>> Apologies, as I am not very familiar with Zookeeper and am trying to
>>>>>>>>> figure it out.
>>>>>>>>>
>>>>>>>>> Are there any more documentation or pointers available, as that
>>>>>>>>> would be helpful?
>>>>>>>>>
>>>>>>>>>  Also, I have 2 tomcat servers in a cluster, each having MCF 1.5.1
>>>>>>>>> set up and configured to point to the same PostgreSQL DB, and the DB is
>>>>>>>>> backed up for failover. From your inputs, it seems that we need to
>>>>>>>>> configure a separate standalone Zookeeper server which will act as
>>>>>>>>> master, and both nodes in the cluster will need to work as slaves and
>>>>>>>>> talk to the standalone Zookeeper master.
>>>>>>>>>
>>>>>>>>>  Also, the Zookeeper server will have its own DB, so can we either
>>>>>>>>> host it separately or use the same Postgres DB?
>>>>>>>>>
>>>>>>>>>  Regards.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jun 25, 2014 at 11:33 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>   Hi Lalit,
>>>>>>>>>>
>>>>>>>>>>  1. zookeeper is already spun into MCF.  In fact you start a
>>>>>>>>>> zookeeper instance when you run the mcf zookeeper example.  They
>>>>>>>>>> recommend, though, that for failover you have 3 instances, etc.
>>>>>>>>>>  2. Looks like the documentation is out of date and something old
>>>>>>>>>> is left in there.
>>>>>>>>>>  3. Zookeeper is a client/server kind of arrangement.  You need
>>>>>>>>>> at least ONE zookeeper server, and each cluster member includes a
>>>>>>>>>> zookeeper client, which is configured to talk with ALL the zookeeper
>>>>>>>>>> server instances you have.
>>>>>>>>>>  4. There is ONE database instance; the instance may be
>>>>>>>>>> supported by failover and redundant Postgresql, but it appears as one
>>>>>>>>>> instance.  To get failover from Postgres you need the Enterprise
>>>>>>>>>> Edition, which costs money.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 25, 2014 at 4:47 AM, lalit jangra <
>>>>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>     Thanks Karl,
>>>>>>>>>>>
>>>>>>>>>>>  That was helpful.
>>>>>>>>>>>
>>>>>>>>>>>  I am setting up a clustered setup on Tomcat, following the
>>>>>>>>>>> instructions at
>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#Simplified+multi-process+model+using+ZooKeeper-based+synchronization
>>>>>>>>>>> and I need some suggestions here.
>>>>>>>>>>>
>>>>>>>>>>>  1. Do we need to download zookeeper and put it in the
>>>>>>>>>>> multiprocess-zk-example folder, or is it already spun into MCF and we
>>>>>>>>>>> are good to go?
>>>>>>>>>>>  2. It says all jars under *processes* should be put into the
>>>>>>>>>>> classpath, but I cannot see any *processes* folder under MCF?
>>>>>>>>>>>  3. Do we need to set up Zookeeper on both nodes or only on one
>>>>>>>>>>> node? I assume we need to do it on both nodes.
>>>>>>>>>>>  4. Do we also need to set up databases separately on both nodes
>>>>>>>>>>> again? Also, can we set up the Zookeeper DB using the same PostgreSQL,
>>>>>>>>>>> or will it use its own HSQL DB?
>>>>>>>>>>>
>>>>>>>>>>>  Finally, how can I test that my Zookeeper is set up and ready to
>>>>>>>>>>> roll?
>>>>>>>>>>>
>>>>>>>>>>>  Thanks for your help.
>>>>>>>>>>>
>>>>>>>>>>> Regards.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  On Tue, Jun 24, 2014 at 1:56 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>  Hi Lalit,
>>>>>>>>>>>>  ZooKeeper is standard for cluster deployments these days.  See
>>>>>>>>>>>> the multiprocess-zookeeper example for ideas about how to deploy it.
>>>>>>>>>>>> It's also important to read the how-to-build-and-deploy page to
>>>>>>>>>>>> understand the example.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 24, 2014 at 8:04 AM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>  Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>>  I am planning to use MCF in cluster mode. For that, I want to
>>>>>>>>>>>>> know if Zookeeper is of any help here?
>>>>>>>>>>>>>
>>>>>>>>>>>>>  If yes, how can it be leveraged in distributed MCF servers?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Lalit Jangra.
>>>>>>>>>>>>>
