manifoldcf-dev mailing list archives

From lalit jangra <lalit.j.jan...@gmail.com>
Subject Re: Zookeeper in Apache ManifoldCF
Date Fri, 04 Jul 2014 13:04:59 GMT
Hello Karl,

I have got my cluster working, with each node in the cluster running three
ZooKeeper nodes, so six nodes in total. I have connected both MCF1 and MCF2 to
this cluster using the steps below.


   1. Start ZooKeeper (using the *runzookeeper[.sh|.bat]* script)
   2. Initialize the ManifoldCF shared configuration data (using
   *setglobalproperties[.sh|.bat]*)
   3. Start the database (using *start-database[.sh|.bat]*)
   4. Initialize the database (using *initialize[.sh|.bat]*)
   5. Start the agents process (using *start-agents[.sh|.bat]*, and
   optionally *start-agents-2[.sh|.bat]*)
   6. Modify the Tomcat startup script, or use the Tomcat service
   administration client, to set a Java "-Dorg.apache.manifoldcf.configfile"
   switch to point to the example's *properties.xml* file.
   7. Start Tomcat.
   8. Deploy and start the mcf-crawler-ui, mcf-authority-service, and
   mcf-api-service web applications, preferably using the Tomcat
   administration client.
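
As a side note on the six ZooKeeper nodes mentioned above: ZooKeeper stays available only while a strict majority of the ensemble is reachable, so an even-sized ensemble tolerates no more failures than the next smaller odd size. Here is a minimal Python sketch of that arithmetic, assuming all six nodes form a single ensemble (the function names are illustrative and not part of ZooKeeper or ManifoldCF):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)
```

With these helpers, `tolerated_failures(3)` is 1, while `tolerated_failures(5)` and `tolerated_failures(6)` are both 2, which is why odd ensemble sizes are usually recommended.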

I just want to ask whether we need to start ZooKeeper as in step 1, given that
we already have a cluster of ZooKeeper servers up and running?
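
For what it is worth, one way to confirm that the existing ensemble is answering is ZooKeeper's four-letter `ruok` command, which a healthy server answers with `imok`. Below is a minimal Python sketch, not part of ManifoldCF; the helper names (`ask_ruok`, `zk_is_ok`, `check_ensemble`) are hypothetical, and newer ZooKeeper releases may need the command enabled through `4lw.commands.whitelist`.

```python
import socket


def ask_ruok(sock: socket.socket) -> bool:
    """Send 'ruok' on an already-connected socket; True if the reply is 'imok'."""
    sock.sendall(b"ruok")
    chunks = []
    while True:
        data = sock.recv(16)
        if not data:  # the server closes the connection after replying
            break
        chunks.append(data)
    return b"".join(chunks) == b"imok"


def zk_is_ok(host: str, port: int = 2181, timeout: float = 2.0) -> bool:
    """Connect to one ZooKeeper server and run the 'ruok' check."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return ask_ruok(sock)
    except OSError:  # refused, unreachable, or timed out
        return False


def check_ensemble(connect_string: str) -> dict:
    """Check every host:port entry of a comma-separated connect string."""
    status = {}
    for entry in connect_string.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        status[entry] = zk_is_ok(host, int(port) if port else 2181)
    return status
```

Calling `check_ensemble` with the same comma-separated server list that is configured in properties.xml would return a dict mapping each entry to True or False. This only shows that the servers respond; it does not by itself answer whether step 1 can be skipped.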

Also, I want to confirm whether we need to update Tomcat as in step 6. I did
not do that, but I am not getting any errors. Are there any
implications of skipping it?

Finally, I would like a little help: I first did my setup using
multiprocess-file-example, initializing the DB, but then for ZooKeeper I moved
to multiprocess-zk-example.

Is it safe to use in production?

Regards.




On Thu, Jul 3, 2014 at 10:48 PM, lalit jangra <lalit.j.jangra@gmail.com>
wrote:

> Thanks Karl,
>
> I am having one cluster with two MCF instances pointing to one single DB.
>
> Can you please elaborate a bit more?
>
> regards.
>
>
>
>
> On Thu, Jul 3, 2014 at 10:19 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>>
>> Hi lalit,
>>
>> When data is pushed into the database that mcf uses but the mcf instance
>> is not doing the pushing, then caches everywhere will not be properly
>> invalidated.  It may be more appropriate to have only one cluster with two
>> members of each type (agents process, mcf UI, etc), if that would be
>> acceptable.
>>
>>
>> Karl
>>
>> Sent from my Windows Phone
>> ------------------------------
>> From: lalit jangra
>> Sent: 7/3/2014 1:23 PM
>> To: Karl Wright
>>
>> Subject: Re: Zookeeper in Apache ManifoldCF
>>
>>  Hello Karl,
>>
>> I have a set of two MCF servers each having its own tomcat server but
>> pointing to same Postgres DB.
>>
>> I have also configured set of three zookeeper servers on each node of
>> cluster, started them, configured properties.xml & properties-global.xml on
>> both nodes. Finally i started zookeeper's start-agents.sh on both nodes.
>>
>> While trying to run ./zkCli.sh -server localhost:2181 on both machines, I
>> am getting different outputs. Is this normal, or am I missing something?
>>
>> Node1.
>>
>> [zk: localhost:2181(CONNECTED) 2] ls /
>>
>> [org.apache.manifoldcf.service-AGENT,
>> org.apache.manifoldcf.servicelock-AGENT,
>> org.apache.manifoldcf.configuration,
>> org.apache.manifoldcf.serviceactive-AGENT-A, zookeeper]
>>
>>
>> Node2.
>>
>> [zk: localhost:2181(CONNECTED) 1] ls /
>>
>> [org.apache.manifoldcf.locks-statslock-reindex-jobqueue,
>> org.apache.manifoldcf.locks-_Cache_OUTPUTCONNECTION_Solr,
>> org.apache.manifoldcf.service-AGENT,
>> org.apache.manifoldcf.service-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
>> org.apache.manifoldcf.resources-stats-reindex-jobqueue,
>> org.apache.manifoldcf.serviceanon-_OUTPUTCONNECTORPOOL_Solr,
>> org.apache.manifoldcf.locks-_Cache_JOBSTATUSES,
>> org.apache.manifoldcf.locks-statslock-analyze-jobqueue,
>> org.apache.manifoldcf.servicelock-AGENT,
>> org.apache.manifoldcf.locks-_REPR_TRACKER_LOCK_,
>> org.apache.manifoldcf.configuration,
>> org.apache.manifoldcf.servicelock-_OUTPUTCONNECTORPOOL_Solr,
>> org.apache.manifoldcf.locks-_STUFFERTHREAD_LOCK,
>> org.apache.manifoldcf.service-_OUTPUTCONNECTORPOOL_Solr,
>> org.apache.manifoldcf.resources-_REPR_MINDEPTH_,
>> org.apache.manifoldcf.resources-_STUFFERTHREAD_LASTTIME,
>> org.apache.manifoldcf.resources-stats-analyze-jobqueue,
>> org.apache.manifoldcf.locks-_IDFACTORY_,
>> org.apache.manifoldcf.locks-_JOBRESET_,
>> org.apache.manifoldcf.servicelock-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
>> org.apache.manifoldcf.resources-cache-JOBSTATUSES,
>> org.apache.manifoldcf.locks-_JOBSTOP_,
>> org.apache.manifoldcf.locks-_POOLTARGET__OUTPUTCONNECTORPOOL_Solr,
>> zookeeper, org.apache.manifoldcf.resources-_IDFACTORY_,
>> org.apache.manifoldcf.locks-_Cache_JOB_1404323519962,
>> org.apache.manifoldcf.locks-_Cache_DB-mcfdb-TBL-outputconnectors,
>> org.apache.manifoldcf.locks-_JOBRESUME_]
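
To see exactly where two such `ls /` listings differ, the bracketed output can be parsed into sets and compared. This is a minimal Python sketch; the function names are hypothetical and it only compares text pasted from zkCli, it does not talk to ZooKeeper.

```python
def parse_ls_output(text: str) -> set:
    """Turn the bracketed, comma-separated output of `ls /` into a set of znode names."""
    inner = text.strip().lstrip("[").rstrip("]")
    return {name.strip() for name in inner.split(",") if name.strip()}


def compare_listings(node1: str, node2: str) -> dict:
    """Report which znodes appear on only one side and which appear on both."""
    a, b = parse_ls_output(node1), parse_ls_output(node2)
    return {
        "only_node1": sorted(a - b),
        "only_node2": sorted(b - a),
        "common": sorted(a & b),
    }
```

If both zkCli sessions were connected to the same ensemble, one would expect the two "only" lists to be empty or nearly so.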
>>
>>
>> Also in clustered setup, i noticed one strange behavior.
>>
>> If I create a job on, say, MCF1 in the clustered setup, it is created but not
>> replicated to the MCF2 node. I need to restart the MCF2 node to get it
>> replicated there. Is that OK?
>>
>> Please suggest.
>>
>> Regards.
>>
>>
>> On Wed, Jul 2, 2014 at 10:49 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi lalit,
>>>
>>> Each agents process in a cluster needs its own Id. Please look carefully
>>> at the multiprocess zookeeper example for details how to do that.  If you
>>> didn't intend for there to be multiple agents processes in one cluster, you
>>> did something wrong, because that is what you have.
>>>
>>>
>>> Karl
>>>
>>> Sent from my Windows Phone
>>> ------------------------------
>>> From: lalit jangra
>>> Sent: 7/2/2014 2:11 PM
>>> To: Karl Wright
>>> Cc: user@manifoldcf.apache.org
>>>
>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>
>>>  Hello,
>>>
>>> I have configured 3 zookeeper instances on ports 2181, 2182, 2183 on my
>>> server, and in mcf/dist/multiprocess-zk-example I have configured all three
>>> servers as a comma-separated list.
>>>
>>> Now I have started all three zookeeper instances and I can see all
>>> three running. Next I tried a crawl job, but in manifoldcf.logs I can
>>> see the error below.
>>>
>>> ERROR 2014-07-02 19:07:15,716 (Agents thread) - Exception tossed:
>>> Service '' of type
>>> 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already active
>>>
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service '' of
>>> type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already
>>> active
>>>
>>>         at
>>> org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:156)
>>>
>>>         at
>>> org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:120)
>>>
>>>         at
>>> org.apache.manifoldcf.core.lockmanager.LockManager.registerServiceBeginServiceActivity(LockManager.java:69)
>>>
>>>         at
>>> org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>>>
>>>         at
>>> org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>>>
>>>
>>> How can I validate whether or not these errors are related to zookeeper?
>>> Also, how can I tell whether MCF is integrated with zookeeper?
>>>
>>>
>>> Regards.
>>>
>>>
>>>
>>> On Tue, Jul 1, 2014 at 3:19 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Lalit,
>>>>
>>>> I presumed in my recommendation that your "active" and "passive"
>>>> manifoldcf instances were using the same PostgreSQL server, but were using
>>>> different database instances within it.  That is the only way it could
>>>> reasonably work.
>>>>
>>>> Any time you have a Zookeeper cluster, they recommend you have three
>>>> instances.  Effectively you are setting up two ManifoldCF clusters: an
>>>> "active" one, and a "passive" one.  Each one has its own database instance
>>>> within PostgreSQL, and each one (if it is multiprocess) should have 3
>>>> zookeeper instances.
>>>>
>>>> I hope this is clear.
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Tue, Jul 1, 2014 at 9:54 AM, lalit jangra <lalit.j.jangra@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Karl,
>>>>>
>>>>> I have a little variation here: both MCF nodes run in Active/Active
>>>>> mode pointing to the same DB, so is Zookeeper still required?
>>>>>
>>>>> Also, does "two sets of three zookeeper machines" mean I need
>>>>> to set up three zookeepers on each node, so six zookeeper nodes in total
>>>>> working on both machines in the same ensemble?
>>>>>
>>>>> Regards.
>>>>>
>>>>>
>>>>> On Mon, Jun 30, 2014 at 6:50 PM, Karl Wright <daddywri@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Lalit,
>>>>>>
>>>>>> You can keep things really simple by having both active and passive
>>>>>> mcf instances run each as a single process, either under jetty or
>>>>>> using the combined war under tomcat.  If that is not acceptable, you
>>>>>> would need two sets of three zookeeper machines, one set for each
>>>>>> instance.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> Sent from my Windows Phone
>>>>>> ------------------------------
>>>>>> From: lalit jangra
>>>>>> Sent: 6/30/2014 12:19 PM
>>>>>> To: user@manifoldcf.apache.org
>>>>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>>>>
>>>>>>  Thanks Karl & Graeme,
>>>>>>
>>>>>> Let me elaborate my scenario and what i am trying to achieve.
>>>>>>
>>>>>> I have two servers each running MCF 1.5.1 individually. But both of
>>>>>> them are backed by same PostGreSQL DB so both of MCF applications are
>>>>>> pointing to same DB at any point of time, without having their own
>>>>>> dedicated DBs. Next, primary/active DB instance is backed up with
>>>>>> periodical backups from active to passive instance.
>>>>>>
>>>>>> Only one DB instance will be active at any time, with other DB
>>>>>> instance acting as active standby. In case of breakdown of
>>>>>> primary/active instance, passive/secondary will take over and becomes
>>>>>> primary/active instance handling all DB transactions, thus making
>>>>>> primary as new secondary DB instance.
>>>>>>
>>>>>> Similarly i have two solr 4.6 instances which act in active/passive
>>>>>> mode with periodic backup of active/primary to passive/secondary with
>>>>>> active standby and failover.
>>>>>>
>>>>>> So my intention with clustering is high availability of the system with
>>>>>> failover, but I will not use both MCF instances in parallel or
>>>>>> simultaneously.
>>>>>>
>>>>>> Finally i am limited to having two instances only but as mentioned
>>>>>> earlier, we need at least three Zookeeper instances for a proper
>>>>>> Zookeeper clustering.
>>>>>>
>>>>>> Is it still worthwhile to use Zookeeper, or can I do simple
>>>>>> clustering where each MCF node is clustered using the same DB? Please
>>>>>> suggest.
>>>>>>
>>>>>> Thanks for help.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 27, 2014 at 11:15 AM, Graeme Seaton <lists@graemes.com>
>>>>>> wrote:
>>>>>>
>>>>>>>  Hi Lalit,
>>>>>>>
>>>>>>> For production use, you will want to spin up your own ZK cluster
>>>>>>> using the instructions on the zookeeper site (as pointed out earlier
>>>>>>> at least 3 is recommended)....
>>>>>>>
>>>>>>> You then need to modify the properties.xml file in
>>>>>>> multiprocess-zk-example to point to the list of Zookeeper servers.
>>>>>>> You also need to modify properties-global.xml with the appropriate
>>>>>>> global settings i.e. logging levels, Postgresql database etc. and
>>>>>>> then run setglobalproperties.sh to register the settings in ZK.
>>>>>>>
>>>>>>> To test that it is working, set up a crawl and then tail the
>>>>>>> manifoldcf.log file on each of your nodes to check that they are all
>>>>>>> crawling in parallel.
>>>>>>>
>>>>>>> HTH,
>>>>>>>
>>>>>>> Graeme
>>>>>>>
>>>>>>>
>>>>>>> On 25/06/14 12:19, Karl Wright wrote:
>>>>>>>
>>>>>>>  Hi Lalit,
>>>>>>>
>>>>>>> Zookeeper does not use a database; it keeps its stuff in the local
>>>>>>> file system.  Each Zookeeper node has its own local data, and
>>>>>>> everything else is socket communication between them.
>>>>>>>
>>>>>>>  As for information: http://zookeeper.apache.org/
>>>>>>>
>>>>>>>  Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 25, 2014 at 6:56 AM, lalit jangra <
>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>
>>>>>>>>  Thanks Karl,
>>>>>>>>
>>>>>>>> Apologies as i am not very familiar with Zookeeper and trying to
>>>>>>>> figure out on same.
>>>>>>>>
>>>>>>>> Is there any more documentation/pointers available for same as
>>>>>>>> that would be more helpful.
>>>>>>>>
>>>>>>>>  Also i have 2 tomcat servers in cluster, each having MCF 1.5.1
>>>>>>>> setup and configured to point to same PostGreSQL DB & DB is backed
>>>>>>>> up for failover. From your inputs, it seems that we need to
>>>>>>>> configure a separate standalone Zookeeper server which will act as
>>>>>>>> Master and both nodes in cluster will need to work as slaves and
>>>>>>>> talk to standalone Zookeeper master.
>>>>>>>>
>>>>>>>>  Also the Zookeeper server will have its own DB so either we can
>>>>>>>> host it separately or we can use same Postgres DB?
>>>>>>>>
>>>>>>>>  Regards.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 25, 2014 at 11:33 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>   Hi Lalit,
>>>>>>>>>
>>>>>>>>>  1. zookeeper is already spun into MCF.  in fact you start a
>>>>>>>>> zookeeper instance when you run the mcf zookeeper example.  They
>>>>>>>>> recommend, though, that for failover you have 3 instances, etc.
>>>>>>>>>  2. Looks like the documentation is out of date and something old
>>>>>>>>> is left in there.
>>>>>>>>>  3. Zookeeper is a client/server kind of arrangement.  You need at
>>>>>>>>> least ONE zookeeper server, and each cluster member includes a
>>>>>>>>> zookeeper client, which is configured to talk with ALL the
>>>>>>>>> zookeeper server instances you have.
>>>>>>>>>  4.  There is ONE database instance; the instance may be supported
>>>>>>>>> by failover and redundant Postgresql, but it appears as one
>>>>>>>>> instance.  To get failover from Postgres you need the Enterprise
>>>>>>>>> Edition, which costs money.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jun 25, 2014 at 4:47 AM, lalit jangra <
>>>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>     Thanks Karl,
>>>>>>>>>>
>>>>>>>>>>  That was helpful.
>>>>>>>>>>
>>>>>>>>>>  I am setting clustered setup on Tomcats as i was following
>>>>>>>>>> instructions @
>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#Simplified+multi-process+model+using+ZooKeeper-based+synchronization
>>>>>>>>>> and i need some suggestions here.
>>>>>>>>>>
>>>>>>>>>>  1. Do we need to download zookeeper and put it in
>>>>>>>>>> multiprocess-zk-example folder or it is already spun into MCF and
>>>>>>>>>> we are good to go?
>>>>>>>>>>  2. It says all jars under *processes* should be put into
>>>>>>>>>> classpath but i can not see any *processes* folder under MCF?
>>>>>>>>>>  3. Do we need to setup Zookeeper on both nodes or only at one
>>>>>>>>>> node, i assume we need to do on both nodes ?
>>>>>>>>>>  4. Do we also need to setup databases separately on both nodes
>>>>>>>>>> again. Also can we setup Zookeeper DB using same PostGreSQL or it
>>>>>>>>>> will use its own HSQL DB?
>>>>>>>>>>
>>>>>>>>>>  Finally how can i test that my Zookeeper is set up and ready to
>>>>>>>>>> roll?
>>>>>>>>>>
>>>>>>>>>>  Thanks for your help.
>>>>>>>>>>
>>>>>>>>>> Regards.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  On Tue, Jun 24, 2014 at 1:56 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>  Hi Lalit,
>>>>>>>>>>>  ZooKeeper is standard for cluster deployments these days.  See
>>>>>>>>>>> the multiprocess-zookeeper example for ideas about how to deploy
>>>>>>>>>>> it.  It's also important to read the how-to-build-and-deploy page
>>>>>>>>>>> to understand the example.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 24, 2014 at 8:04 AM, lalit jangra <
>>>>>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>  Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>  I am planning to use MCF in cluster mode. For same, i want to
>>>>>>>>>>>> know if Zookeeper is of any help here?
>>>>>>>>>>>>
>>>>>>>>>>>>  If yes, how can it be leveraged in distributed MCF servers?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Lalit Jangra.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  --
>>>>>>>>>> Regards,
>>>>>>>>>> Lalit Jangra.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  --
>>>>>>>> Regards,
>>>>>>>> Lalit Jangra.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Lalit Jangra.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Lalit Jangra.
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Lalit Jangra.
>>>
>>
>>
>>
>> --
>> Regards,
>> Lalit Jangra.
>>
>
>
>
> --
> Regards,
> Lalit Jangra.
>



-- 
Regards,
Lalit Jangra.
