manifoldcf-user mailing list archives

From lalit jangra <lalit.j.jan...@gmail.com>
Subject Apache Manifoldcf High Availability requirements
Date Wed, 16 Apr 2014 11:40:17 GMT
Hi,



I am using MCF to crawl multiple sources containing around 10-15 million
documents initially, with similar volumes added each year, and I want to
cluster it in high-availability mode. I have some questions about this.

1.       I am using a PostgreSQL database, with Tomcat 7 hosting MCF.

2.       What database size should be planned for a scenario like this,
given that our documents amount to terabytes?

3.       Does PostgreSQL run well on VMs?

4.       What would be the ideal clustering approach: two separate MCF
servers managed by ZooKeeper, each with its own database kept in sync with
the other, behind a pair of load balancers; or two MCF instances sharing a
common clustered (active/passive) database, behind a pair of load balancers?

5.       If I use the first approach (two separate MCF servers managed by
ZooKeeper, each with its own database kept in sync, behind a pair of load
balancers), I need to keep both database instances synchronized, which adds
extra tasks.

6.       If I use the second approach (two MCF instances sharing a common
clustered (active/passive) database, behind a pair of load balancers), I
need to maintain a set of clustered databases.

7.       Which of these approaches would yield better results?

8.       Is there a definitive guide to high availability for MCF?

Regards,

Lalit.
