From: "Seidel. Robert" <Robert.Seidel@aeb.de>
To: "users@jackrabbit.apache.org" <users@jackrabbit.apache.org>
Date: Wed, 17 Nov 2010 18:42:55 +0100
Subject: RE: Multiple instances of repository

Hi Nikhil,

why don't you just use one 24x7 server JVM that accesses the Jackrabbit repository? If one of the hundred JVMs wants something from the repository, it makes a web service call to your server instance, which gets the job done.

Kind regards, Robert
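A minimal sketch of Robert's suggestion, assuming Java 11+ and using only the JDK's built-in com.sun.net.httpserver and java.net.http packages: one long-running JVM owns the repository, and every other JVM only makes HTTP calls to it. The ContentServer class, its /content/<path> URL layout and the ConcurrentHashMap standing in for the embedded Jackrabbit repository are hypothetical placeholders; a real service would open a JCR session in the handler instead of touching a map.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: one long-running JVM owns the repository; other JVMs only talk HTTP to it.
 * The Map is a hypothetical stand-in for the embedded Jackrabbit repository.
 */
public final class ContentServer {
    private final Map<String, String> repository = new ConcurrentHashMap<>();
    private final HttpServer server;

    public ContentServer(int port) throws IOException {
        // Port 0 lets the OS pick a free port; the sketch binds to loopback only.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/content/", this::handle);
    }

    // Hypothetical protocol: PUT /content/<path> stores the body, GET /content/<path> returns it.
    private void handle(HttpExchange exchange) throws IOException {
        String path = exchange.getRequestURI().getPath().substring("/content/".length());
        if ("PUT".equals(exchange.getRequestMethod())) {
            byte[] in = exchange.getRequestBody().readAllBytes();
            repository.put(path, new String(in, StandardCharsets.UTF_8));
            exchange.sendResponseHeaders(204, -1); // 204 No Content, no response body
            exchange.close();
            return;
        }
        String value = repository.get(path);
        if (value == null) {
            exchange.sendResponseHeaders(404, -1);
            exchange.close();
            return;
        }
        byte[] out = value.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, out.length);
        try (OutputStream body = exchange.getResponseBody()) {
            body.write(out);
        }
    }

    public void start() { server.start(); }
    public void stop() { server.stop(0); }
    public int port() { return server.getAddress().getPort(); }

    // Client side, as used by any of the other JVMs (web application, batch jobs, ...).
    public static void put(HttpClient client, int port, String path, String value) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri(port, path))
                .PUT(HttpRequest.BodyPublishers.ofString(value)).build();
        client.send(request, HttpResponse.BodyHandlers.discarding());
    }

    /** Returns the stored value, or null if the server answered anything other than 200. */
    public static String get(HttpClient client, int port, String path) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri(port, path)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200 ? response.body() : null;
    }

    private static URI uri(int port, String path) {
        return URI.create("http://127.0.0.1:" + port + "/content/" + path);
    }
}
```

Because only this one JVM embeds Jackrabbit, a machine running both the web application and the 24x7 application has a single repository home and a single .lock file, so it needs no cluster configuration for that purpose; the trade-off is an extra network hop for every read and write.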

-----Original Message-----
From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
Sent: Wednesday, November 17, 2010 6:06 PM
To: users@jackrabbit.apache.org
Subject: RE: Multiple instances of repository

I am really thankful for all the suggestions.
I am not an expert in application architecture, and the answers are really helping me a lot.

Justin, as you suggested, I think the architecture needs to change.
Let's say I restructure my application, call it app1, so that it is a 24x7 application.
It will wait for jobs, and some scheduler (maybe Quartz) will hand it a job instance to run.
Now this application app1 can run on two different machines (in a clustered environment), and in that case these two Jackrabbit repository instances should be configured as a cluster, right?
But I will also have a web application that accesses the repository instance. Right now it only reads content from the repository, but in future it might write to the repository as well. This web application can also run on machine1 and machine2.
So on machine1 I will have one web application and one 24x7 application, both accessing the Jackrabbit repository.
So will I have to run a cluster configuration on machine1, because I will have two independent JVMs accessing the same repository?
I really don't want to run cluster nodes on a single machine just so that different JVMs can access the repository. That doesn't look right. I am sure there are better ways to solve this issue as well.

Any ideas will be of great help.

-Nikhil


-----Original Message-----
From: justinedelson@gmail.com [mailto:justinedelson@gmail.com] On Behalf Of Justin Edelson
Sent: Wednesday, November 17, 2010 12:12 AM
To: users@jackrabbit.apache.org
Subject: Re: Multiple instances of repository

Nikhil-
I think you should rethink your architecture. It really doesn't make
sense to bring repository instances up only for a 2-4 minute
job. Instead, you should think about using the Command pattern and
packaging your "applications" as executable jobs that can run inside
a long-running VM against a local repository instance (i.e. making
in-process calls instead of RMI or DavEx).

This is where something like OSGi and Apache Sling can be *very*
helpful, but there are obviously other ways to add/remove jobs at
runtime. See, for example, Sling's Scheduler support:
http://sling.apache.org/site/scheduler-service-commons-scheduler.html

Justin
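Justin's Command-pattern suggestion could look roughly like the following sketch, assuming Java 8+: the repository is started once inside a long-running JVM, and each former short-lived application becomes a Job that a scheduler executes in-process. It uses java.util.concurrent's ScheduledExecutorService rather than Sling's scheduler or Quartz, and the Map passed to each job is a hypothetical stand-in for a JCR Session; JobRunner and Job are illustrative names only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Long-running host for short jobs (Command pattern). The repository is created once
 * and shared in-process; the Map is a hypothetical stand-in for a JCR Session.
 */
public final class JobRunner implements AutoCloseable {

    /** One former "application", packaged as an executable command. */
    public interface Job {
        void execute(Map<String, String> repository) throws Exception;
    }

    private final Map<String, String> repository = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

    /** Runs a job as soon as a worker thread is free. */
    public Future<?> submit(Job job) {
        return scheduler.submit(() -> {
            job.execute(repository);
            return null; // returning a value makes this a Callable, so checked exceptions are allowed
        });
    }

    /** Runs a job once after the given delay, instead of starting a new JVM for it. */
    public ScheduledFuture<?> schedule(Job job, long delay, TimeUnit unit) {
        return scheduler.schedule(() -> {
            job.execute(repository);
            return null;
        }, delay, unit);
    }

    public Map<String, String> repository() {
        return repository;
    }

    /** Stops accepting jobs and waits briefly for running ones to finish. */
    @Override
    public void close() throws InterruptedException {
        scheduler.shutdown();
        if (!scheduler.awaitTermination(10, TimeUnit.SECONDS)) {
            scheduler.shutdownNow();
        }
    }
}
```

Swapping the executor for Sling's scheduler or Quartz changes how jobs are triggered, not the main point: every job reuses the same in-process repository instead of bringing up its own for a 2-4 minute run.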

On Tue, Nov 16, 2010 at 5:16 AM, <nikhil.agrawal@emeter.com> wrote:
> Thanks for your inputs, they are really helpful.
>
> Well, does that mean my application is not a good candidate for Jackrabbit?
>
> The other option I had was to use Jackrabbit in client-server mode. In this case I would access the repository over RMI. But the Jackrabbit documentation mentions that RMI is not optimized for performance and that I should embed the repository instance in my application code for better performance.
>
> I can remove the search functionality from these cluster nodes, because their life span will be very short. The application takes 2-4 minutes to do its job, and I don't think we really need search on these cluster nodes.
>
> But my question is: should I really use the clustering feature? Cluster nodes should normally have a longer life span, but in this case the nodes will live only 2-4 minutes.
> I find it hard to justify using these short-lived applications as cluster nodes.
>
> Thanks,
> Nikhil
>
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 3:33 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> I don't know whether it will work (setProperty), but you have another problem. The Lucene search index is always stored in the file system. And AFAIK, each repository home needs its own index directories (so you have index files for each cluster node). If you create a new cluster node, you have to wait a long time until the index is built, depending on the data in your repository (if you have tons of data, you may have to wait a week or longer).
>
> The tables of the FS and PM will be shared between all cluster nodes; that works.
>
> Kind regards, Robert
>
> -----Original Message-----
> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
> Sent: Tuesday, November 16, 2010 10:54 AM
> To: users@jackrabbit.apache.org
> Subject: RE: Multiple instances of repository
>
> Since there could be any number of instances, I can't decide the cluster ID beforehand.
> Hence I have the following code that creates a cluster ID at run time.
>
> System.setProperty("org.apache.jackrabbit.core.cluster.node_id", "cluster_id" + System.nanoTime());
>
> Similarly, the repositoryHome path is generated at run time.
>
> But do I also need separate tables for the workspace file system? I have the following configuration for my workspace. Is it correct? Will the tables for the workspace FS and PersistenceManager be shared between all the nodes, or will they be different?
>
> <?xml version="1.0"?>
> <!DOCTYPE Repository
>           PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
>           "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
>
> <Repository>
>
>     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>         <param name="driver" value="javax.naming.InitialContext"/>
>         <param name="url" value="jdbc/amiDBDataSource"/>
>         <param name="databaseType" value="oracle"/>
>         <param name="copyWhenReading" value="true"/>
>         <param name="tablePrefix" value=""/>
>         <param name="schemaObjectPrefix" value="J_R_DS_"/>
>         <param name="schemaCheckEnabled" value="false"/>
>     </DataStore>
>
>     <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>         <param name="driver" value="javax.naming.InitialContext"/>
>         <param name="url" value="jdbc/amiDBDataSource"/>
>         <!-- For an Oracle server the following value must be "oracle"; it is not the same as the database schema -->
>         <param name="schema" value="oracle"/>
>         <param name="schemaObjectPrefix" value="J_R_FS_"/>
>         <param name="schemaCheckEnabled" value="false"/>
>     </FileSystem>
>
>     <Security appName="Jackrabbit">
>         <SecurityManager class="repository.jcr.jackrabbit.EipSecurityManager"/>
>         <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"/>
>         <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
>             <param name="principalProvider" value="repository.jcr.jackrabbit.EipPrincipalProvider"/>
>         </LoginModule>
>     </Security>
>
>     <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip"/>
>
>     <Workspace name="${wsp.name}">
>         <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/amiDBDataSource"/>
>             <!-- For an Oracle server the following value must be "oracle"; it is not the same as the database schema -->
>             <param name="schema" value="oracle"/>
>             <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
>             <param name="schemaCheckEnabled" value="false"/>
>         </FileSystem>
>         <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/amiDBDataSource"/>
>             <param name="tableSpace" value=""/>
>             <!-- For an Oracle server the following value must be "oracle"; it is not the same as the database schema -->
>             <param name="schema" value="oracle"/>
>             <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_"/>
>             <param name="externalBLOBs" value="false"/>
>             <param name="schemaCheckEnabled" value="false"/>
>         </PersistenceManager>
>         <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>             <param name="path" value="${wsp.home}/index"/>
>             <param name="supportHighlighting" value="true"/>
>         </SearchIndex>
>     </Workspace>
>
>     <Versioning rootPath="${rep.home}/version">
>
>         <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/amiDBDataSource"/>
>             <!-- For an Oracle server the following value must be "oracle"; it is not the same as the database schema -->
>             <param name="schema" value="oracle"/>
>             <param name="schemaObjectPrefix" value="J_V_FS_"/>
>             <param name="schemaCheckEnabled" value="false"/>
>         </FileSystem>
>         <!-- Change to Oracle Class <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
>         <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/amiDBDataSource"/>
>             <param name="tableSpace" value=""/>
>             <!-- For an Oracle server the following value must be "oracle"; it is not the same as the database schema -->
>             <param name="schema" value="oracle"/>
>             <param name="schemaObjectPrefix" value="J_V_PM_"/>
>             <param name="externalBLOBs" value="false"/>
>             <param name="schemaCheckEnabled" value="false"/>
>         </PersistenceManager>
>
>     </Versioning>
>
>     <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>         <param name="path" value="${rep.home}/search/index"/>
>         <param name="supportHighlighting" value="true"/>
>     </SearchIndex>
>
>     <Cluster syncDelay="2000">
>         <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>             <param name="revision" value="${rep.home}/revision.log"/>
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/amiDBDataSource"/>
>             <param name="schemaObjectPrefix" value="J_R_"/>
>             <param name="databaseType" value="oracle"/>
>         </Journal>
>     </Cluster>
>
> </Repository>
>
> Thanks,
> Nikhil
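A sketch of the runtime setup described in the message above, assuming the embedding code runs it before the repository starts and that Jackrabbit picks up the org.apache.jackrabbit.core.cluster.node_id system property when the Cluster element has no id attribute (as in the configuration above). It replaces System.nanoTime() with a random UUID, because nanoTime values from separate JVMs or machines are not guaranteed to be distinct. ClusterNodeSetup and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper: per-run cluster node id and repository home for an embedded repository. */
public final class ClusterNodeSetup {

    /** Property name taken from the thread; assumed to be read when <Cluster> has no id attribute. */
    public static final String NODE_ID_PROPERTY = "org.apache.jackrabbit.core.cluster.node_id";

    private ClusterNodeSetup() {
    }

    /** Generates a node id that is unique across JVMs and machines and publishes it as a system property. */
    public static String assignNodeId() {
        String nodeId = "node-" + UUID.randomUUID();
        System.setProperty(NODE_ID_PROPERTY, nodeId);
        return nodeId;
    }

    /** Creates a fresh, uniquely named repository home below baseDir for this run. */
    public static Path createRepositoryHome(Path baseDir) throws IOException {
        Files.createDirectories(baseDir);
        return Files.createTempDirectory(baseDir, "repo-home-");
    }
}
```

Each call creates a new directory, so every run gets its own .lock file and its own Lucene index directory; as Robert points out in this thread, a fresh index may take a long time to build on a large repository.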
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 2:42 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> you need clustering, because all of your instances should access the same repository.
>
> What you need is a separate repository home for each instance. In my use case I have an installation directory for each instance, so the repository home is located below this directory.
>
> You have to make sure that each instance also has its own repository.xml, because you need to define different cluster IDs.
>
> And you have to define a cluster section in the repository.xml that configures the journal, which is necessary for synchronization:
>
>    <Cluster id="node1" syncDelay="5000">
>      <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>        <param name="driver" value="javax.naming.InitialContext"/>
>        <param name="url" value="jdbc/amiDBDataSource"/>
>          ...
>      </Journal>
>    </Cluster>
>
> Kind regards, Robert
>
> -----Original Message-----
> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
> Sent: Tuesday, November 16, 2010 9:37 AM
> To: users@jackrabbit.apache.org
> Subject: RE: Multiple instances of repository
>
> Thanks for replying. I need a little more help to understand things completely.
> Let me elaborate a bit more on my usage scenario. I am also attaching my repository.xml file to this mail. Please let me know if you want to know more about my environment.
>
> In my case, I want to keep all the data in one database, and I want to use Jackrabbit as the JCR layer over this database.
> I have Jackrabbit embedded in my application, so the repository starts up as part of the application.
> This application reads some files from the repository and also inserts some data into the repository.
> There could be two instances of the application: app1 running on machine1 and app2 running on machine2.
> So my application instances are different, and I can create multiple repository homes to avoid the locking problem, but I still want to insert the data from these applications into the same database tables.
> So what if all the application instances use the same repository configuration file and each specifies its own repository home?
> Will that work in my case? Will there be any consistency issues?
>
> When you say separate data stores and separate persistence managers, do you mean separate repository configuration files or separate database tables for the data stores and persistence managers?
>
> My instances and their repositories operate separately from each other, but they still need to share the data. The data inserted by one application instance should be visible to the other instances. So they should all be inserting the data into the same tables; that is my understanding.
>
> Thanks,
> Nikhil
>
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 1:22 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> if you want to use clustering, you have to define a repository home for each cluster node.
>
> Clustering is necessary if you want to have the same data/indexes on all cluster nodes; the key word is synchronization.
>
> If your instances and repositories operate separately from each other, you don't need clustering. Separate repository homes, data stores and persistence managers will do the job.
>
> Kind regards, Robert
>
> -----Original Message-----
> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
> Sent: Tuesday, November 16, 2010 8:33 AM
> To: users@jackrabbit.apache.org
> Subject: Multiple instances of repository
>
> Hi,
>
> I am using Jackrabbit as the JCR implementation in my project. I am running Jackrabbit within my application, in the same JVM.
> The application reads content from the repository and also writes some content into the repository.
> There could be multiple concurrent instances of my application running on the same or different machines.
> I have a configuration file for Jackrabbit and a single repository home.
> Now, as soon as one instance of the application is up and running, I can't run another instance, because the first instance creates a lock file in the repository home.
> After some searching I learned about running Jackrabbit in clustered mode.
> Now my question is: even in this case, will I have to specify a different repository home for every run?
> That means I would build the repository home path at run time, because at compile time I don't know how many instances will run.
> This is a standalone Java application, and theoretically any number of instances can run.
> My question is: if I have to specify a different repository path for every run anyway, will Jackrabbit work even without clustering?
> After all, the .lock file will be different for each run, since the repository home is different.
> I know I am missing something here; please help me.
> I am attaching my conf file to this mail.
>
> Thanks,
> Nikhil
>
>