jackrabbit-users mailing list archives

From <nikhil.agra...@emeter.com>
Subject RE: Multiple instances of repository
Date Wed, 17 Nov 2010 17:05:33 GMT
I am really thankful for all the suggestions.
I am not an expert in application architecture, and the answers are a great help.

Justin, as you suggested, I think the architecture needs to change.
Let's say I restructure my application (call it app1) so that it runs 24x7.
It will wait for a job, and a scheduler (Quartz, maybe) will hand it a job instance
to run.
This application 'app1' can run on two different machines (in a clustered environment),
and in that case the two Jackrabbit repository instances should be configured as a cluster,
right?
But I will also have a web application that hits the repository. Right now
it only reads content from the repository, but in future it might write to it
as well. This web application can also run on both machine1 and machine2.
So on machine1 I will have one web application and one 24x7 application, and
both will be hitting the Jackrabbit repository.
Does that mean I have to run a cluster configuration on machine1 itself, because two
independent JVMs will be accessing the same repository?
I really don't want to run multiple cluster nodes on a single machine just so that
different JVMs can access the repository. That doesn't look right. I am sure there are
better ways to solve this.

Any ideas will be of great help.

-Nikhil


-----Original Message-----
From: justinedelson@gmail.com [mailto:justinedelson@gmail.com] On Behalf Of Justin Edelson
Sent: Wednesday, November 17, 2010 12:12 AM
To: users@jackrabbit.apache.org
Subject: Re: Multiple instances of repository

Nikhil-
I think you should rethink your architecture. It really doesn't make
sense to be bringing repository instances up only for a 2-4 minute
job. Instead, you should think about using the Command pattern and
package your "applications" as executable jobs which can be run inside
a long-running VM against a local repository instance (i.e. making
in-process calls instead of RMI or DavEx).
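
A minimal sketch of the Command-pattern idea described above, using only the Java standard library. Every name here (JobRunner, SharedRepository, InMemoryRepository, RepositoryJob) is hypothetical and not part of Jackrabbit or Sling; SharedRepository merely stands in for whatever long-lived, in-process repository handle the application would hold. The point is that the expensive resource is created once in a long-running VM and short jobs are submitted to it as commands.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class JobRunner {
    /** Hypothetical stand-in for the long-lived, in-process repository. */
    interface SharedRepository {
        void write(String path, String value);
        String read(String path);
    }

    /** Trivial in-memory implementation so the sketch runs without Jackrabbit. */
    static class InMemoryRepository implements SharedRepository {
        private final Map<String, String> items = new ConcurrentHashMap<>();
        @Override public void write(String path, String value) { items.put(path, value); }
        @Override public String read(String path) { return items.get(path); }
    }

    /** A command: one short unit of work executed against the shared repository. */
    interface RepositoryJob {
        String execute(SharedRepository repository) throws Exception;
    }

    private final SharedRepository repository;
    private final ExecutorService workers;

    /** The repository is created once and reused by every job. */
    JobRunner(SharedRepository repository, int threads) {
        this.repository = repository;
        this.workers = Executors.newFixedThreadPool(threads);
    }

    /** Runs a job on the worker pool and blocks until it has finished. */
    String runAndWait(RepositoryJob job) {
        Callable<String> task = () -> job.execute(repository);
        try {
            return workers.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException("job failed", e);
        }
    }

    /** Stops accepting new jobs; already submitted jobs still complete. */
    void shutdown() {
        workers.shutdown();
    }
}
```

A 2-4 minute job then becomes a RepositoryJob handed to runAndWait (or to a scheduler that calls it), instead of a separate process that starts and stops its own repository instance.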

This is where something like OSGi and Apache Sling can be *very*
helpful, but there are obviously other ways to add/remove jobs at
runtime. See, for example, Sling's Scheduler support:
http://sling.apache.org/site/scheduler-service-commons-scheduler.html

Justin

On Tue, Nov 16, 2010 at 5:16 AM,  <nikhil.agrawal@emeter.com> wrote:
> Thanks for your inputs, they are really helpful.
>
> So is my application not a good candidate for Jackrabbit?
>
> The other option I had was to use Jackrabbit in client-server mode. In that case I
> would access the repository over RMI. But the Jackrabbit documentation mentions
> that RMI is not optimized for performance and that I should embed the repository
> instance in my application code for better performance.
>
> I can remove the search functionality from these cluster nodes, because their life
> span will be very short. The application takes 2-4 minutes to do its job and I don't
> think we really need search on these nodes.
>
> But my question is, should I really use the clustering feature? Cluster nodes
> should normally have a longer life span, but in this case the nodes will live for
> only 2-4 minutes.
> I am finding it hard to justify using these short-lived applications as cluster nodes.
>
> Thanks,
> Nikhil
>
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 3:33 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> I don't know if it will work (setProperty), but you have another problem. The Lucene
> search index is always saved in the file system. And afaik, each repository home needs
> its own index directories (so you have the index files for each cluster node). If you
> add a new cluster node, you have to wait a long time until the index is built, depending
> on the data in your repository (if you have tons of data, you may have to wait a week
> or longer).
>
> The tables of the FS and PM will be shared between all cluster nodes - that works.
>
> Kind regards, Robert
>
> -----Original Message-----
> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
> Sent: Tuesday, November 16, 2010 10:54 AM
> To: users@jackrabbit.apache.org
> Subject: RE: Multiple instances of repository
>
> Since there could be any number of instances, I can't decide the cluster id beforehand.
> Hence I have the following code, which creates a cluster id at run time.
>
> System.setProperty("org.apache.jackrabbit.core.cluster.node_id", "cluster_id"+System.nanoTime());
>
> Similarly the repositoryHome path is generated at run time.
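
As an illustration of the run-time setup described above, here is a hedged sketch that derives both the cluster node id and a per-instance repository home from one generated value. The system property name is the one used in the message; everything else (the ClusterNodeSetup class, its methods, the "node_" prefix, the directory layout) is an assumption made for this example. It uses a random UUID rather than System.nanoTime(), since nanoTime values are only meaningful within a single JVM and are not designed to be unique across machines. The property would need to be set before the repository is started.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

class ClusterNodeSetup {
    /** Property name taken from the message above. */
    static final String NODE_ID_PROPERTY = "org.apache.jackrabbit.core.cluster.node_id";

    /** Builds an id that is unique with very high probability across JVMs and machines. */
    static String newNodeId() {
        return "node_" + UUID.randomUUID().toString().replace("-", "");
    }

    /**
     * Generates a node id, publishes it as a system property and creates a
     * matching repository home directory below baseDir (hypothetical layout).
     */
    static Path configure(Path baseDir) {
        String nodeId = newNodeId();
        System.setProperty(NODE_ID_PROPERTY, nodeId);
        try {
            // One home per instance, so lock files and index directories never collide.
            return Files.createDirectories(baseDir.resolve(nodeId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned path would then be passed as the repository home when the embedded repository is created. Note that directories and journal entries left behind by short-lived nodes accumulate unless something cleans them up.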
>
> But do I also need separate tables for the workspace file system? I have the following
> configuration for my workspace. Is it correct? Will the tables for the workspace FS and
> PersistenceManager be shared between all the nodes, or will each node have its own tables?
>
> <?xml version="1.0"?>
> <!DOCTYPE Repository
>     PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
>     "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
>
> <Repository>
>
>     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>         <param name="driver" value="javax.naming.InitialContext"/>
>         <param name="url" value="jdbc/amiDBDataSource"/>
>         <param name="databaseType" value="oracle"/>
>         <param name="copyWhenReading" value="true"/>
>         <param name="tablePrefix" value=""/>
>         <param name="schemaObjectPrefix" value="J_R_DS_"/>
>         <param name="schemaCheckEnabled" value="false"/>
>     </DataStore>
>
>     <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>         <param name="driver" value="javax.naming.InitialContext"/>
>         <param name="url" value="jdbc/amiDBDataSource"/>
>         <!-- The following value must be oracle for oracle server; this is not the same as the database schema -->
>         <param name="schema" value="oracle"/>
>         <param name="schemaObjectPrefix" value="J_R_FS_"/>
>         <param name="schemaCheckEnabled" value="false"/>
>     </FileSystem>
>
>     <Security appName="Jackrabbit">
>         <SecurityManager class="repository.jcr.jackrabbit.EipSecurityManager"/>
>         <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"/>
>         <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
>             <param name="principalProvider" value="repository.jcr.jackrabbit.EipPrincipalProvider"/>
>         </LoginModule>
>     </Security>
>
>     <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip"/>
>
>     <Workspace name="${wsp.name}">
>         <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/amiDBDataSource"/>
>             <!-- The following value must be oracle for oracle server; this is not the same as the database schema -->
>             <param name="schema" value="oracle"/>
>             <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
>             <param name="schemaCheckEnabled" value="false"/>
>         </FileSystem>
>         <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/amiDBDataSource"/>
>             <param name="tableSpace" value=""/>
>             <!-- The following value must be oracle for oracle server; this is not the same as the database schema -->
>             <param name="schema" value="oracle"/>
>             <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_"/>
>             <param name="externalBLOBs" value="false"/>
>             <param name="schemaCheckEnabled" value="false"/>
>         </PersistenceManager>
>         <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>             <param name="path" value="${wsp.home}/index"/>
>             <param name="supportHighlighting" value="true"/>
>         </SearchIndex>
>     </Workspace>
>
>     <Versioning rootPath="${rep.home}/version">
>         <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/amiDBDataSource"/>
>             <!-- The following value must be oracle for oracle server; this is not the same as the database schema -->
>             <param name="schema" value="oracle"/>
>             <param name="schemaObjectPrefix" value="J_V_FS_"/>
>             <param name="schemaCheckEnabled" value="false"/>
>         </FileSystem>
>         <!-- Change to Oracle Class <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
>         <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/amiDBDataSource"/>
>             <param name="tableSpace" value=""/>
>             <!-- The following value must be oracle for oracle server; this is not the same as the database schema -->
>             <param name="schema" value="oracle"/>
>             <param name="schemaObjectPrefix" value="J_V_PM_"/>
>             <param name="externalBLOBs" value="false"/>
>             <param name="schemaCheckEnabled" value="false"/>
>         </PersistenceManager>
>     </Versioning>
>
>     <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>         <param name="path" value="${rep.home}/search/index"/>
>         <param name="supportHighlighting" value="true"/>
>     </SearchIndex>
>
>     <Cluster syncDelay="2000">
>         <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>             <param name="revision" value="${rep.home}/revision.log"/>
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/amiDBDataSource"/>
>             <param name="schemaObjectPrefix" value="J_R_"/>
>             <param name="databaseType" value="oracle"/>
>         </Journal>
>     </Cluster>
>
> </Repository>
>
> Thanks,
> Nikhil
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 2:42 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> you need clustering, because all of your instances should access the same repository.
>
> What you need is a separate repository home for each instance. In my use case I have an
> installation directory for each instance, so the repository home is located below this
> directory.
>
> You have to make sure that each instance also has its own repository.xml, because you
> need to define different cluster IDs.
>
> And you have to define a cluster section in the repository.xml that says where the
> journal is located, which is necessary for synchronization:
>
>    <Cluster id="node1" syncDelay="5000">
>      <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>        <param name="driver" value="javax.naming.InitialContext"/>
>        <param name="url" value="jdbc/amiDBDataSource"/>
>          ...
>      </Journal>
>    </Cluster>
>
> Kind regards, Robert
>
> -----Original Message-----
> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
> Sent: Tuesday, November 16, 2010 9:37 AM
> To: users@jackrabbit.apache.org
> Subject: RE: Multiple instances of repository
>
> Thanks for replying. I will need a little more help to understand things completely.
> Let me elaborate on my usage scenario. I am also attaching my repository.xml
> file with this mail. Please let me know if you want to know more about my environment.
>
> In my case, I want to keep all the data in one database and use Jackrabbit
> as the JCR layer over this database.
> Jackrabbit is embedded in my application, so the repository starts up as part of
> the application.
> The application reads some files from the repository and also inserts some data into it.
> There could be two instances of the application: app1 running on machine1 and app2
> running on machine2.
> So my application instances are separate, and I can create multiple repository homes
> to avoid the locking problem, but I still want these applications to insert their data
> into the same database tables.
> Suppose all the application instances use the same repository configuration file and
> each specifies its own repository home.
> Will that work in my case? Will there be any consistency issues?
>
> When you say separate data stores and separate persistence managers, do you mean a
> separate repository configuration file, or separate database tables for the data stores
> and persistence managers?
>
> My instances and the repositories operate separately from each other, but they still
> need to share the data. Data inserted by one application instance should be visible to
> the other instances. So, as I understand it, they should all be inserting the data into
> the same tables.
>
> Thanks,
> Nikhil
>
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 1:22 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> if you want to use clustering, you have to define a repository home for each cluster node.
>
> Clustering is necessary if you want to have the same data/indexes at all cluster nodes
> - the key word is synchronization.
>
> If your instances and the repositories operate separately from each other, you don't
> need clustering. Separate repository homes, data stores and persistence managers will do
> the job.
>
> Kind regards, Robert
>
> -----Original Message-----
> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
> Sent: Tuesday, November 16, 2010 8:33 AM
> To: users@jackrabbit.apache.org
> Subject: Multiple instances of repository
>
> Hi,
>
> I am using Jackrabbit as the JCR implementation in my project. I am running Jackrabbit
> within my application, in the same JVM.
> The application reads content from the repository and also writes some content to it.
> There could be multiple concurrent instances of my application running on the same or
> different machines.
> I have a configuration file for Jackrabbit and a single repository home.
> As soon as one instance of the application is up and running, I can't run another
> instance, because the first instance creates a lock file in the repository home.
> After some searching I came to know about running Jackrabbit in clustered mode.
> My question is: even in this case I will have to specify a different repository home
> for every run, right?
> That means I should build the repository home path at run time, because at compile
> time I don't know how many instances will be run.
> This is a standalone Java application and in theory any number of instances can be run.
> If I have to specify a different repository path for every run anyway, will
> Jackrabbit work even without clustering?
> After all, the .lock file will be different for each run because the repository home is
> different.
> I know I am missing something here, please help me.
> I am attaching my conf file with this mail.
>
> Thanks,
> Nikhil
>
>
