jackrabbit-users mailing list archives

From Justin Edelson <jus...@justinedelson.com>
Subject Re: Multiple instances of repository
Date Wed, 17 Nov 2010 20:50:29 GMT
On Wed, Nov 17, 2010 at 12:05 PM,  <nikhil.agrawal@emeter.com> wrote:
> So I will have to run a cluster configuration on machine1, because I will have two
> independent JVMs hitting the same repository?
Yes.

> I really don't want to run cluster nodes on a single machine just so that different
> JVMs can access the repository.
> That doesn't look right. I am sure there must be better ways to solve this issue.
Although I suspect this isn't typical, there's nothing wrong with
this. Multiple JVMs = cluster nodes; doesn't really matter if they're
on the same physical machine or multiple physical machines.

Justin

>
> Any ideas will be of great help.
>
> -Nikhil
>
>
> -----Original Message-----
> From: justinedelson@gmail.com [mailto:justinedelson@gmail.com] On Behalf Of Justin Edelson
> Sent: Wednesday, November 17, 2010 12:12 AM
> To: users@jackrabbit.apache.org
> Subject: Re: Multiple instances of repository
>
> Nikhil-
> I think you should rethink your architecture. It really doesn't make
> sense to be bringing repository instances up only for a 2-4 minute
> job. Instead, you should think about using the Command pattern and
> package your "applications" as executable jobs which can be run inside
> a long-running VM against a local repository instance (i.e. making
> in-process calls instead of RMI or DavEx).
>
> This is where something like OSGi and Apache Sling can be *very*
> helpful, but there are obviously other ways to add/remove jobs at
> runtime. See, for example, Sling's Scheduler support:
> http://sling.apache.org/site/scheduler-service-commons-scheduler.html
>
> Justin
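A minimal sketch of the Command-pattern idea above, assuming one long-running JVM that owns the repository instance and runs the short jobs in-process. ContentStore, RepositoryJob, JobHost and InMemoryStore are hypothetical names used only for illustration; in a real setup the store would presumably be a JCR Session obtained from the embedded repository. None of this is Sling or Jackrabbit API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an in-process repository session. */
interface ContentStore {
    void put(String path, String value);
    String get(String path);
}

/** Command: one short-lived unit of work executed against the shared store. */
interface RepositoryJob {
    void execute(ContentStore store) throws Exception;
}

/** Long-running host that owns the single store and runs submitted jobs. */
final class JobHost implements AutoCloseable {
    private final ContentStore store;
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    JobHost(ContentStore store) {
        this.store = store;
    }

    /** Queues a job; the returned Future completes when the job has run. */
    Future<?> submit(RepositoryJob job) {
        return pool.submit(() -> {
            job.execute(store);
            return null; // Callable form, so checked exceptions can propagate
        });
    }

    @Override
    public void close() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}

/** In-memory store, present only to make the sketch runnable. */
final class InMemoryStore implements ContentStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    public void put(String path, String value) {
        data.put(path, value);
    }

    public String get(String path) {
        return data.get(path);
    }
}

final class JobHostDemo {
    public static void main(String[] args) throws Exception {
        ContentStore store = new InMemoryStore();
        try (JobHost host = new JobHost(store)) {
            // Each "application" becomes a job instead of its own JVM.
            host.submit(s -> s.put("/imports/file1", "processed")).get();
        }
        System.out.println(store.get("/imports/file1"));
    }
}
```

With this shape the repository is started once, and a 2-4 minute job costs only a task submission rather than a repository startup and a new cluster node.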
>
> On Tue, Nov 16, 2010 at 5:16 AM,  <nikhil.agrawal@emeter.com> wrote:
>> Thanks for your inputs, they are really helpful.
>>
>> Well, so is my application not a good candidate for Jackrabbit?
>>
>> The other option I had was to use Jackrabbit in client-server mode. In this case
>> I would access the repository over RMI. But the Jackrabbit documentation mentions
>> that RMI is not optimized for performance and that I should use an embedded repository
>> instance in my application code for better performance.
>>
>> I can remove the search functionality from these cluster nodes, because their life span
>> will be very short. The application takes 2-4 minutes to do its job and I don't
>> think we really need search for these nodes.
>>
>> But my question is, should I really use the clustering feature? I mean, cluster nodes
>> should normally have a longer life span. But in this case the nodes will have a very
>> short life span of 2-4 minutes.
>> I am finding it hard to treat these short-lived applications as cluster nodes.
>>
>> Thanks,
>> Nikhil
>>
>> -----Original Message-----
>> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
>> Sent: Tuesday, November 16, 2010 3:33 PM
>> To: users@jackrabbit.apache.org
>> Subject: AW: Multiple instances of repository
>>
>> Hi Nikhil,
>>
>> I don't know if it will work (setProperty), but you have another problem. The Lucene
>> search index is always stored in the file system. And afaik, each repository home needs its
>> own index directories (so you have separate index files for each cluster node). If you create
>> a new cluster node, you have to wait a long time until the index is built, depending on the
>> data in your repository (if you have tons of data, you may have to wait a week or longer).
>>
>> The tables of the FS and PM will be shared between all cluster nodes - that works.
>>
>> Kindly regards, Robert
>>
>> -----Original Message-----
>> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
>> Sent: Tuesday, November 16, 2010 10:54
>> To: users@jackrabbit.apache.org
>> Subject: RE: Multiple instances of repository
>>
>> Since there could be any number of instances, I can't decide the cluster id beforehand.
>> Hence I have the following code that creates a cluster id at run time.
>>
>> System.setProperty("org.apache.jackrabbit.core.cluster.node_id", "cluster_id"+System.nanoTime());
>>
>> Similarly the repositoryHome path is generated at run time.
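A hedged sketch of generating the node id and repository home at run time, as described above. It assumes the system property shown in the message is read when the repository starts, and it swaps System.nanoTime() for a random UUID, because nanoTime values are only meaningful within a single JVM and two instances could in principle produce the same number. ClusterNodeBootstrap and the jackrabbit-homes directory are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

/** Hypothetical helper; class and method names are illustrative only. */
final class ClusterNodeBootstrap {

    // System property taken from the message above.
    static final String NODE_ID_PROPERTY = "org.apache.jackrabbit.core.cluster.node_id";

    /** Builds an id that does not depend on any per-JVM clock. */
    static String newNodeId() {
        return "node_" + UUID.randomUUID();
    }

    /** Creates a private repository home for this run under baseDir. */
    static Path prepareHome(Path baseDir, String nodeId) throws IOException {
        Path home = baseDir.resolve(nodeId);
        Files.createDirectories(home);
        return home;
    }

    public static void main(String[] args) throws IOException {
        String nodeId = newNodeId();
        // Assumption: the id is picked up at repository startup, so set it first.
        System.setProperty(NODE_ID_PROPERTY, nodeId);

        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "jackrabbit-homes");
        Path home = prepareHome(base, nodeId);

        System.out.println("cluster node id: " + nodeId);
        System.out.println("repository home: " + home);
    }
}
```

Deriving the home directory from the node id keeps the two in step, so each run gets its own .lock file and its own local index directory.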
>>
>> But do I also need separate tables for the workspace file system? I have the following
>> configuration for my workspace. Is it correct? Will the tables for the workspace FS and
>> PersistenceManager be shared between all the nodes, or will these tables be different?
>>
>> <?xml version="1.0"?>
>> <!DOCTYPE Repository
>>          PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
>>          "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
>>
>> <Repository>
>>
>>     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>>         <param name="driver" value="javax.naming.InitialContext"/>
>>         <param name="url" value="jdbc/amiDBDataSource"/>
>>         <param name="databaseType" value="oracle"/>
>>         <param name="copyWhenReading" value="true"/>
>>         <param name="tablePrefix" value=""/>
>>         <param name="schemaObjectPrefix" value="J_R_DS_"/>
>>         <param name="schemaCheckEnabled" value="false"/>
>>     </DataStore>
>>
>>     <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>>         <param name="driver" value="javax.naming.InitialContext"/>
>>         <param name="url" value="jdbc/amiDBDataSource"/>
>>         <!-- The following value must be oracle for an Oracle server; this is not the same as the database schema -->
>>         <param name="schema" value="oracle"/>
>>         <param name="schemaObjectPrefix" value="J_R_FS_"/>
>>         <param name="schemaCheckEnabled" value="false"/>
>>     </FileSystem>
>>
>>     <Security appName="Jackrabbit">
>>         <SecurityManager class="repository.jcr.jackrabbit.EipSecurityManager"/>
>>         <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"/>
>>         <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
>>             <param name="principalProvider" value="repository.jcr.jackrabbit.EipPrincipalProvider"/>
>>         </LoginModule>
>>     </Security>
>>
>>     <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip"/>
>>
>>     <Workspace name="${wsp.name}">
>>         <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>>             <param name="driver" value="javax.naming.InitialContext"/>
>>             <param name="url" value="jdbc/amiDBDataSource"/>
>>             <!-- The following value must be oracle for an Oracle server; this is not the same as the database schema -->
>>             <param name="schema" value="oracle"/>
>>             <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
>>             <param name="schemaCheckEnabled" value="false"/>
>>         </FileSystem>
>>         <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>>             <param name="driver" value="javax.naming.InitialContext"/>
>>             <param name="url" value="jdbc/amiDBDataSource"/>
>>             <param name="tableSpace" value=""/>
>>             <!-- The following value must be oracle for an Oracle server; this is not the same as the database schema -->
>>             <param name="schema" value="oracle"/>
>>             <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_"/>
>>             <param name="externalBLOBs" value="false"/>
>>             <param name="schemaCheckEnabled" value="false"/>
>>         </PersistenceManager>
>>         <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>             <param name="path" value="${wsp.home}/index"/>
>>             <param name="supportHighlighting" value="true"/>
>>         </SearchIndex>
>>     </Workspace>
>>
>>     <Versioning rootPath="${rep.home}/version">
>>         <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>>             <param name="driver" value="javax.naming.InitialContext"/>
>>             <param name="url" value="jdbc/amiDBDataSource"/>
>>             <!-- The following value must be oracle for an Oracle server; this is not the same as the database schema -->
>>             <param name="schema" value="oracle"/>
>>             <param name="schemaObjectPrefix" value="J_V_FS_"/>
>>             <param name="schemaCheckEnabled" value="false"/>
>>         </FileSystem>
>>         <!-- Change to Oracle Class <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
>>         <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>>             <param name="driver" value="javax.naming.InitialContext"/>
>>             <param name="url" value="jdbc/amiDBDataSource"/>
>>             <param name="tableSpace" value=""/>
>>             <!-- The following value must be oracle for an Oracle server; this is not the same as the database schema -->
>>             <param name="schema" value="oracle"/>
>>             <param name="schemaObjectPrefix" value="J_V_PM_"/>
>>             <param name="externalBLOBs" value="false"/>
>>             <param name="schemaCheckEnabled" value="false"/>
>>         </PersistenceManager>
>>     </Versioning>
>>
>>     <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>         <param name="path" value="${rep.home}/search/index"/>
>>         <param name="supportHighlighting" value="true"/>
>>     </SearchIndex>
>>
>>     <Cluster syncDelay="2000">
>>         <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>>             <param name="revision" value="${rep.home}/revision.log"/>
>>             <param name="driver" value="javax.naming.InitialContext"/>
>>             <param name="url" value="jdbc/amiDBDataSource"/>
>>             <param name="schemaObjectPrefix" value="J_R_"/>
>>             <param name="databaseType" value="oracle"/>
>>         </Journal>
>>     </Cluster>
>>
>> </Repository>
>>
>> Thanks,
>> Nikhil
>> -----Original Message-----
>> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
>> Sent: Tuesday, November 16, 2010 2:42 PM
>> To: users@jackrabbit.apache.org
>> Subject: AW: Multiple instances of repository
>>
>> Hi Nikhil,
>>
>> You need clustering, because all of your instances should access the same repository.
>>
>> What you need is a separate repository home for each instance. In my use case I have
>> an installation directory for each instance, so the repository home is located below this
>> directory.
>>
>> You have to make sure that each instance also has its own repository.xml, because
>> you need to define different cluster ids.
>>
>> And you have to define a Cluster section in the repository.xml that says where the journal
>> is located, which is necessary for synchronization:
>>
>>    <Cluster id="node1" syncDelay="5000">
>>      <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>>        <param name="driver" value="javax.naming.InitialContext"/>
>>        <param name="url" value="jdbc/amiDBDataSource"/>
>>          ...
>>      </Journal>
>>    </Cluster>
>>
>> Kindly regards, Robert
>>
>> -----Original Message-----
>> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
>> Sent: Tuesday, November 16, 2010 09:37
>> To: users@jackrabbit.apache.org
>> Subject: RE: Multiple instances of repository
>>
>> Thanks for replying. I will need a little more help to understand things completely.
>> I will elaborate a bit more on my usage scenario. I am also attaching my repository.xml
>> file to this mail. Please let me know if you want to know more about my environment.
>>
>> In my case, I want to keep all the data in one database and I want to use Jackrabbit
>> as the JCR layer over this database.
>> I have Jackrabbit embedded in my application, so the repository comes up as part
>> of the application.
>> This application reads some files from the repository and also inserts some data
>> into the repository.
>> There could be two instances of the application: app1 running on machine1 and app2
>> running on machine2.
>> So my application instances are different and I can create multiple repository homes
>> to avoid the locking problem, but I still want to insert the data from these applications
>> into the same database tables.
>> So suppose all the application instances use the same repository configuration file and
>> each specifies its own repository home.
>> Will that work in my case? Will there be any consistency issues?
>>
>> When you say separate data stores and separate persistence managers, do you mean a separate
>> repository configuration file, or separate database tables for data stores and persistence
>> managers?
>>
>> My instances and the repositories operate separately from each other, but they still
>> want to share the data. The data inserted by one application instance should be visible to
>> the other instances. So they should all be inserting the data into the same tables; that is
>> my understanding.
>>
>> Thanks,
>> Nikhil
>>
>> -----Original Message-----
>> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
>> Sent: Tuesday, November 16, 2010 1:22 PM
>> To: users@jackrabbit.apache.org
>> Subject: AW: Multiple instances of repository
>>
>> Hi Nikhil,
>>
>> If you want to use clustering, you have to define a repository home for each cluster node.
>>
>> Clustering is necessary if you want to have the same data/indexes on all cluster
>> nodes - the key word is synchronization.
>>
>> If your instances and the repositories operate separately from each other, you don't
>> need clustering. Separate repository homes, data stores and persistence managers will do the
>> job.
>>
>> Kindly regards, Robert
>>
>> -----Original Message-----
>> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
>> Sent: Tuesday, November 16, 2010 08:33
>> To: users@jackrabbit.apache.org
>> Subject: Multiple instances of repository
>>
>> Hi,
>>
>> I am using Jackrabbit as the JCR implementation in my project. I am running Jackrabbit
>> within my application, in the same JVM.
>> The application reads content from the repository and also writes some content to
>> the repository.
>> There could be multiple concurrent instances of my application running on the same
>> or different machines.
>> I have a configuration file for Jackrabbit and a single repository home for
>> Jackrabbit.
>> Now, as soon as one instance of the application is up and running, I can't run
>> another instance, because the first instance creates a lock file in the repository home.
>> After doing some searching I came to know about running Jackrabbit in clustered
>> mode.
>> Now my question is, even in this case I will have to specify a different repository
>> home for every run, right?
>> That means I should build the repository home path at run time, because at compile
>> time I don't know how many instances will be run.
>> This is a standalone Java application and theoretically any number of instances can
>> be run.
>> My question is, if I have to specify a different repository path for every run anyway,
>> will Jackrabbit then work even without clustering?
>> After all, the .lock file will be different for different runs, since the repository home is
>> different.
>> I know I am missing something here, please help me.
>> I am attaching my conf file to this mail.
>>
>> Thanks,
>> Nikhil
>>
>>
>
