jackrabbit-dev mailing list archives

From Walter Raboch <wrab...@ingen.at>
Subject Re: Scalability/Clustering
Date Sat, 09 Jul 2005 15:24:21 GMT
Hi David, Hi Serge,

> cool. i am currently trying to get at least a common .NET port
> of the API put together in jackrabbit (just like markus did it for PHP)
> are you interested in helping with that?
> i think a .NET client using the WebDAV JCR remoting could 
> be a very interesting option.
> http://www.day.com/jsr170/server/JCR_Webdav_Protocol.zip

Yes, I am interested. Is there already some code, or how do we begin?

filesystem vs database:

I see the advantages of both approaches, but I think database storage is

- easier to sell to customers, because they have trusted databases for
   decades

- easier to back up: there are many solutions out there, and the
   databases at customer sites are already backed up, so there is no
   extra effort

- more scalable: databases have been tuned for large amounts of data
  (especially small entities; we all know that BLOBs kill a DBMS)

I would be fine with filesystem storage if replication (fully
transactional across the cluster) is available. But this has to be
totally transparent to the JCR client.

I understand the deployment with several JCR repositories, each holding
a subset of the data for a specific user group plus some shared,
replicated data that does not change frequently. But to support this you
have to group users, which is extremely hard, especially in our planned
application.

There would be a hybrid solution too: store structure info and
attributes in a DBMS and BLOBs in the filesystem. The "daisy" project
uses exactly this approach (http://new.cocoondev.org/daisy/index.html).
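
To make the hybrid idea a bit more concrete, here is a rough sketch in
plain Java. All names (HybridStore and so on) are my own invention and
exist neither in jackrabbit nor in daisy, and a HashMap only stands in
for the DBMS side:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these names exist neither in jackrabbit nor in daisy.
class HybridStore {
    // Stand-in for the DBMS: item id -> small properties.
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    // Existing directory that receives one file per BLOB.
    private final Path blobDir;

    HybridStore(Path blobDir) {
        this.blobDir = blobDir;
    }

    void save(String id, Map<String, String> props, byte[] blob) {
        try {
            // Write the BLOB first: if this fails, no row is recorded.
            Files.write(blobDir.resolve(id + ".bin"), blob);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        rows.put(id, new HashMap<>(props));
    }

    Map<String, String> properties(String id) {
        return rows.get(id);
    }

    byte[] blob(String id) {
        try {
            return Files.readAllBytes(blobDir.resolve(id + ".bin"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class HybridStoreDemo {
    public static void main(String[] args) throws IOException {
        HybridStore store = new HybridStore(Files.createTempDirectory("blobs"));
        store.save("node-1", Map.of("title", "Hello"),
                "large binary content".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.properties("node-1"));
        System.out.println(store.blob("node-1").length + " bytes on disk");
    }
}
```

What the sketch does not solve is the hard part: no transaction spans
both stores, so a crash between the two writes can leave a file without
a matching row.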

>>Are there any efforts to make jackrabbit clustered for a load sharing
>>scenario (no session failover at repository layer) ?
> i think there are a couple of caches that need to be made 
> clusterable (or at least pluggable) in the jackrabbit core for 
> that to happen efficiently, it has to be done very carefully, 
> but it should not be too much work i think.
> this is definitely on the roadmap and investigations into that
> direction have already happened.

is there any information around about these investigations?

> From what I have seen making the cache implementation pluggable
> would be a good necessary first step. It then becomes possible to
> use OSCache, JBossTreeCache or Tangosol Coherence that all handle
> clustered caches.

I have been thinking about the same approach. I like the plugin concept
because it lets you tune jackrabbit to the situation at hand.
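
Roughly what I have in mind, as a sketch only: ItemCache and
LocalLruCache are hypothetical names, not existing jackrabbit classes.
The core would talk to the interface only, and a clustered cache product
could be wrapped behind the same interface later:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plugin point; jackrabbit does not define this interface.
interface ItemCache<K, V> {
    V get(K key);
    void put(K key, V value);
    void evict(K key);
}

// Default single-JVM implementation: a bounded LRU map.
class LocalLruCache<K, V> implements ItemCache<K, V> {
    private final Map<K, V> map;

    LocalLruCache(final int maxEntries) {
        // accessOrder=true keeps the least recently used entry first,
        // so removeEldestEntry drops exactly that one when we overflow.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    @Override
    public synchronized V get(K key) {
        return map.get(key);
    }

    @Override
    public synchronized void put(K key, V value) {
        map.put(key, value);
    }

    @Override
    public synchronized void evict(K key) {
        map.remove(key);
    }
}

class ItemCacheDemo {
    public static void main(String[] args) {
        ItemCache<String, String> cache = new LocalLruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a", so "b" is now the eldest entry
        cache.put("c", "3"); // exceeds the limit and evicts "b"
        System.out.println(cache.get("b"));
    }
}
```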

>>After reading a lot of code, I think following changes should do it:
>>- extending ObservationManager to send and receive Events to
>>  and from other nodes
> maybe... personally i would like to have that functionality closer
> to the core, to keep things as transactional as possible across
> the cluster.

That's OK: the closer to the core, the more transparent the solution is
for other parts of jackrabbit. What would you recommend?
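
Just to illustrate the kind of event propagation I was thinking of, here
is a toy sketch that runs inside one JVM. ClusterBus and ClusterNode are
made-up names, the bus stands in for whatever transport would really
carry the events, and a shared map stands in for the common persistent
store:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real transport between cluster nodes.
class ClusterBus {
    private final List<ClusterNode> nodes = new ArrayList<>();

    void join(ClusterNode node) {
        nodes.add(node);
    }

    // Deliver a change notification to every node except the sender.
    void broadcast(ClusterNode sender, String itemId) {
        for (ClusterNode node : nodes) {
            if (node != sender) {
                node.onRemoteChange(itemId);
            }
        }
    }
}

// Hypothetical repository node with a local cache in front of a shared store.
class ClusterNode {
    private final ClusterBus bus;
    private final Map<String, String> sharedStore;
    private final Map<String, String> cache = new HashMap<>();

    ClusterNode(ClusterBus bus, Map<String, String> sharedStore) {
        this.bus = bus;
        this.sharedStore = sharedStore;
        bus.join(this);
    }

    String read(String itemId) {
        // Fill the local cache from the shared store on a miss.
        return cache.computeIfAbsent(itemId, sharedStore::get);
    }

    void write(String itemId, String value) {
        sharedStore.put(itemId, value);
        cache.put(itemId, value);
        bus.broadcast(this, itemId); // tell the other nodes
    }

    void onRemoteChange(String itemId) {
        cache.remove(itemId); // drop the stale copy, reload on next read
    }
}

class ClusterDemo {
    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        ClusterBus bus = new ClusterBus();
        ClusterNode n1 = new ClusterNode(bus, store);
        ClusterNode n2 = new ClusterNode(bus, store);
        n1.write("/a", "v1");
        System.out.println(n2.read("/a"));
        n1.write("/a", "v2");
        System.out.println(n2.read("/a"));
    }
}
```

This only invalidates caches after the fact and says nothing about
transactions across the cluster, which I take to be exactly David's
concern.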

>>- implementing/extending an ORM Layer (Hibernate with shared caching for
>>  performance). The persistence implementation should be aware of the
>>  node types and allow a type specific mapping to tables. So we can map
>>  nodetypes with many instances to own tables while maintaining
>>  flexibility for new "simple" nodetypes.
> i think that you may get a better performance impact by implementing
> the shared cache on higher layer in the jackrabbit architecture.
> on a completely different note, some people probably also like to map 
> nodetypes to tables for "aesthetic" reasons...

> One quick note about the current ORM implementation. The current
> implementation that I've worked on with Jackrabbit can be improved.
> Feel free to have a look and contribute! But what David is saying
> is true: for performance, the higher you can cache, the better!

I am glad that you have already invested so much time in a base I can
work on. I like your solution, but would prefer to make the mapping
configurable on a per-NodeType basis. I have just started working on
this.
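
As a first idea of what "configurable per NodeType" could look like,
here is a small sketch. NodeTypeTableMapping is a hypothetical class,
not part of the existing ORM code, and the table names and the
"my:article" type are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapping registry; not part of the existing ORM code.
class NodeTypeTableMapping {
    private final Map<String, String> tables = new HashMap<>();
    private final String defaultTable;

    NodeTypeTableMapping(String defaultTable) {
        this.defaultTable = defaultTable;
    }

    // Give a node type with many instances its own table.
    NodeTypeTableMapping map(String nodeType, String table) {
        tables.put(nodeType, table);
        return this;
    }

    // Unmapped ("simple") node types fall back to the generic table.
    String tableFor(String nodeType) {
        return tables.getOrDefault(nodeType, defaultTable);
    }
}

class NodeTypeTableMappingDemo {
    public static void main(String[] args) {
        NodeTypeTableMapping mapping = new NodeTypeTableMapping("GENERIC_NODE")
                .map("my:article", "ARTICLE");
        System.out.println(mapping.tableFor("my:article"));
        System.out.println(mapping.tableFor("nt:unstructured"));
    }
}
```

In a real version the mapping would be read from the persistence
configuration rather than built in code.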

>>What else should be synchronized between the nodes?
>>Did I overlook something?
> i think this list sounds like a good start...

Can someone explain the decision-making process in the project to me?
How do we arrive at a proposal for these modifications?


