From "David Nuescheler" <da...@day.com>
Subject Re: Jackrabbit = Kick Ass Tool (was: Jackrabbit = Big Trouble??)
Date Mon, 30 Jul 2007 09:14:29 GMT
Hi Bruce,

thanks for your comment.

> I have not been fired over the index problems. :-)
> I just want everybody to realize that backing up your repository is a very critical issue.
> Currently, the solution is:
> 1) Back up the DB data.
> 2) Back up your file system; the indexes in it can be deleted.
> However, it is still a bug that Jackrabbit v1.3 cannot rebuild everything from the DB in
> case your hard drive dies and takes the whole repository file system with it.
Shouldn't that be solved by the DbFileSystem?
http://yukatan.fi/2007/1.4/org/apache/jackrabbit/core/fs/db/DbFileSystem.html

This allows you to store everything that is necessary for a complete restore
in the DB, which means your DB backup is the only thing (beyond the
repository.xml) that you need to restore a complete JR instance.
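
For reference, a minimal sketch of what this looks like. DbFileSystem is normally configured
through the <FileSystem> elements in repository.xml; the bean properties below correspond to
the <param> entries you would put there, and the driver, URL, credentials and prefix values
are placeholders only:

    import org.apache.jackrabbit.core.fs.FileSystemException;
    import org.apache.jackrabbit.core.fs.db.DbFileSystem;

    // Sketch only: in a real deployment these values are <param> entries on the
    // <FileSystem> element in repository.xml rather than Java code.
    public class DbFileSystemSketch {
        public static DbFileSystem create() throws FileSystemException {
            DbFileSystem fs = new DbFileSystem();
            fs.setDriver("org.postgresql.Driver");         // JDBC driver (placeholder)
            fs.setUrl("jdbc:postgresql://localhost/jcr");  // JDBC URL (placeholder)
            fs.setUser("jcr");                             // credentials (placeholders)
            fs.setPassword("secret");
            fs.setSchema("postgresql");                    // which DDL variant to use
            fs.setSchemaObjectPrefix("fs_");               // prefix for the created tables
            fs.init();                                     // creates the tables on first use
            return fs;
        }
    }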

> My concerns are two:
> 1) Performance of node navigation, which relates to cache manager resizing
I appreciate the performance issue. I am still not convinced that this is related to the
cache manager resizing...
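
If you want to rule it out, the relevant knobs can be turned at runtime. A rough sketch,
assuming a Jackrabbit 1.3+ build where RepositoryImpl exposes its CacheManager; the sizes
below are arbitrary examples, not recommendations:

    import org.apache.jackrabbit.core.RepositoryImpl;
    import org.apache.jackrabbit.core.state.CacheManager;

    // "repository" is assumed to be the RepositoryImpl created from your
    // RepositoryConfig; adjust the item state cache budget and per-cache bounds.
    RepositoryImpl repository = /* ... your RepositoryImpl ... */ null;
    CacheManager cacheManager = repository.getCacheManager();
    cacheManager.setMaxMemory(32 * 1024 * 1024);        // overall budget for item state caches
    cacheManager.setMaxMemoryPerCache(8 * 1024 * 1024); // upper bound per individual cache
    cacheManager.setMinMemoryPerCache(256 * 1024);      // lower bound per individual cache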

> 2) Logical backup of the repository using the JCR export/import API.
I agree that it would be desirable to have a built-in backup/restore
mechanism on a higher level.

The JCR export/import is probably not the right layer,
since it only covers the content in a single workspace and has no
means to address things like nodetypes, versions or the
namespace registry.
And I think your most pressing issue should be addressed
by the DbFileSystem.
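
For completeness, here is roughly what a per-workspace dump and restore looks like with the
plain JCR calls. This is only a sketch (the "/myapp" subtree and the file name are
placeholders); it also illustrates the limitation above, since the version storage, node type
registrations and the namespace registry are not carried along:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import javax.jcr.ImportUUIDBehavior;
    import javax.jcr.Session;

    // Single-workspace dump/restore sketch; error handling kept minimal.
    public class WorkspaceDumpSketch {

        public static void dump(Session source) throws Exception {
            FileOutputStream out = new FileOutputStream("workspace-dump.xml");
            try {
                // System view XML of one subtree of one workspace,
                // binaries included (skipBinary=false), full depth (noRecurse=false).
                source.exportSystemView("/myapp", out, false, false);
            } finally {
                out.close();
            }
        }

        public static void restore(Session target) throws Exception {
            FileInputStream in = new FileInputStream("workspace-dump.xml");
            try {
                target.importXML("/", in,
                        ImportUUIDBehavior.IMPORT_UUID_COLLISION_REPLACE_EXISTING);
                target.save(); // Session.importXML stays transient until save()
            } finally {
                in.close();
            }
        }
    }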

regards,
david

> -----Original Message-----
> From: bdelacretaz@gmail.com [mailto:bdelacretaz@gmail.com] On Behalf Of Bertrand Delacretaz
> Sent: Friday, July 27, 2007 3:15 AM
> To: users@jackrabbit.apache.org
> Subject: Jackrabbit = Kick Ass Tool (was: Jackrabbit = Big Trouble??)
>
> Hi,
>
> I hate to play grumpy old man once again, but the recent trend towards
> Loud Subjects That Catch People's Attention does not really help the
> discussion, so let's rename this thread ;-)
>
> Bruce, if I read your message correctly, it looks like you have three
> problems with Jackrabbit:
>
> 1) Cache Manager resizes seem to slow your app down
> 2) You're going to be fired because you lost your index (or Jackrabbit did)
> 3) You're not sure about which application pattern/content model to use
>
> So let's please tackle these one at a time, ideally in separate
> threads so that people can contribute efficiently to the discussion.
>
> Sorry if I'm being a bit harsh, but IMHO you started it with the
> choice of your message's subject ;-)
> -Bertrand
>
>
> On 7/27/07, Bruce Li <bli@tirawireless.com> wrote:
> > I have been in this Jackrabbit community for a couple of months, since I joined a
> > repository project two months ago.
> >
> >
> >
> > First, I respect and appreciate all the hard work contributed to the current Jackrabbit
> > project, and I am sure a lot of developers benefit from it. Some people contribute their
> > Jackrabbit working experience, like David Nuescheler, who collected the "7 DR Rules";
> > this is precious given the current lack of Jackrabbit documentation, and it reflects
> > "real" working experience.
> >
> >
> >
> > However, I have also heard some negative voices from this community, like "Jackrabbit is
> > dead (for us)" from Frédéric Esnault. I have run into some troubles with Jackrabbit, and
> > they seem to be foundational problems. I would like to share all my experience with you,
> > and any feedback or good suggestions are definitely welcome.
> >
> >
> >
> > Since these troubles are "big" troubles for enterprise use of Jackrabbit 1.3, let's
> > discuss them from the beginning.
> >
> >
> >
> > Question 1:
> >
> > Why do you select Jackrabbit rather than a database as your repository solution?
> >
> >
> >
> > There are a lot of answers to this question, and it seems that everybody who joins
> > this community already knows them (there may even be a formal document approved by your
> > CTO). However, in my opinion, this is the basic question that really needs to be
> > discussed here.
> >
> >
> >
> > To answer this question, some technical keywords in favor of Jackrabbit may be "JCR
> > API", "Lucene search engine" and so on. However, as a user of Jackrabbit, I would like
> > to list the two key reasons why I selected Jackrabbit as the repository solution, from a
> > product point of view:
> >
> >
> >
> > 1. Quick and effective data search/fetch from a large content repository
> > 2. Built-in content version/revision control without extra code
> >
> >
> >
> > Now let me describe the big troubles I have run into:
> >
> > 1. Quick and effective data search or fetch from a large content repository
> >
> >
> >
> > Experience: There is not much data in my repository. It contains hundreds of nodes of
> > two major object types; each node (object) has fewer than 20 properties (fields) plus
> > about 5 child nodes (nested small objects), and one of the two major node types carries
> > one piece of binary data (up to 1 megabyte). Unfortunately, the performance is not
> > acceptable when I navigate the major nodes. The main problem is that the built-in cache
> > manager of Jackrabbit resizes its caches, which takes an unpredictable amount of time
> > and sometimes makes the operation very slow. It is not easy to read the code when
> > debugging Jackrabbit for performance tuning, because there is no documentation about the
> > logic behind the resizing.
> >
> >
> >
> > 2.      Content version/revision control
> >
> > Experience: This function works well in Jackrabbit v1.3. The main problem is that all
> > revisions of a node (except the base revision) are lost when exporting/importing data
> > from one repository to another. I am discussing this issue because it concerns
> > repository backup.
> >
> >
> >
> > I just found that in Jackrabbit v1.3 there is no way to back up the repository when
> > using the DB as the persistence manager. I mean that there is no way to re-index based
> > on the data in the DB. The following is my case:
> >
> >
> >
> > On one repository server, the index (in the file system) is corrupt, which causes all
> > searches to fail. However, all the data (in the DB) is still alive, and you can iterate
> > over all of it. After cleaning the whole repository file system (most of which is index
> > information), Jackrabbit cannot correctly rebuild the index based on the data in the DB.
> > If this happens on a production repository, it means: "My God, I am going to be fired."
> > As far as I know, Jackrabbit v1.1 could successfully re-index (creating a totally new
> > repository index in the file system based on the DB data).
> >
> >
> >
> > As an alternative way to back up the repository, I tried to export/import all nodes
> > from one repository to another using the JCR export API (exportSystemView). The good
> > news is that Jackrabbit v1.3 successfully builds the index (the whole file system)
> > during the import process; the bad news is that it loses all revisions of all versioned
> > nodes. Can you imagine how frustrated I am when I realize there is no way to back up the
> > repository based on the DB data?
> >
> >
> >
> > I just got the answer to the re-index issue for Jackrabbit v1.3: you CANNOT delete the
> > whole file system. Delete only the indexes and keep the other folders; Jackrabbit can
> > then re-index successfully when it starts up.
> >
> >
> >
> > Question 2:
> >
> > How can developers correctly use Jackrabbit (JCR) as their repository solution?
> >
> >
> >
> > Jackrabbit experts may notice that I use "object" to describe a node, and you may think
> > that is not the pattern you use with Jackrabbit. So the question becomes: "What is the
> > best practice (pattern) for using Jackrabbit (JCR) as a repository solution?"
> >
> >
> >
> > From this community, I see that a lot of developers use Jackrabbit by fetching content
> > by path. That means they do not need to treat a node as an object; instead, they put
> > content into the repository as an asset, which can be easily and effectively retrieved
> > by a given path. This pattern fits the motto "simplicity is best".
> >
> >
> >
> > My use of Jackrabbit is based on a business requirement that needs to navigate most of
> > the nodes and reference nodes, checking child nodes and properties to find the proper
> > content according to a couple of business rules. I would say that all the performance
> > issues come from the node iteration process. What's more, I have created generic classes
> > using the Java reflection package for bi-directional mapping between nodes and objects.
> > To improve performance, the mapping supports generic lazy loading of child nodes.
> > However, it seems that none of this work solves the performance problem, although it
> > sounds pretty "professional". You may ask me: if you have such a business requirement,
> > why not go to the DB and build the full relationships for your business model? J2EE
> > developers all know how powerful the Java-DB world is: mature ORM tools (e.g.
> > Hibernate), transaction management, batch data fetching, performance tuning and so on.
> > However, my question is: "Is there any good pattern in current Jackrabbit to effectively
> > handle data fetching with weak relationships?"
> >
> >
> >
> > Now it is time to say a few words to the Jackrabbit developers and contributors, which
> > I really want to say for the whole community:
> >
> >
> >
> > My pleas:
> >
> > Guides, documentation and sample code are king for any open source project. How
> > frustrating it is for Jackrabbit developers to find that users apply the wrong pattern
> > in their projects. On the other hand, how frustrating it is for Jackrabbit users not to
> > be able to find a good pattern to follow, which could save them a bunch of time. From a
> > product point of view, whether search is done by XPath, XQuery or SQL is not the
> > foundational issue. The foundational issue is having one effective search mechanism that
> > covers most of the important real-world requirements, with documentation that can be
> > found on the Jackrabbit web site.
> >
> >
> >
> >
> >
> > I do believe Jackrabbit is a capable project, and I really hope all its "best features"
> > are documented, demoed and used by the whole community.
> >
> >
> >
> > Thanks
> >
> >
> >
> > Bruce
>
