jackrabbit-users mailing list archives

From "Vikas Bhatia" <vik...@gmail.com>
Subject Re: How are developers using jackrabbit
Date Wed, 01 Aug 2007 15:22:41 GMT
Hello All,

I read somewhere that most of the dev folk are enjoying their summer
vacations :)

Thanks for your detailed replies so far. My content model primarily
deals with binary data, with a lot of supporting nodes.
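
For context, here is a minimal sketch of how we store one binary
document (standard JCR 1.0 API; the "docs" folder, the file name, and
the session setup are placeholders, not our exact model):

    import java.io.InputStream;
    import java.util.Calendar;
    import javax.jcr.Node;
    import javax.jcr.Session;

    public class StoreBinary {
        // Stores one uploaded file as an nt:file/nt:resource pair.
        // "session" is an already-authenticated JCR session.
        public static Node storeDocument(Session session, String name,
                InputStream data) throws Exception {
            Node docs = session.getRootNode().getNode("docs");
            Node file = docs.addNode(name, "nt:file");
            Node res = file.addNode("jcr:content", "nt:resource");
            res.setProperty("jcr:mimeType", "application/octet-stream");
            res.setProperty("jcr:data", data);           // binary payload
            res.setProperty("jcr:lastModified", Calendar.getInstance());
            session.save();
            return file;
        }
    }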

On 7/31/07, Ard Schrijvers <a.schrijvers@hippo.nl> wrote:
> Hello Bertrand,
> >
> > On 7/31/07, Ard Schrijvers <a.schrijvers@hippo.nl> wrote:
> > > Regarding your use case, having around 36.000.000 documents after
> > > one year in one single workspace with terabytes of data... so
> > > 100.000.000 docs within three years... Well, I think you at least
> > > have to tune some settings :-)...
> >
> > Just to make sure there's no misunderstanding, the original post says
> > "nodes", not "documents".
>
> Yes, you are right! I must have misunderstood, since he is talking
> about "pushing 300-500 nodes a minute", so I understood that he meant
> pushing docs into JR :-)
>

The reason I said nodes is that we have different kinds of nodes in
the system: while most of them are documents, there are supporting
nodes for things such as permissions and auditing. So for one document
added to JR, 3-4 nodes might be modified, and each document has about
7-8 properties, a number which could grow as we gather usage
statistics.
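
To make the ratio concrete, here is a hedged sketch of what one such
write looks like (the node and property names are illustrative only,
not our exact model):

    import java.util.Calendar;
    import javax.jcr.Node;
    import javax.jcr.Session;

    public class DocumentWrite {
        // One document write that touches 3-4 nodes, matching the
        // ratio described above. All names here are made up.
        public static void addDocument(Session session) throws Exception {
            Node root = session.getRootNode();
            Node doc = root.getNode("docs").addNode("report-2007-08");
            doc.setProperty("title", "Monthly report");
            doc.setProperty("author", "vikas");
            doc.setProperty("created", Calendar.getInstance());
            // ...in practice 7-8 properties per document

            // Supporting nodes written in the same save:
            Node perms = doc.addNode("permissions");  // access control
            perms.setProperty("readers", new String[] {"group:finance"});
            Node audit = root.getNode("audit").addNode("entry"); // audit trail
            audit.setProperty("path", doc.getPath());
            audit.setProperty("action", "create");

            session.save();  // one document added, 3-4 nodes modified
        }
    }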

We have tried to stay away from reference properties, since we saw
that they could slow the system down tremendously and clog up the DB.
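
Concretely, instead of a REFERENCE property we store the target's path
(or UUID) as a plain string and resolve it when needed, which avoids
the referential-integrity bookkeeping that references incur. A sketch
(property and node names are illustrative):

    import javax.jcr.Node;
    import javax.jcr.Session;

    public class SoftLink {
        // Instead of doc.setProperty("owner", userNode) -- which creates
        // a REFERENCE the repository must track -- store the path as an
        // ordinary STRING and resolve it lazily. The trade-off: the link
        // silently breaks if the target node is moved or deleted.
        public static void link(Node doc, Node user) throws Exception {
            doc.setProperty("ownerPath", user.getPath());
        }

        public static Node resolve(Session session, Node doc) throws Exception {
            String path = doc.getProperty("ownerPath").getString();
            return (Node) session.getItem(path);
        }
    }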

> >
> > So that's 36 million nodes a year, or 100 million after three years.
> > If it was documents, it might be many more nodes than that.
> >
> > Although I haven't run those tests myself, I've talked with people
> > doing tests with, IIRC, 150 million nodes, and such quantities are
> > also regularly mentioned in Lucene tests
>
> Yes, I agree, but in these cases you really have to understand how to
> tune and configure each separate component, because, for example, if
> you have a just-invalidated indexReader and you are doing a search on
> a common word with a sort on title, or some range query, you might run
> into problems with 150 million nodes.
>
> >, so I don't think this is
> > necessarily a problem. But of course, it depends on how nodes are
> > structured and on what's indexed.
>
> Indexing seems to me pretty important when having 150 million nodes.
> Actually, ATM I am sorting out the IndexingConfigurationImpl
> possibilities planned for the Jackrabbit 1.4 release, which look very
> promising to me (though OTOH, people must know how to configure the
> indexing properly, and this might be a bit harsh in the beginning,
> because you really have to know the content modelling structure
> AFAICS).
>
> But as I misunderstood the requirements regarding nodes, and you know
> people who have run successful tests with 150 million nodes... well,
> then I will stick to my remark that you need to know how to tune some
> configuration parameters :-)
>
> Regards Ard
>
> >
> > -Bertrand
> >
>
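
To make Ard's warning concrete: the kind of query he describes, a
common-word full-text match plus a range constraint with an ordering
clause, looks roughly like the sketch below (JCR 1.0 XPath; the node
type and property names are made up, not our real model):

    import javax.jcr.Session;
    import javax.jcr.query.Query;
    import javax.jcr.query.QueryManager;
    import javax.jcr.query.QueryResult;

    public class RangeSortQuery {
        // Full-text match on a common word, plus a date range, ordered
        // by title -- the pattern that needs index tuning at scale.
        public static QueryResult run(Session session) throws Exception {
            QueryManager qm = session.getWorkspace().getQueryManager();
            String xpath = "//element(*, nt:unstructured)"
                    + "[jcr:contains(., 'report') and @created >"
                    + " xs:dateTime('2007-01-01T00:00:00.000Z')]"
                    + " order by @title";
            Query q = qm.createQuery(xpath, Query.XPATH);
            return q.execute();
        }
    }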

Again, mine might be a common case or a unique one, depending on how
you choose to look at it.

We have been using JR for a while, and we are wondering whether there
is a secret sauce somewhere; hence this email, trying to gauge the
community's experiences.

Thanks.

V.
