jackrabbit-users mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: How are developers using jackrabbit
Date Tue, 31 Jul 2007 21:04:49 GMT
Hello Bertrand,
> On 7/31/07, Ard Schrijvers <a.schrijvers@hippo.nl> wrote:
> > Regarding your usecase, having around 36.000.000 documents after one year in one
> > single ws with terabytes of data...so 100.000.000 docs within three years...Well, I
> > think you at least have to tune some settings :-)...
> Just to make sure there's no misunderstanding, the original post says
> "nodes", not "documents".

Yes, you are right! I misread it: since he talks about "pushing 300-500 nodes a minute",
I assumed he meant pushing docs into JR :-)

> So that's 36 million nodes a year, or 100 million after three years.
> If it was documents, it might be many more nodes than that.
> Although I haven't run those tests myself, I've talked with people
> doing tests with, IIRC, 150 million nodes, and such quantities are
> also regularly mentioned in Lucene tests

Yes, I agree, but in these cases you really have to understand how to tune and configure each
separate component. For example, if an indexReader has just been invalidated and you then
search on a common word with a sort on title, or run some range query, you might run
into problems with 150 million nodes.

>, so I don't think this is
> necessarily a problem. But of course, it depends on how nodes are
> structured and on what's indexed.

Indexing seems pretty important to me when you have 150 million nodes. Actually, ATM I am
looking into the possibilities of the IndexingConfigurationImpl planned for the Jackrabbit 1.4
release, which look very promising to me (though OTOH, people must know how to configure the
indexing properly, and this might be a bit hard in the beginning, because AFAICS you really
have to know the content model).
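
For illustration only, an indexing configuration of this kind might look roughly like the
sketch below. It assumes the index-rule format of the 1.4 IndexingConfigurationImpl; the node
type and the property names "title" and "body" are made-up examples, not taken from any real
content model.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: only the listed properties of matching nodes are indexed. -->
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <!-- "title" is an assumed property name; boost raises its weight in scoring -->
    <property boost="2.0">title</property>
    <!-- "body" is an assumed property name; excluded from the node-scope fulltext index -->
    <property nodeScopeIndex="false">body</property>
  </index-rule>
</configuration>
```

Such a file would be referenced from the SearchIndex element of the workspace configuration
through its indexingConfiguration parameter. Which properties to list depends entirely on the
content model, which is why you have to know it well.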

But as I misunderstood the requirements regarding nodes, and you know people who have run
successful tests with 150 million nodes... well, then I will stick to my remark that you need
to know how to tune some configuration parameters :-)

Regards Ard

> -Bertrand