lucenenet-user mailing list archives

From "Gautam Lad" <gau...@rogers.com>
Subject RE: Website and keeping data fresh
Date Thu, 14 Feb 2008 04:26:12 GMT
Hey,
Well, I have about two dozen separate indices, with the Book table being the
largest.

The reason it takes over an hour is that the data is in SQL and is dumped out
via a SqlDataReader, and I can't explain why, but our MS-SQL server is just
horrible (due to our situation, we have only one primary SQL server that
does everything: reporting, in-house programs that use SQL, replication
from our host database, general queries, etc.)

However, once I actually start building the index, it takes maybe 30 minutes
total.
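
Roughly, the dump-and-index step looks like the sketch below. The table, column,
and path names are made up for illustration (our real schema differs), but the
Lucene.Net 2.0 calls are the standard ones:

    using System.Data.SqlClient;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;

    // Sketch only: "Books" table, columns, and index path are placeholders.
    static void RebuildBookIndex(string connectionString)
    {
        // create = true wipes and rebuilds the index from scratch
        IndexWriter writer = new IndexWriter(@"C:\Indexes\Books", new StandardAnalyzer(), true);

        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand("SELECT BookId, Title, Description FROM Books", conn);
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Document doc = new Document();
                    doc.Add(new Field("id", reader.GetInt32(0).ToString(),
                                      Field.Store.YES, Field.Index.UN_TOKENIZED));
                    doc.Add(new Field("title", reader.GetString(1),
                                      Field.Store.YES, Field.Index.TOKENIZED));
                    doc.Add(new Field("description", reader.GetString(2),
                                      Field.Store.NO, Field.Index.TOKENIZED));
                    writer.AddDocument(doc);
                }
            }
        }

        writer.Close();   // no Optimize() here; that can wait for the nightly run
    }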

I think I am going to leave the optimization out except on the nightly build,
leave the index un-optimized throughout the day, and see how it goes.
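
The nightly optimize can then be its own tiny job, something like this (the
index path is just a placeholder):

    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Index;

    // Nightly job sketch: open the existing index (create = false) and merge it down.
    static void NightlyOptimize(string indexPath)
    {
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        writer.Optimize();   // merges all segments into one; slow on a big index, so run off-hours
        writer.Close();
    }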


Thanks,
--

Gautam Lad

-----Original Message-----
From: Digy [mailto:digydigy@gmail.com] 
Sent: February 12, 2008 1:45 PM
To: lucene-net-user@incubator.apache.org
Subject: RE: Website and keeping data fresh

How about using online indexing and optimizing at nights?

I don't think the search performance will degrade noticeably even if you never
optimize the index.
But an -optional- nightly optimization would be good too.

Note: Indexing with multiple threads can be a problem in Lucene.Net 2.0. If
that affects you, you can use v2.1, which can be obtained from svn.

DIGY.


-----Original Message-----
From: Nic Wise [mailto:Nic.Wise@bbc.com] 
Sent: February 12, 2008 6:07 AM
To: lucene-net-user@incubator.apache.org
Subject: RE: Website and keeping data fresh

If it's any help, we (when I was at Quest) were removing items and
adding them on an ongoing basis. We had indexes with 25+ million items,
around 15 GB+ from memory. We'd add around 100K items a day, and some
items were added more than once (which means remove, then add). We had
very good performance once we changed MaxDocuments (to 100K; it was
2.4 billion / max int) and the MergeFactor - merging really large blocks
(I don't know the official term - segments?) made performance lousy, but
you don't NEED to keep them all in one file.

Does that make sense?

How much stuff are you putting into the book index? 500 MB sounds about
right, but an hour sounds a little high.
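
Concretely, the remove-then-add pattern plus those settings looked roughly like
the sketch below. The "id" field name and the numbers are just examples;
"MaxDocuments" is what the IndexWriter API exposes as SetMaxMergeDocs, if memory
serves, and in 2.0 deletes go through an IndexReader:

    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;

    // Replace an existing document: delete by a unique key field, then add the new copy.
    static void ReplaceDocument(string indexPath, string id, Document newDoc)
    {
        // In Lucene.Net 2.0, deletes are done via an IndexReader
        // (IndexWriter gained delete methods in later versions).
        IndexReader reader = IndexReader.Open(indexPath);
        reader.DeleteDocuments(new Term("id", id));
        reader.Close();

        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        writer.SetMergeFactor(10);        // how many segments accumulate before a merge
        writer.SetMaxMergeDocs(100000);   // cap segment size so huge merges don't stall indexing
        writer.AddDocument(newDoc);
        writer.Close();
    }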

