lucenenet-user mailing list archives

From "Gautam Lad" <>
Subject RE: Website and keeping data fresh
Date Tue, 12 Feb 2008 04:21:12 GMT
Very good to know.  

Since not many documents are updated during the course of the day, and since
we already rebuild the index at night, I doubt it would hurt performance, as
you say :)


Gautam Lad

-----Original Message-----
From: Kurt Mackey [] 
Sent: February 11, 2008 10:10 PM
Subject: RE: Website and keeping data fresh

Nope.  With so few writes, I can't see how you'd ever need to optimize during
the day.  You might run a few tests to find out how many writes it takes before
search performance degrades, but I suspect it's a lot. :)

Optimizing is slow because it essentially writes all the index contents to a
new index file.
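
To make that concrete, here is a minimal sketch of an incremental update that skips the merge. This is a hedged illustration, not code from this thread: it assumes a Lucene.Net 2.1-era API (where `IndexWriter.DeleteDocuments` exists), and the index path and field values are made-up placeholders.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Open the existing index in append mode (create = false).
// The path is a placeholder for the production index folder.
IndexWriter writer = new IndexWriter(@"C:\indexes\book",
                                     new StandardAnalyzer(),
                                     false);

// Delete the stale document and add the fresh copy. Both are
// cheap operations that only touch new, small segments.
writer.DeleteDocuments(new Term("isbn", "1554700310"));

Document doc = new Document();
doc.Add(new Field("isbn", "1554700310",
                  Field.Store.YES, Field.Index.UN_TOKENIZED));
// ... add the remaining book fields (title, description, etc.) ...
writer.AddDocument(doc);

// Deliberately no writer.Optimize() here: Optimize merges every
// segment into one, rewriting the entire index on disk. Leave
// that full merge to the nightly rebuild.
writer.Close();
```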


-----Original Message-----
From: Gautam Lad []
Sent: Monday, February 11, 2008 8:40 PM
Subject: Website and keeping data fresh

Hey all,

I recently moved our company's external website to use dotLucene, and so far
it's been great and is working flawlessly.

I have several indices that I use to manage our website.  Since our company
is in the book industry, these indices serve various parts of the site.

E.g., our main catalog is searchable, so we have a "Book" index that can be
searched by Title, Description, Author, etc.

We also have an Author table that can be searched by First name, Last name,
bio, etc.

Finally, we have a BookAuthor relationship index: when a book is found, the
BookAuthor index is searched to find out whether that book's authors have
other books.

The indices are as follows:

Book (primary key: ISBN) - 160,000+ documents

Author (primary key: AuthorID) - 60,000+ documents

BookAuthor (contains LinkID) - 100,000+ documents

So far things are working great.  The book index is about 500MB and is not a
big overhead on our system.

Now here's where the problem lies.

To keep things fresh on the site, we have a nightly job that rebuilds the
entire index and then copies the data over to the production index folder (it
takes about an hour to rebuild the entire index and a minute or two to copy
things over).

However, there will be times when the information will need to be updated
almost live during the normal day-to-day hours.

Say for example a book's description has changed.  What I do is I delete the
document and then re-add it.

Unfortunately, deleting and re-adding it to the index takes a few minutes,
and this causes the information to be unavailable when someone looks it up on
the site.
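
For what it's worth, the delete-then-re-add step can be expressed as a single call on later versions. A hedged sketch, assuming Lucene.Net 2.1+ (`IndexWriter.UpdateDocument`); the index path and field values are illustrative only:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

IndexWriter writer = new IndexWriter(@"C:\indexes\book",
                                     new StandardAnalyzer(),
                                     false /* open existing index */);

// Build the replacement document for the changed book.
Document doc = new Document();
doc.Add(new Field("isbn", "1554700310",
                  Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("description", "...updated description...",
                  Field.Store.YES, Field.Index.TOKENIZED));

// UpdateDocument atomically deletes every document matching the
// term and adds the new one -- no separate delete step needed.
writer.UpdateDocument(new Term("isbn", "1554700310"), doc);
writer.Close();
```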

Here's the log from our background service that rebuilds documents:

20080211 16:59:32 [Engine] [book] Deleting isbn(1554700310).  Status: 1

20080211 16:59:32 [Engine] [book] [00:00:00:000] Getting table count

20080211 16:59:34 [Engine] [book] [00:00:02:156] Rows loaded 1

20080211 16:59:34 [Engine] [book] [00:00:02:156] Getting table schema

20080211 16:59:34 [Engine] [book] [00:00:02:218] Getting data reader

20080211 16:59:36 [Engine] [book] [16:59:36:000] Index dump started

20080211 16:59:36 [Engine] [book] [00:00:00:078] Total indexed: 1

20080211 16:59:36 [Engine] [book] [00:00:00:078] Optimizing index

20080211 17:02:23 [Engine] [book] [00:02:46:917] Index finished

You can see that from the moment it deleted the ISBN from the "book" index to
when it finally added it back, it took only 4 seconds.  But when
Writer.Optimize() is called, it takes almost 2-1/2 minutes to optimize the
index.

Is optimizing the index even necessary at this point?

Any help is greatly appreciated.


Gautam Lad
