lucenenet-user mailing list archives

From "Gautam Lad" <>
Subject Website and keeping data fresh
Date Tue, 12 Feb 2008 02:40:05 GMT
Hey all,


I recently moved our company's external website to use dotLucene, and so far
it's been great and is working flawlessly.  


I use several indices to manage our website.  Since our company is in the
book industry, each index serves a different part of the site.


E.g. our main catalog is searchable, so we have a "Book" index that can be
searched by Title, Description, Author, etc.


We also have an Author table that can be searched by First name, Last name,
bio, etc.


Finally, we have a BookAuthor relationship index: when a Book is found,
BookAuthor is searched to find out whether that Book's authors have other
books.


The indices are as follows:


Book (primary key: ISBN) - 160,000+ documents

Author (primary key: AuthorID) - 60,000+ documents

BookAuthor (contains LinkID) - 100,000+ documents



So far things are working great.  The book index is about 500MB and is not a
big overhead on our system.


Now here's where the problem lies.  


To keep things fresh on the site, we have a nightly job that rebuilds the
entire index and then copies the data over to the production index folder (it
takes about an hour to rebuild everything and a minute or two to copy things
over).


However, there will be times when the information will need to be updated
almost live during the normal day-to-day hours.


Say, for example, a book's description has changed.  What I do is delete the
document and then re-add it.
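
In code, the update step looks roughly like this (a sketch from memory
against the dotLucene/Lucene.Net API we're on; method and field names are as
I recall them and may not compile as-is against your version -- `indexPath`
and `newDescription` are just placeholders):

```csharp
// Sketch of our delete-and-re-add step (names from memory; not guaranteed
// to match every Lucene.Net version exactly).
IndexReader reader = IndexReader.Open(indexPath);
reader.DeleteDocuments(new Term("isbn", "1554700310")); // remove the stale document
reader.Close();

IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
Document doc = new Document();
doc.Add(new Field("isbn", "1554700310", Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("description", newDescription, Field.Store.YES, Field.Index.TOKENIZED));
writer.AddDocument(doc);  // re-adding takes only seconds
writer.Optimize();        // this is the step that takes minutes
writer.Close();
```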


Unfortunately, deleting and re-adding a document takes a few minutes, and
during that window the information is not available when someone searches the
site.



Here's the log from our background service that rebuilds documents:


20080211 16:59:32 [Engine] [book] Deleting isbn(1554700310).  Status: 1

20080211 16:59:32 [Engine] [book] [00:00:00:000] Getting table count

20080211 16:59:34 [Engine] [book] [00:00:02:156] Rows loaded 1

20080211 16:59:34 [Engine] [book] [00:00:02:156] Getting table schema

20080211 16:59:34 [Engine] [book] [00:00:02:218] Getting data reader

20080211 16:59:36 [Engine] [book] [16:59:36:000] Index dump started

20080211 16:59:36 [Engine] [book] [00:00:00:078] Total indexed: 1

20080211 16:59:36 [Engine] [book] [00:00:00:078] Optimizing index

20080211 17:02:23 [Engine] [book] [00:02:46:917] Index finished


You can see that from the moment the ISBN was deleted from the "book" index
to when it was finally added back took only 4 seconds.  But the call to
Writer.Optimize() takes almost three minutes to optimize the index.


Is optimizing the index even necessary at this point?


Any help is greatly appreciated.



Gautam Lad

