lucenenet-user mailing list archives

From: Kurt Mackey <k...@mubble.net>
Subject: RE: Website and keeping data fresh
Date: Tue, 12 Feb 2008 03:10:13 GMT
Nope.  For that few writes, I can't see how you'd ever need to optimize during the day.  You
might run a few tests to find out how many writes cause search performance to degrade, but
I suspect it's a lot. :)

Optimizing is slow because it essentially writes all the index contents to a new index file.
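
Something along these lines should do it for a single-book update. This is just a rough sketch (assuming a 1.9/2.0-era Lucene.Net build and an untokenized "isbn" key field), so adjust the names to match your own code:

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public class BookIndexUpdater
{
    // Replace a single book in place, without calling Optimize().
    public static void UpdateBook(string indexPath, string isbn, Document freshDoc)
    {
        // Drop the stale copy by its key term.
        // (Older builds expose this as reader.Delete(term) instead.)
        IndexReader reader = IndexReader.Open(indexPath);
        reader.DeleteDocuments(new Term("isbn", isbn));
        reader.Close();

        // Append the fresh document; create=false keeps the existing index.
        // No Optimize() call; searchers pick up the change once they reopen.
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        writer.AddDocument(freshDoc);
        writer.Close();
    }
}

The important bits are that the "isbn" field is untokenized (so the Term delete hits exactly one document) and that you reopen your IndexSearcher afterwards so it sees the new segment.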

-Kurt

-----Original Message-----
From: Gautam Lad [mailto:gautam@rogers.com]
Sent: Monday, February 11, 2008 8:40 PM
To: lucene-net-user@incubator.apache.org
Subject: Website and keeping data fresh

Hey all,



I recently moved our company's external website to use dotLucene, and so far
it's been great and is working flawlessly.



I have several indices that I use to manage our website.  Since our company
is in the book industry, these indices serve various parts of the site.



E.g., our main catalog is searchable, so we have a "Book" index that can be
searched by Title, Description, Author, etc.



We also have an "Author" index that can be searched by first name, last name,
bio, etc.



Finally, we have a "BookAuthor" relationship index that is used when a Book is
searched: the BookAuthor index is queried to find out whether the Book's
authors have other books.



The indices are as follows (a quick sketch of how a Book document is built follows the list):



Book (primary key: ISBN) - 160,000+ documents

Author (primary key: AuthorID) - 60,000+ documents

BookAuthor (contains LinkID) - 100,000+ documents
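
For reference, here's a stripped-down sketch of how a Book document gets built, with the ISBN stored as an untokenized key field so a single book can later be deleted and replaced by a Term lookup (field names here are illustrative, not the full schema):

using Lucene.Net.Documents;

public class BookDocumentFactory
{
    // Build a Book document keyed on ISBN.  The "isbn" field is untokenized so
    // that a delete by new Term("isbn", ...) matches exactly one document.
    public static Document MakeBookDocument(string isbn, string title, string description)
    {
        Document doc = new Document();
        doc.Add(new Field("isbn", isbn, Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.Add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
        doc.Add(new Field("description", description, Field.Store.YES, Field.Index.TOKENIZED));
        return doc;
    }
}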





So far things are working great.  The book index is about 500MB and is not a
big overhead on our system.



Now here's where the problem lies.



To keep things fresh on the site, we have a nightly job that rebuilds the entire
index and then copies the data over to the production index folder (it takes
about an hour to rebuild the entire site and a minute or two to copy things over).
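
The nightly job itself boils down to something like this (heavily simplified; the per-table database loop and the real paths are left out):

using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;

public class NightlyRebuild
{
    // Rebuild the index in a staging folder, then copy the files to production.
    public static void Rebuild(string stagingPath, string productionPath)
    {
        // create=true wipes the staging folder and rebuilds the index from scratch.
        IndexWriter writer = new IndexWriter(stagingPath, new StandardAnalyzer(), true);
        // ... loop over the database rows and call writer.AddDocument(...) here ...
        writer.Optimize();   // fine overnight, since nothing is waiting on it
        writer.Close();

        // Copy the finished index files over the production index folder.
        foreach (string file in Directory.GetFiles(stagingPath))
        {
            File.Copy(file, Path.Combine(productionPath, Path.GetFileName(file)), true);
        }
    }
}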



However, there will be times when the information will need to be updated
almost live during the normal day-to-day hours.



Say, for example, a book's description has changed.  What I do is delete the
document and then re-add it.



Unfortunately, deleting and re-adding it to the index takes a few minutes, and
this is causing issues with information not being available when someone tries
to look for it on the site.





Here's the log from our background service that rebuilds documents:



20080211 16:59:32 [Engine] [book] Deleting isbn(1554700310).  Status: 1

20080211 16:59:32 [Engine] [book] [00:00:00:000] Getting table count

20080211 16:59:34 [Engine] [book] [00:00:02:156] Rows loaded 1

20080211 16:59:34 [Engine] [book] [00:00:02:156] Getting table schema

20080211 16:59:34 [Engine] [book] [00:00:02:218] Getting data reader

20080211 16:59:36 [Engine] [book] [16:59:36:000] Index dump started

20080211 16:59:36 [Engine] [book] [00:00:00:078] Total indexed: 1

20080211 16:59:36 [Engine] [book] [00:00:00:078] Optimizing index

20080211 17:02:23 [Engine] [book] [00:02:46:917] Index finished



You can see that from the moment it deleted the ISBN from the "book" index to
when it finally added it back, it took only 4 seconds.  But the call to
Writer.Optimize() takes almost three minutes to optimize the index.



Is optimizing the index even necessary at this point?



Any help is greatly appreciated.



--

Gautam Lad



