lucenenet-user mailing list archives

From "Gautam Lad" <gau...@rogers.com>
Subject Website and keeping data fresh
Date Tue, 12 Feb 2008 02:40:05 GMT
Hey all,

 

I recently moved our company's external website to use dotLucene, and so far
it's been great and is working flawlessly.  

 

I use several indices to manage our website.  Since our company is in the book
industry, each index is used for a different part of the site.

 

E.g. our main catalog is searchable, so we have a "Book" index that can be
searched by Title, Description, Author, etc.

 

We also have an Author table that can be searched by First name, Last name,
bio, etc.

 

Finally, we have a BookAuthor relationship table that is used when a Book is
searched: BookAuthor is searched to find out whether the Book's authors have
other books (there's a rough sketch of that lookup right after the index list
below).

 

The indices are as follows:

 

Book (primary key: ISBN) - 160,000+ documents

Author (primary key: AuthorID) - 60,000+ documents

BookAuthor (contains LinkID) - 100,000+ documents
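
For reference, the "other books by this author" lookup against BookAuthor is
roughly the following (a simplified sketch -- the field names authorid/isbn are
placeholders, not our exact schema):

    // Rough sketch of the BookAuthor lookup (Lucene.Net 2.x style API).
    // Field names "authorid" and "isbn" are placeholders for illustration.
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using System.Collections.Generic;

    public class AuthorBookLookup
    {
        public static List<string> OtherBooksByAuthor(string bookAuthorIndexPath,
                                                      string authorId)
        {
            IndexSearcher searcher = new IndexSearcher(bookAuthorIndexPath);
            try
            {
                // Each BookAuthor document links one AuthorID to one ISBN.
                Hits hits = searcher.Search(new TermQuery(new Term("authorid", authorId)));
                List<string> isbns = new List<string>();
                for (int i = 0; i < hits.Length(); i++)
                {
                    isbns.Add(hits.Doc(i).Get("isbn"));
                }
                return isbns;
            }
            finally
            {
                searcher.Close();
            }
        }
    }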

 

 

So far things are working great.  The book index is about 500MB and is not a
big overhead on our system.

 

Now here's where the problem lies.  

 

To keep things fresh on the site, we have a nightly job that rebuilds the entire
index and then copies the data over to the production index folder (it takes
about an hour to rebuild the entire site and a minute or two to copy things over).
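
In outline the nightly job is just this (a simplified sketch -- the folder paths
and the RebuildAllIndexes helper are placeholders, not the real code):

    // Simplified sketch of the nightly refresh; the paths and
    // RebuildAllIndexes are placeholders for illustration only.
    using System.IO;

    public class NightlyRefresh
    {
        public static void Run()
        {
            string stagingDir = @"D:\Indexes\Staging\book";
            string liveDir = @"D:\Indexes\Live\book";

            RebuildAllIndexes(stagingDir);   // full rebuild from the database, ~1 hour

            // Copy the freshly built index over the production folder (a minute or two).
            foreach (string file in Directory.GetFiles(stagingDir))
            {
                File.Copy(file, Path.Combine(liveDir, Path.GetFileName(file)), true);
            }
        }

        static void RebuildAllIndexes(string path) { /* full dump from the DB */ }
    }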

 

However, there are times when the information needs to be updated almost live
during normal day-to-day hours.

 

Say, for example, a book's description has changed.  What I do is delete the
document and then re-add it.
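
Roughly, that update path looks like this (a simplified sketch of what I just
described -- field names other than isbn are made up, and it assumes the
Lucene.Net 2.x API):

    // Simplified sketch of the delete + re-add path.
    // Field names other than "isbn" are placeholders.
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;

    public class BookIndexUpdater
    {
        public static void UpdateBook(string indexPath, string isbn,
                                      string title, string description)
        {
            // 1. Remove the stale document by its ISBN term.
            IndexReader reader = IndexReader.Open(indexPath);
            reader.DeleteDocuments(new Term("isbn", isbn));
            reader.Close();

            // 2. Re-add the refreshed document; create=false appends to the
            //    existing index rather than wiping it.
            IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
            Document doc = new Document();
            doc.Add(new Field("isbn", isbn, Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.Add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
            doc.Add(new Field("description", description, Field.Store.YES, Field.Index.TOKENIZED));
            writer.AddDocument(doc);
            writer.Optimize();   // this is the step that takes minutes (see the log below)
            writer.Close();
        }
    }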

 

Unfortunately, deleting and re-adding it to the index takes a few minutes, and
this causes the information to be unavailable when someone tries to look it up
on the site.

 

 

Here's the log from our background service that rebuilds documents:

 

20080211 16:59:32 [Engine] [book] Deleting isbn(1554700310).  Status: 1

20080211 16:59:32 [Engine] [book] [00:00:00:000] Getting table count

20080211 16:59:34 [Engine] [book] [00:00:02:156] Rows loaded 1

20080211 16:59:34 [Engine] [book] [00:00:02:156] Getting table schema

20080211 16:59:34 [Engine] [book] [00:00:02:218] Getting data reader

20080211 16:59:36 [Engine] [book] [16:59:36:000] Index dump started

20080211 16:59:36 [Engine] [book] [00:00:00:078] Total indexed: 1

20080211 16:59:36 [Engine] [book] [00:00:00:078] Optimizing index

20080211 17:02:23 [Engine] [book] [00:02:46:917] Index finished

 

You can see that from the moment it deleted the ISBN from the "book" index to
when it finally added it back, it took only 4 seconds.  But the call to
Writer.Optimize() takes almost three minutes to optimize the index.
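
I'm tempted to simply drop the Optimize() call from these one-off updates and
leave optimization to the nightly rebuild, i.e. the tail end of the update
sketch above would become:

    writer.AddDocument(doc);
    // writer.Optimize();   // skip here; let the nightly full rebuild optimize
    writer.Close();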

 

Is optimizing the index even necessary at this point?

 

Any help is greatly appreciated.

 

--

Gautam Lad

 

