lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: Lucene in Action e-book now available!
Date Sat, 11 Dec 2004 00:32:50 GMT

On Dec 10, 2004, at 5:29 PM, Jonathan Hager wrote:
> Congratulations on the book.  I ordered my copy the other day via
> regular post and am eagerly awaiting it.  It looks like it will make
> lucene available to a much wider audience.

Thank you.  Let me know off-line when you receive your copy, so I can 
get an idea of when it actually ships.

> Based on the table of contents, I wanted to toss out a couple of ideas
> for your next book or articles.

Suggestions more than welcome!

> 1. I didn't see any examples of indexing a database table.  Although
> it was mentioned in Chapter 1.  At the company I am currently
> consulting at, we index the data from the database because its cleaner
> than indexing the web.

It is true that LIA does not cover any details of indexing a database.  
The idea is that Lucene indexes text, and how you get text into Lucene 
is not usually generalizable - or at least not worth the complexity of 
trying to do so.

The jGuru case study makes some strong points about indexing your 
content rather than your website.  I've found that words of wisdom 
often are more potent than specific technical details that do not 

>   This discussion should include why you would
> want to use lucene to index a database table, rather than just using
> the database indexes.  (The top reasons we choose to use Lucene
> instead of just database indexes are: It allows stem word recognition;
> It allows fuzzy searching; It ranks the results based on how good the
> match is; It contains a parser that will parse natural language
> queries; It has better Analyzers)

These are all great points.  I will add these to my notes for future 

> 2. This one is a cookbook idea, I think it would be possible to index
> the access log of web server.  Than when a user views product X the
> searcher could search for a other products that were viewed by people
> that also looked at product X.  In this way you can create basic
> "cross-selling" opportunities.  This feature is a big seller to
> managers for commercial search offerings.

While this is not exactly what you mention, I did write a section on 
the new term vector feature allowing it to be used for for finding 
similar documents.

> 3. A lot of search applications being built using lucene are web
> applications.  I didn't see any reference to the two different
> strategies for paging a hit list.  The two strategies are repeating
> the search and caching a search.  An example of this would be good.
> [I know that I have seen this online, its just nice to have a
> reference in book form]

The Searching chapter has a brief section discussing this topic, with 
the advice that re-searching is the best first approach.

> Please don't take this as criticism.

No worries.  Your points are all very valid.

To address the inevitable follow-up, I am setting up a blog site where 
we'll post items that need further clarification, correction, or 
completely new things that we did not cover.  This will take me a few 
weeks to get time to set up properly.  I will announce it here when it 
is available.

> I look forward to reading the book and appreciate your 14+ months of
> hard work to create a concise but valuable book for Lucene.

Thanks again for your feedback.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message