lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Lucene Seaches VS. Relational database Queries
Date Tue, 11 Apr 2006 22:41:37 GMT

1) An inverted full text index is not a replacement for a relational
database.

2) Many people think they need a relational database when all they really
need is a well-designed full text index.

To get to some of your specific questions...

: them in one field).  One of the problems I see would be with values that
: overlap (Example, name where one name is Jason Bateman, and one is Jason
: Bateman Black, and it would be hard to replicate the Discrete Search for

The way field values are "analyzed" is extremely configurable, down to
the individual field level. This means that while you can have an actor
field where you can do loose text searching for "bateman" and get back
movies starring "Jason Bateman" and "Jason Bateman Black" (and even "Guido
Batemans" if you use stemming), you can also have another field using a
KeywordAnalyzer such that a record with the values "Jason Bateman" and
"Jack Black" will only be matched if the user searches for "Jason Bateman"
or "Jack Black"; searching for "Bateman Jack" or "Black Jason" will not
work.
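
A toy model may make the per-field difference concrete. The Java below is not Lucene's API: the class FieldAnalysisDemo and its methods are hypothetical, a "field" is just a map from term to movie ids, and the two "analyzers" are assumed to be a lowercase whitespace split (loose text field) and a whole-value single term (standing in for KeywordAnalyzer). It only sketches why the same actor values can match loosely in one field and exactly in another.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, NOT Lucene's API: one inverted index per field, where each
// field decides how its values are turned into terms.
final class FieldAnalysisDemo {

    // Like a tokenizing analyzer: lowercase the value and split on whitespace.
    static List<String> tokenize(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Like KeywordAnalyzer: the entire value becomes a single term.
    static List<String> keyword(String value) {
        return List.of(value);
    }

    // Builds term -> sorted set of movie ids for one field.
    static Map<String, Set<String>> buildField(Map<String, List<String>> movies, boolean tokenized) {
        Map<String, Set<String>> postings = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> movie : movies.entrySet()) {
            for (String actor : movie.getValue()) {
                List<String> terms = tokenized ? tokenize(actor) : keyword(actor);
                for (String term : terms) {
                    postings.computeIfAbsent(term, k -> new TreeSet<>()).add(movie.getKey());
                }
            }
        }
        return postings;
    }

    // Queries are analyzed the same way as the field they target.
    static Set<String> searchText(Map<String, Set<String>> field, String word) {
        return field.getOrDefault(word.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    static Set<String> searchExact(Map<String, Set<String>> field, String value) {
        return field.getOrDefault(value, new TreeSet<>());
    }

    // Hypothetical sample records: movie id -> actor values.
    static Map<String, List<String>> sampleMovies() {
        Map<String, List<String>> movies = new LinkedHashMap<>();
        movies.put("m1", List.of("Jason Bateman"));
        movies.put("m2", List.of("Jason Bateman Black"));
        movies.put("m3", List.of("Jason Bateman", "Jack Black"));
        return movies;
    }

    public static void main(String[] args) {
        Map<String, List<String>> movies = sampleMovies();
        Map<String, Set<String>> actorText = buildField(movies, true);
        Map<String, Set<String>> actorExact = buildField(movies, false);

        System.out.println(searchText(actorText, "Bateman"));          // [m1, m2, m3]
        System.out.println(searchExact(actorExact, "Jason Bateman"));  // [m1, m3]
        System.out.println(searchExact(actorExact, "Bateman Jack"));   // []
    }
}
```

Note that the exact field also keeps a search for "Jason Bateman" from matching the "Jason Bateman Black" record, which is the overlap problem raised in the question.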

Furthermore, as you learn more about Lucene you'll find things like
positionIncrementGap, PhraseQueries, and "slop", which show how you
can support queries like "Jason Black" matching movies starring "Jason
Bateman Black" or "Jason Black" but not movies starring both "Jason
Bateman" and "Jack Black" (unless you want them to).
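
To see how a position gap and slop interact, here is a simplified sketch, again a toy and not Lucene's implementation. PhraseSlopDemo and its methods are hypothetical; positions come from a counter that jumps forward by `gap` between the values of a multi-valued field (mimicking positionIncrementGap), and slop is checked only for in-order matches, whereas Lucene's sloppy phrase matching also tolerates reordered terms at a higher slop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model, NOT Lucene's implementation: records the position of every
// token in a multi-valued field and checks a two-word phrase with slop.
final class PhraseSlopDemo {

    // Maps each term to the positions where it occurs. Between values, the
    // position counter jumps forward by `gap`, mimicking positionIncrementGap.
    static Map<String, List<Integer>> positions(List<String> values, int gap) {
        Map<String, List<Integer>> result = new HashMap<>();
        int next = 0;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                next += gap;
            }
            for (String t : values.get(v).toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!t.isEmpty()) {
                    result.computeIfAbsent(t, k -> new ArrayList<>()).add(next++);
                }
            }
        }
        return result;
    }

    // Simplified slop check for the in-order phrase "first second": matches if
    // some occurrence of `second` follows `first` with at most `slop` other
    // positions between them.
    static boolean phraseMatches(Map<String, List<Integer>> pos, String first, String second, int slop) {
        for (int a : pos.getOrDefault(first, List.of())) {
            for (int b : pos.getOrDefault(second, List.of())) {
                int between = b - a - 1;
                if (between >= 0 && between <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> oneActor = List.of("Jason Bateman Black");
        List<String> twoActors = List.of("Jason Bateman", "Jack Black");

        System.out.println(phraseMatches(positions(oneActor, 100), "jason", "black", 1));  // true
        System.out.println(phraseMatches(positions(twoActors, 100), "jason", "black", 1)); // false
        // "Unless you want them to": no gap plus a looser slop lets values run together.
        System.out.println(phraseMatches(positions(twoActors, 0), "jason", "black", 2));   // true
    }
}
```

With a gap of 100 the two-actor record puts "jack" at position 102 and "black" at 103, far outside a slop of 1 from "jason" at 0, so values stay separate; shrinking the gap and loosening the slop is the "unless you want them to" case.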

: Jason Bateman. Also I think there would be some issues with ensuring that
: all updates, adds and deletes were properly synced with the index, as well
: as the possibility of duplicate rows in the index. Can people out there help
: with any other pros and cons to this approach?

Concern about replication is a perfectly valid one, but it's an issue
that also plagues RDBMSs. I've yet to see a database that could guarantee
100% that every slave would surface newly replicated data from the master
at the *exact* same time.

I would suggest you take a look at the Solr project...

	http://incubator.apache.org/solr/

...it's very easy to get the demo up and running and to play with things
like positionIncrementGap and various analyzers (all without ever writing
any lines of code), and Solr has scripts to make master=>slave replication
easy on machines that support rsync and hard links (i.e., Unix/Linux).
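
Part of why hard links help: a hard link gives an existing file a second name without copying its bytes, so linking every file in an index directory into a new directory is a nearly free point-in-time snapshot, provided the files are not later rewritten in place (an assumption of this sketch). The Java below shows only that general technique using java.nio.file.Files.createLink; HardLinkSnapshot, the directory names, and the fake segment files are hypothetical, it is not Solr's actual scripts, and it does not perform the rsync step.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a hard-link snapshot: every regular file in indexDir gets a hard
// link in a new snapshot directory. No file data is copied; both names point
// at the same bytes on disk. Requires a filesystem that supports hard links.
final class HardLinkSnapshot {

    static Path snapshot(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // createLink(link, existing): the new name comes first.
                    Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                }
            }
        }
        return snapshotDir;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: a temporary "index" holding two fake files.
        Path root = Files.createTempDirectory("index-demo");
        Path index = Files.createDirectories(root.resolve("index"));
        Files.writeString(index.resolve("_0.cfs"), "segment zero");
        Files.writeString(index.resolve("segments"), "commit point");

        Path snap = snapshot(index, root.resolve("snapshot.1"));
        // Both paths name the same underlying file, so this prints true.
        System.out.println(Files.isSameFile(index.resolve("_0.cfs"), snap.resolve("_0.cfs")));
    }
}
```

A slave could then copy a finished snapshot directory with rsync while the master keeps indexing into the live directory.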


-Hoss



