lucene-java-user mailing list archives

From Richard Jones
Subject Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)
Date Wed, 02 Nov 2005 16:36:16 GMT
> If you're willing to continue subsetting / summarizing the data out into
> Lucene, how about subsetting it out into a dedicated MySQL instance for
> this purpose?  100 artists * 1M profiles * 2 ints * 4 bytes/int =
> roughly 1 GB of data, which would easily fit into RAM.  Queries should
> be pretty fast off of that.  Good luck!

We used to do this, but there is a lot of overhead involved in 
updating/deleting/inserting all those rows and db indexes: more wasted cycles 
and disk activity than we see with Lucene. Even with MyISAM, which skips the 
fancy ACID stuff (no ref. integrity), it's still slower.

Furthermore, with Lucene I can query "artists:1" and it returns what Lucene 
deems to be the "best" matches for artist 1 (radiohead). This is far easier 
than with an SQL database, because the person whose listen counter for 
radiohead is highest isn't necessarily the "biggest fan"; it depends on the 
size of the profile. This gets even more complicated when trying to find the 
"best" fan of a combination of a few artists. Lucene is more useful for this 
than a database query.
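
To illustrate the point about profile size, here is a rough sketch in plain 
Python (not Lucene code). The scoring formula is an assumption, loosely 
modelled on the square-root term frequency and length normalization that 
Lucene's default scoring applies; the profile data and the names fan_score 
and rank_fans are made up for the example.

```python
import math

def fan_score(profile, artist_ids):
    """Length-normalized score of one profile for the given artist ids.

    profile maps artist id -> listen count. The formula is an illustrative
    assumption: sqrt-damped counts divided by sqrt of the profile size.
    """
    total_listens = sum(profile.values())
    if total_listens == 0:
        return 0.0
    # Dampen raw counts with sqrt, then penalize very large profiles.
    matched = sum(math.sqrt(profile.get(a, 0)) for a in artist_ids)
    return matched / math.sqrt(total_listens)

def rank_fans(profiles, artist_ids):
    """Return (user, score) pairs, best match first."""
    scored = [(user, fan_score(p, artist_ids)) for user, p in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical data: artist id 1 stands for radiohead, as in the query above.
profiles = {
    "heavy_listener": {1: 500, 2: 49500},  # many plays, tiny share of profile
    "devoted_fan": {1: 200, 2: 200},       # fewer plays, half of the profile
}

by_raw_count = max(profiles, key=lambda u: profiles[u].get(1, 0))
by_score = rank_fans(profiles, [1])[0][0]
print(by_raw_count, by_score)  # prints: heavy_listener devoted_fan
```

Passing several ids, e.g. rank_fans(profiles, [1, 2]), gives a combined 
ranking for a few artists at once, which is the case that gets awkward to 
express as a single SQL query.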


> --MDC