lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Spencer <dave-lucene-u...@tropo.com>
Subject Configurable indexing of an RDBMS, has it been done before?
Date Mon, 07 Feb 2005 08:07:44 GMT
Many times I've written ad-hoc code that pulls in data from an RDBMS and 
builds a Lucene index. The use case is a typical database-driven dynamic 
website which would be a hassle to spider (say, due to tricky 
authentication).

I had a feeling this had been done in a general manner but didn't see 
any code in the sandbox, nor did any searches turn it up.

I've spent a few mins thinking this thru - what I'd expect is to be able 
to configure is:

1. JDBC Driver + conn params
2. Query to do a 1 time full index
3. Query to show new records
4. Query to show changed records
5. Query to show deleted records
6. Query columns to Lucene Field name mapping
7. "Type" of each field name (e.g. the equivalent of the args to the 
Field ctr)

So a simple example, taking item 2 is

	query: "select url, name, body from foo"

(now the column to field mapping)
	col 1 => url
	col 2 => title
	col 3 => contents

(now the field types for each named field)

	url => Field( ...store=true, index=false)
       title => Field( ...store=true, index=true)
    contents => Field( ...store=false, index=true)



And voilla, nice, elegant, data driven indexing.
Does it exist?
Should it? :)

PS
  I know in the more general form, "query" needs to be replaced by 
"queries" above, and the updated query may need some time stamp variable 
expansion, and possibly the queries need "paging" to deal w/ lamo DBs 
like mysql that don't have cursors for large result sets...




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message