lucene-java-user mailing list archives

From Justin Swanhart <greenl...@gmail.com>
Subject When do document ids change
Date Fri, 29 Oct 2004 18:48:01 GMT
Given an FSDirectory based index A.
Documents are added to A with an IndexWriter
  minMergeDocs = 20000
  mergeFactor = 3

Documents are never deleted.

Once the RAMDirectory merges documents to the index:

a) will the documentID values for index A ever change?
b) can a mapping be made between a term in the document and the newly
created documentID?

Why I am asking this question:
I have a database with about 10M rows in it.  My search engine needs
to be able to quickly get back all the rows from the database that
match a query.  All the rows need to be returned at once, because the
entire result set is sorted based on user input.

What I want to do:
When a documentID gets assigned to a document, I want to update the
database row that matches the document field "id" with the Lucene
documentID.  That way, I can use a HitCollector to gather just the
documentID values from the search, insert them into a temporary cache
table, and then grab the matching rows from the database.  This will
work assuming the documentID values for a given document never change.
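
To make the idea concrete, here is a minimal sketch of the mapping in
plain Java.  It is not Lucene code: DocIdCollector is a hypothetical
stand-in for a hit-collector callback, and DocIdToRowId and
RowIdCollector are names made up for illustration.  It assumes that
documents are never deleted and that documentIDs are handed out
sequentially from 0 in the order documents are added, so it only holds
up if the documentIDs really are stable.

```java
// Hypothetical stand-in for a hit-collector callback; NOT the Lucene API.
interface DocIdCollector {
    void collect(int docId, float score);
}

// Maps a documentID to a database row id using a growable int array.
// Assumption: documentIDs are assigned 0, 1, 2, ... in insertion order
// and never change afterwards.
final class DocIdToRowId {
    private int[] rowIds = new int[16];
    private int size = 0;

    // Records the row id for the next document and returns the
    // documentID that the document is assumed to receive.
    int append(int rowId) {
        if (size == rowIds.length) {
            int[] bigger = new int[rowIds.length * 2];
            System.arraycopy(rowIds, 0, bigger, 0, size);
            rowIds = bigger;
        }
        rowIds[size] = rowId;
        return size++;
    }

    // Looks up the database row id for a documentID without loading
    // the stored document.
    int rowIdFor(int docId) {
        if (docId < 0 || docId >= size) {
            throw new IllegalArgumentException("unknown documentID: " + docId);
        }
        return rowIds[docId];
    }
}

// Gathers database row ids for every hit passed to collect().
final class RowIdCollector implements DocIdCollector {
    private final DocIdToRowId mapping;
    private int[] collected = new int[16];
    private int count = 0;

    RowIdCollector(DocIdToRowId mapping) {
        this.mapping = mapping;
    }

    public void collect(int docId, float score) {
        if (count == collected.length) {
            int[] bigger = new int[collected.length * 2];
            System.arraycopy(collected, 0, bigger, 0, count);
            collected = bigger;
        }
        collected[count++] = mapping.rowIdFor(docId);
    }

    // Returns a copy of the row ids collected so far, in hit order.
    int[] rowIds() {
        int[] out = new int[count];
        System.arraycopy(collected, 0, out, 0, count);
        return out;
    }
}

final class DocIdMappingDemo {
    public static void main(String[] args) {
        DocIdToRowId mapping = new DocIdToRowId();
        mapping.append(1001); // assumed documentID 0
        mapping.append(1002); // assumed documentID 1
        mapping.append(1007); // assumed documentID 2

        RowIdCollector collector = new RowIdCollector(mapping);
        collector.collect(2, 0.9f);
        collector.collect(0, 0.4f);

        int[] rows = collector.rowIds();
        System.out.println(rows.length + " rows: " + rows[0] + ", " + rows[1]);
    }
}
```

The sketch keeps the mapping in memory on the search side rather than
in the database row; either direction depends on the same assumption
about stable documentIDs.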

Currently, running an IndexSearcher.search() and getting all the rows
back takes between 5 and 30 seconds for most queries, which is
certainly not fast enough.  The time it takes to collect the
documentIDs, however, is less than 1 second.  All the time is taken by
calling hits.doc() for each document to get the "id" field to insert
into the database.

So finally, will what I want to do work, and if so, how can I go
about updating the database when the documentID is created?


