From "John Fawcett" <fawc...@gmail.com>
Subject Fwd: Client-Server Lucene - DocumentWriter
Date Wed, 25 Oct 2006 12:47:24 GMT
---------- Forwarded message ----------
From: John Fawcett <John.Fawcett@tamaleresearch.com>
Date: Wed, 25 Oct 2006 08:39:23 -0400
Subject: Client-Server Lucene - DocumentWriter
To: fawcett@gmail.com

Hi,

I have a design challenge in my own application's use of Lucene, which
triggered an idea for distributed Lucene indexing. Below, I've
summarized the design challenge, and then the indexing idea.

My team is working on a client/server application. The server is a
Java application, and the client is written in C#/.NET.

Right now we are adding support for offline operation of the
client. Search is part of this work, so we have been working with
Lucene.Net to port some of our online search capabilities to offline.

The client only holds a subset of the data held on the server, so we'd
like to move a subset of the search index to the client. There are two
types of transfers - bulk and incremental.

Our goal in both is to offload as much work as possible from the
client to the server.

Bulk transfers happen when a client is initializing for offline use,
or resyncing after coming back online. In these scenarios we plan to
create a new index on the server and simply send the index files to
the client. The client will then have to perform an index merge.

Incremental adds happen while the client application is online. New
documents are transferred to the client asynchronously. Currently, we
transfer only a document's extracted text, so the client still has to
perform analysis, inversion, and addition to the index.

Looking through the code for the IndexWriter, I found the
DocumentWriter class. DocumentWriter does the inversion and stores it
in a set of integer arrays and an array of "Posting" objects. Looking
through the class, it seems like the inversion info could be
serialized from server to client pretty easily. The serialized data
from DocumentWriter would be a portable "index record" for a single
document.
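
To make the "index record" idea concrete, here is a minimal sketch in
plain Java. It does not use Lucene's DocumentWriter or its Posting
class; the IndexRecord class, the naive tokenizer, and the byte layout
are all assumptions made up for illustration. The server inverts one
document into term -> positions and writes it with an explicit binary
format rather than Java serialization, since the client is not Java.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-document "index record". Not a Lucene class. */
final class IndexRecord {
    final String docId;
    /** Sorted term -> token positions within this one document. */
    final TreeMap<String, int[]> postings;

    private IndexRecord(String docId, TreeMap<String, int[]> postings) {
        this.docId = docId;
        this.postings = postings;
    }

    /** Server side: analyze and invert one document's text. */
    static IndexRecord invert(String docId, String text) {
        TreeMap<String, List<Integer>> collected = new TreeMap<>();
        int position = 0;
        // Stand-in analyzer (assumption): lowercase, split on non-alphanumerics.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token on a leading delimiter
            }
            collected.computeIfAbsent(token, k -> new ArrayList<>()).add(position++);
        }
        TreeMap<String, int[]> postings = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : collected.entrySet()) {
            postings.put(e.getKey(),
                    e.getValue().stream().mapToInt(Integer::intValue).toArray());
        }
        return new IndexRecord(docId, postings);
    }

    /** Made-up wire layout: docId, term count, then term, freq, positions. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(docId);
            out.writeInt(postings.size());
            for (Map.Entry<String, int[]> e : postings.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().length);
                for (int p : e.getValue()) {
                    out.writeInt(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Receiving side: rebuild the record without re-analyzing any text. */
    static IndexRecord fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String docId = in.readUTF();
            int termCount = in.readInt();
            TreeMap<String, int[]> postings = new TreeMap<>();
            for (int i = 0; i < termCount; i++) {
                String term = in.readUTF();
                int[] positions = new int[in.readInt()];
                for (int j = 0; j < positions.length; j++) {
                    positions[j] = in.readInt();
                }
                postings.put(term, positions);
            }
            return new IndexRecord(docId, postings);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the server would call IndexRecord.invert(id, text)
followed by toBytes(), and the receiver would call fromBytes(). A
non-Java client would need its own reader for the same layout, and
writeUTF uses Java's modified UTF-8, so a real format would probably
pick a plainer string encoding.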

Our hope is that we can send this index record from the server to the
client. The idea is to reduce the work on the client to be only the
insertion of the inverted document to the local index.
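
As a rough picture of what "only the insertion" might look like on the
receiving side, here is a toy in-memory index in plain Java. It is not
how Lucene or Lucene.Net stores postings; LocalIndexSketch and its
methods are hypothetical names. In this toy model the only thing
decided at insert time is the local document number, so the incoming
term -> positions map does not depend on what the index already holds.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical client-side index that accepts already-inverted documents. */
final class LocalIndexSketch {
    /** term -> (local document number -> positions in that document). */
    private final Map<String, SortedMap<Integer, int[]>> postings = new TreeMap<>();
    private int nextDoc = 0;

    /**
     * Adds one pre-inverted document. No tokenizing or analysis happens
     * here; the method only assigns a local document number and files
     * each term's positions under it.
     */
    int addInverted(Map<String, int[]> record) {
        int doc = nextDoc++;
        for (Map.Entry<String, int[]> e : record.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                    .put(doc, e.getValue().clone());
        }
        return doc;
    }

    /** Number of local documents that contain the term. */
    int docFreq(String term) {
        SortedMap<Integer, int[]> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    /** Positions of the term in one document, or an empty array if absent. */
    int[] positions(String term, int doc) {
        SortedMap<Integer, int[]> docs = postings.get(term);
        int[] found = docs == null ? null : docs.get(doc);
        return found == null ? new int[0] : found.clone();
    }
}
```

The per-document cost here is one map update per distinct term. How
much that would save in practice depends on how expensive analysis is
compared with writing and merging segments on the client.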

Having a portable "index record" for an individual document seems very
useful, especially for distributed indexing. I can imagine running a
farm of indexers that only invert documents and send the results to a
set of search machines that maintain the indexes and answer search
queries.

Is this something that could be added to the Lucene framework? Is the
"index record" data calculated in DocumentWriter in any way dependent
on the contents of the index? Will this actually save us many client
cycles?

Thanks,
fawce

