lucene-java-user mailing list archives

From Chris Lamprecht <clampre...@gmail.com>
Subject Splitting index into indexed fields and stored fields for performance
Date Thu, 28 Apr 2005 23:53:08 GMT
I've been thinking about splitting a (presumably large) Lucene index
up into two:  one index only contains "indexed" (searchable) fields,
and the second index contains only stored fields.  I'm interested in
whether this might improve response time or throughput for a
high-volume system.  Google appears to use something like this, where
they have index servers that do the search, and then separate document
servers which return the documents (and compute the KWIC snippets, I
believe).   The docids in the two indexes would have to be in sync so
you could look up docs in the stored index using the docid from the
search index.
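
To make the idea concrete, here is a minimal sketch of the split in
plain Java.  It does not use any Lucene API; the class SplitIndex and
its methods add, search and fetch are hypothetical names, and both
"indexes" are just in-memory collections.  The only point is that the
search side holds postings, the stored side holds document content,
and the same docid addresses both.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the two indexes; not Lucene API.
final class SplitIndex {
    // "Search index": term -> docids that contain it (no document content).
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // "Stored index": the position in this list is the docid.
    private final List<String> stored = new ArrayList<>();

    // Adds a document to both sides at once so the docids stay in sync.
    int add(String text) {
        int docId = stored.size();
        stored.add(text);
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Skip duplicates when a term occurs twice in the same document.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                ids.add(docId);
            }
        }
        return docId;
    }

    // Step 1: the search side returns docids only.
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptyList());
    }

    // Step 2: the stored side turns a docid into document content.
    String fetch(int docId) {
        return stored.get(docId);
    }
}
```

A query is then two steps: search("lucene") returns docids, and
fetch(docId) is called only for the hits that are actually displayed.
In a real deployment the two sides would be separate indexes (possibly
on separate machines), and keeping their docids aligned is the hard
part that this sketch sidesteps by adding to both at once.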

Since the "stored fields" index would basically just be a database,
perhaps this is better served using a traditional relational database
(or even the OS's file system).  However, a traditional database
doesn't know what the docids are (and docids change over time).  Maybe
one could store a single stored field in the "search index",
containing a permanent identifier for the DB lookup.
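
A rough sketch of that permanent-identifier variant, again in plain
Java with no Lucene calls.  KeyedSearchIndex, keyFor and the way
delete renumbers docids are all assumptions made for illustration; a
HashMap would stand in for the database.  The search index keeps one
stored value per document (the permanent key), and the lookup goes
docid -> key -> external store, so it keeps working when docids shift.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical search index that stores only a permanent key per document.
final class KeyedSearchIndex {
    // Position in both lists is the current (unstable) docid.
    private final List<String> keys = new ArrayList<>();   // the single stored field
    private final List<String> texts = new ArrayList<>();  // text used for matching only

    void add(String permanentKey, String text) {
        keys.add(permanentKey);
        texts.add(text.toLowerCase());
    }

    // Simulates a delete followed by a merge: later docids shift down by one.
    void delete(String permanentKey) {
        int docId = keys.indexOf(permanentKey);
        if (docId >= 0) {
            keys.remove(docId);
            texts.remove(docId);
        }
    }

    // Naive whole-term match; returns current docids.
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < texts.size(); docId++) {
            if (Arrays.asList(texts.get(docId).split("\\s+")).contains(term.toLowerCase())) {
                hits.add(docId);
            }
        }
        return hits;
    }

    // Reads the one stored field: the key to use for the DB lookup.
    String keyFor(int docId) {
        return keys.get(docId);
    }
}
```

The external store is then keyed by the permanent identifier, e.g. a
Map<String, String> or a table with a primary key, and is never told
about docids at all.  The cost is one stored-field read per hit in the
search index before the DB lookup can happen.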

Has anyone tried this, or have any thoughts on whether this would
increase performance?   I've looked at Nutch a bit; it can use a
distributed index, but it doesn't seem to split the index into stored
vs. indexed fields, so I figure there may be a good reason for this
(maybe just simplicity).

-Chris
