lucene-dev mailing list archives

From: Chuck Williams
Subject: Global field semantics
Date: Sat, 08 Jul 2006 16:46:03 GMT

Many things would be cleaner in Lucene if fields had global semantics,
i.e., if properties like text vs. binary, Index, Store, TermVector, the
appropriate Analyzer, the assignment of Directory in ParallelReader (or
ParallelWriter), etc. were a function of just the field name and the
index.  This approach would naturally admit a class, say IndexFieldSet,
that would hold global field semantics for an index.
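
To make the idea concrete, here is a minimal sketch of what such a class might look like. Everything in it is hypothetical: Lucene has no IndexFieldSet or FieldSemantics, the property list is only an illustrative subset, and the analyzer is represented by a plain name rather than a real Analyzer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: neither IndexFieldSet nor FieldSemantics exists in Lucene.
final class IndexFieldSet {

    /** Properties fixed per field name for the whole index (illustrative subset). */
    static final class FieldSemantics {
        final boolean binary;      // text vs. binary
        final boolean indexed;
        final boolean tokenized;
        final boolean stored;
        final boolean termVector;
        final String analyzerName; // stand-in for an Analyzer instance

        FieldSemantics(boolean binary, boolean indexed, boolean tokenized,
                       boolean stored, boolean termVector, String analyzerName) {
            this.binary = binary;
            this.indexed = indexed;
            this.tokenized = tokenized;
            this.stored = stored;
            this.termVector = termVector;
            this.analyzerName = analyzerName;
        }
    }

    // Field name -> semantics; insertion order is kept for predictable iteration.
    private final Map<String, FieldSemantics> fields = new LinkedHashMap<>();

    /** Registers a field name once; a second definition of the same name is rejected. */
    IndexFieldSet define(String name, FieldSemantics semantics) {
        if (fields.containsKey(name)) {
            throw new IllegalArgumentException("Field already defined: " + name);
        }
        fields.put(name, semantics);
        return this;
    }

    /** Returns the semantics for a field name, or null if the name is unknown. */
    FieldSemantics semanticsOf(String name) {
        return fields.get(name);
    }
}
```

With something like this, indexing and query code could both look up a field's properties by name instead of carrying them on each Field instance.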

Lucene today allows many field properties to vary at the Field level. 
E.g., the same field name might be tokenized in one Field on a Document
while it is untokenized in another Field on the same or different
Document.  Does anybody know how often this flexibility is used?  Are
there interesting use cases for which it is important?  It seems to me
this functionality is already problematic and not fully supported; e.g.,
indexing can manage tokenization-variant fields, but query parsing
cannot.  Various extensions to Lucene exacerbate this kind of problem.
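
The mismatch can be illustrated with a toy model. The sketch below is not Lucene code: the class, the whitespace "analyzer", and the matching rule are all simplifying assumptions, meant only to show why a query side that picks one analysis per field name cannot serve both variants of the same field.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model, not Lucene: all names here are hypothetical.
final class TokenizationVariance {

    /** Toy analyzer: lowercase the value and split it on whitespace. */
    static Set<String> tokenized(String value) {
        return new HashSet<>(Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+")));
    }

    /** Untokenized: the whole value is indexed as a single term. */
    static Set<String> untokenized(String value) {
        return Collections.singleton(value);
    }

    /** A document matches if it contains every query term. */
    static boolean matches(Set<String> indexedTerms, Set<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }
}
```

In this model, if "New York" is indexed tokenized in one document and untokenized in another under the same field name, a query analyzed with tokenized() matches only the first document, and a query analyzed with untokenized() matches only the second.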

Perhaps more controversially, the notion of global field semantics would
be even stronger if the set of fields were closed.  This would allow, for
example, QueryParser to validate field names.  That has a number of
benefits, including avoiding false-negative "no results" responses caused
by a misspelled field name.
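
As a rough sketch of the kind of check a closed field set would enable, the helper below scans a query string for "field:" prefixes and reports any name not in the known set. It is hypothetical and deliberately simplistic: FieldNameValidator is not part of Lucene, and the regular expression ignores quoting, escaping, and the rest of the real query syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene's QueryParser.
final class FieldNameValidator {

    // Captures the name in "name:term" clauses; ignores quoting and escaping.
    private static final Pattern FIELD_PREFIX = Pattern.compile("(\\w+):");

    /** Returns the field names used in the query that are not in knownFields. */
    static List<String> unknownFields(String query, Set<String> knownFields) {
        List<String> unknown = new ArrayList<>();
        Matcher m = FIELD_PREFIX.matcher(query);
        while (m.find()) {
            String field = m.group(1);
            if (!knownFields.contains(field) && !unknown.contains(field)) {
                unknown.add(field);
            }
        }
        return unknown;
    }
}
```

Given known fields "title" and "body", the query "titel:lucene AND body:index" would be flagged for "titel" rather than silently returning no results.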

Has this been considered before?  Are there good reasons this path has
not been followed?

Thanks for any info,

