lucene-java-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: NO_NORMS and TOKENIZED?
Date Mon, 19 Feb 2007 20:07:04 GMT

On Feb 19, 2007, at 11:32 AM, Grant Ingersoll wrote:

> FWIW, we support, in our in-house system and in addition to fixed  
> field semantics,  completely dynamic field names for some  
> applications, but they have a fixed field type.  So, the field name  
> can be anything, but the attributes of the field are fixed (i.e. it  
> will always be tokenized with norms). This is useful for us, in  
> some cases, when indexing XML files where the tag name becomes the  
> field name and the set of tag names are not known ahead of time.  I  
> suppose there are ways around this (by preprocessing all the  
> files), but having the ability to add arbitrary fields is a good  
> thing for us and some of the applications we do.

The thing I don't like about this is that it prevents validation of
field names, which is something I rely on a lot in KS (e.g. try to
delete a term from a field that's not indexed and you get an error,
since the field name was probably misspelled).  I can see the use; it
just means sacrificing a lot of type safety in the more common cases.
The user base at large has to suffer more frequent, hard-to-detect
bugs for a feature only a few users need.
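
To make the validation point concrete, here is a minimal sketch in Java of the kind of check a fixed set of declared fields allows. It is an illustration only: FixedSchema, declareField, checkIndexed and deleteTerm are hypothetical names, not the KS or Lucene API, and the actual deletion is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a schema with a fixed, declared set of fields.
// None of these names come from KinoSearch or Lucene.
public class FixedSchema {
    // Maps each declared field name to whether that field is indexed.
    private final Map<String, Boolean> indexedByField = new HashMap<>();

    // Declare a field up front, along with its (fixed) indexed attribute.
    public void declareField(String name, boolean indexed) {
        indexedByField.put(name, indexed);
    }

    // Reject field names that were never declared or are not indexed.
    public void checkIndexed(String field) {
        Boolean indexed = indexedByField.get(field);
        if (indexed == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        if (!indexed) {
            throw new IllegalArgumentException("Field is not indexed: " + field);
        }
    }

    // Validate the field name before doing any work; a misspelled name
    // fails loudly here instead of silently deleting nothing.
    public void deleteTerm(String field, String term) {
        checkIndexed(field);
        // The actual term deletion is omitted from this sketch.
    }
}
```

With "title" declared as indexed, deleteTerm("title", "lucene") passes the check, while deleteTerm("titel", "lucene") throws.  If any field name were acceptable, the second call could not be told apart from a legitimate request.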

About your app in particular -- how do you handle identical XML tag  
names that mean totally different things when nested inside different  
elements?

    <company>
      <name>Acme</name>
    </company>
    <product>
      <name>Widget</name>
    </product>

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


