lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: NO_NORMS and TOKENIZED?
Date Mon, 19 Feb 2007 22:29:20 GMT

On Feb 19, 2007, at 3:07 PM, Marvin Humphrey wrote:

>
> On Feb 19, 2007, at 11:32 AM, Grant Ingersoll wrote:
>
>> FWIW, in our in-house system we support, in addition to fixed
>> field semantics, completely dynamic field names for some
>> applications, but they have a fixed field type. So, the field
>> name can be anything, but the attributes of the field are fixed
>> (i.e. it will always be tokenized with norms). This is useful for
>> us, in some cases, when indexing XML files where the tag name
>> becomes the field name and the set of tag names is not known
>> ahead of time. I suppose there are ways around this (by
>> preprocessing all the files), but having the ability to add
>> arbitrary fields is a good thing for us and some of the
>> applications we do.
>
> The thing I don't like about this is that it prevents validation of
> field names, which is something I use a lot in KS (e.g. try to
> delete a term from a field that's not indexed, get an error, as the
> field name was probably misspelled). I can see the use; it just
> means sacrificing a lot of type safety for the more common cases.
> The user base at large has to suffer with more frequent,
> hard-to-detect bugs for a feature only needed by a few users.
>

Since all our dynamically named fields are of the same type, it isn't
an issue for us at the moment. Then again, we only have in-house
users and don't have the same issue that you have.
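
To make the trade-off concrete, here is a minimal sketch of the two policies discussed above: a strict schema that rejects undeclared field names, and a schema that falls back to one fixed field type for any unknown name. FieldType and FieldSchema are hypothetical names invented for this illustration; they are not Lucene or KinoSearch classes, and neither system is claimed to work this way internally.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical field attributes for illustration; not a Lucene class. */
final class FieldType {
    final boolean tokenized;
    final boolean norms;

    FieldType(boolean tokenized, boolean norms) {
        this.tokenized = tokenized;
        this.norms = norms;
    }
}

/** Hypothetical schema: declared fields plus an optional fixed type for unknown names. */
final class FieldSchema {
    private final Map<String, FieldType> declared = new HashMap<>();
    private final FieldType dynamicDefault; // null means "strict": unknown names are errors

    FieldSchema(FieldType dynamicDefault) {
        this.dynamicDefault = dynamicDefault;
    }

    /** Registers a field name with its attributes and returns this schema for chaining. */
    FieldSchema declare(String name, FieldType type) {
        declared.put(name, type);
        return this;
    }

    /** Returns the attributes for a field name, or fails if the name cannot be resolved. */
    FieldType resolve(String name) {
        FieldType type = declared.get(name);
        if (type != null) {
            return type;
        }
        if (dynamicDefault != null) {
            // Dynamic naming: any name is accepted, but its attributes are fixed.
            return dynamicDefault;
        }
        // Strict validation: a misspelled field name surfaces immediately.
        throw new IllegalArgumentException("Unknown field: " + name);
    }
}
```

With `new FieldSchema(null)` a misspelled name such as "titel" raises an error, while `new FieldSchema(new FieldType(true, true))` accepts it silently as a tokenized field with norms, which is the class of hard-to-detect bug raised in the quoted message.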


> About your app in particular -- how do you handle identical XML tag  
> names that mean totally different things when nested inside  
> different elements?


It doesn't happen. The tags are based on the output of some other
processes and are unique, and each tag/field name has semantics
attached to it that are meaningful to the application. I suppose,
technically, they are known ahead of time, but there are potentially
hundreds of them, so it doesn't make sense to populate them
into our Field schema ahead of time; maintenance would be a nightmare.


>    <company>
>      <name>Acme</name>
>    </company>
>    <product>
>      <name>Widget</name>
>    </product>
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
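
For readers whose tag names are not unique, one possible way to handle the <name> collision in the quoted example is to qualify each field name with its element path. The sketch below is an assumption made for illustration, not something either system in this thread is described as doing. XmlFieldNames is a hypothetical helper; it uses only the JDK's DOM parser, and it assumes the input is wrapped in a single root element, since the quoted snippet has two top-level elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: maps leaf XML elements to path-qualified field names. */
final class XmlFieldNames {

    /** Parses the XML and returns field name to text value, in document order. */
    static Map<String, String> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> fields = new LinkedHashMap<>();
            walk(doc.getDocumentElement(), "", fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void walk(Element element, String parentPath, Map<String, String> fields) {
        // Join tag names with '.' so nested tags get distinct field names.
        String path = parentPath.isEmpty()
                ? element.getTagName()
                : parentPath + "." + element.getTagName();
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                hasChildElement = true;
                walk((Element) child, path, fields);
            }
        }
        if (!hasChildElement) {
            // Leaf element: its text becomes the field value.
            // A repeated path overwrites the earlier value in this sketch.
            fields.put(path, element.getTextContent().trim());
        }
    }
}
```

Given the quoted snippet wrapped in a hypothetical <doc> root, this yields the separate field names doc.company.name and doc.product.name rather than a single ambiguous name field.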

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ




