lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marvin Humphrey <mar...@rectangular.com>
Subject Re: NO_NORMS and TOKENIZED?
Date Mon, 19 Feb 2007 07:14:12 GMT

On Feb 18, 2007, at 10:33 PM, Chris Hostetter wrote:

> I don't suppose you have a mailing pointer to my old comments do you
> Marvin?

http://tinyurl.com/394apl (mail-archives.apache.org)

You're in good company.  The other party with strong objections was  
Doug.

http://tinyurl.com/36ucj2

> en if i wanted to be able to use
> an option on field foo for some docs and not on others i'd have to  
> have
> foo_optOFF and foo_optON ... then anytime i wanted to search on  
> "foo" i'd
> have to use a booleanquery without a coord factor across both.

I'm trying to think of an example where that would actually come into  
play.  What are some of the options you'd turn on and off?  Norms?   
Tokenization?

IIUC, that's a third objection in addition to your two from the  
previous discussion.  The other two were "evolving" indexes, and what  
you described as "dynamically typed fields" but what I would call  
"multi-dimensional data".

KinoSearch's Schema scheme allows you to add new fields, but you  
can't take any away or change any existing defs  -- so you're able to  
evolve, but only within fairly tight constraints.  I can't think of  
an elegant way to improve that situation, so I've declared that  
aspect "good enough" and we're moving on.  I don't think it's really  
any worse than what we have now -- where field defs persist, stored  
in field infos files, etc, and the resolution of conflicts can bite  
you in the butt (as I recall hearing you discuss when no-norms was  
being hashed out wrt suddenly having lots and lots of memory-sucking  
norms).

The multi-dimensional data problem is the one I'm most interested in  
solving.  Lucene/KS indexes don't handle one-to-many relationships  
well.  In your example, you had an index where the products had  
"attributes", and the attributes might have taken many different  
names -- so it wasn't possible to know all the field names in  
advance.  It sounds like, effectively, you were faking a second  
table.  So long as the names of your attributes don't crash into the  
names of your primary fields, that'll work.

At present, KS doesn't let you do something like that -- you have to  
define all your fields up front.  What I'd like to do is come up with  
a FieldDef subclass that handles multi-dimensional data.  I seem to  
recall that Solr had something along those lines, using prefixed  
field names or something.  Do I recall correctly?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message