lucene-solr-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: SOLR-1131 - Multiple Fields per Field Type
Date Wed, 02 Dec 2009 19:30:16 GMT

On Dec 1, 2009, at 1:42 AM, Chris Hostetter wrote:

> 
> It feels like something we've overlooked in this discussion is whether we 
> need to worry about any FieldType API changes needed to make these new 
> "PolyField" classes aware of when they are multivalued.
> 
> The API suggestions Grant made give the FieldType the ability to return a 
> Field[] from a single field value input -- but they don't provide any 
> information about whether that field value is one of many values we're 
> indexing for this field name.
> 
> Imagine that i want to make an index of people i know.  Each person also 
> has multiple locations where they can frequently be found (home, work, 
> gym, girlfriend's house, favorite coffee shop, etc.).  My common case is 
> to search for people, not locations, so it doesn't make sense to flatten 
> out and have a doc for each person+location, i just want a single doc per 
> person, but that means i need a "locations" field that's multivalued.
> 
> If i'm using a simple "LatLonFieldType" that splits my comma separated 
> coordinate string into a "locations__LAT" and a "locations__LON" field 
> then i assume it needs to do something special in the multiValued case to 
> make sure later "near" searches don't get confused and think that the lat 
> from my "work" and the lon from my "home" are actually a third location.
> 
> how do we solve this?
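
To make the concern above concrete, here is a toy model of it. This is plain Python, not Lucene or Solr code: the field names come from the quoted example, while the coordinates and the `in_box_per_field` helper are made up for illustration. It assumes each sub-field is matched independently of the other, which is what produces the phantom "third location".

```python
# Toy model of a multivalued "locations" field split into two sub-fields.
# Not Lucene/Solr code; coordinates are invented for illustration.
person = {
    "locations__LAT": [40.0, 10.0],   # home, work
    "locations__LON": [-70.0, 20.0],  # home, work
}


def in_box_per_field(doc, lat_range, lon_range):
    """Match if ANY latitude is in range and ANY longitude is in range.

    Each sub-field is checked on its own, so the latitude and longitude
    that satisfy the query may belong to different original values.
    """
    lat_ok = any(lat_range[0] <= v <= lat_range[1] for v in doc["locations__LAT"])
    lon_ok = any(lon_range[0] <= v <= lon_range[1] for v in doc["locations__LON"])
    return lat_ok and lon_ok


# A box around the real "home" location matches, as expected.
home_hit = in_box_per_field(person, (39.0, 41.0), (-71.0, -69.0))

# A box around (work latitude, home longitude) also matches,
# even though the person has no location there.
phantom_hit = in_box_per_field(person, (9.0, 11.0), (-71.0, -69.0))
```

Both `home_hit` and `phantom_hit` come out true in this model, and the second one is the false match described in the quoted message.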

I'm not sure you need to worry about it.  I'd argue a multivalued location field isn't natural
anyway.  You would do the following instead, which is how any address book I've ever seen works:
<field name="home" type="LatLonFT"/>
<field name="work" type="LatLonFT"/>

So, maybe the FT can explicitly prohibit multivalued?  But I suppose you could do the position
thing, too.  This could be achieved pretty easily through a new SpanQuery:  a SpanPositionQuery
that takes in a term and a specific position.  Trivial to write, I think; I'm just not sure
it is generally useful.

That said, I've been noodling around with the notion of a "layered" field, where variants of a
primary token are stored at "sub positions" of the primary token (instead of in separate copy
fields), so that one could write a query that says, for instance, search all of the "secondary"
terms.  If you think of each position as containing a stack of terms, then you could say: use
the terms at position two in the stack.  I'm not quite sure what this means just yet, but my
thinking is that I could get a really compact index at the cost of a slightly more complex query.
It also means I could do some interesting things at query time that simply cannot be done across
fields at the moment, for instance, create a phrase-type query that uses different layers where
appropriate.
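
As a rough illustration of the position-based approach for the multivalued lat/lon case, here is a toy model in plain Python. It is not Lucene code and not an implementation of the proposed SpanPositionQuery. It assumes the sub-field entries produced from one original value share the same position (here, the list index), and the helper names and coordinates are hypothetical.

```python
# Toy model: require the lat and lon matches to occur at the SAME position.
# Assumption: sub-field values derived from one original value are stored
# at the same position (modeled here as the list index).
person = {
    "locations__LAT": [40.0, 10.0],   # home, work
    "locations__LON": [-70.0, 20.0],  # home, work
}


def matching_positions(values, low, high):
    """Return the set of positions whose value lies within [low, high]."""
    return {pos for pos, v in enumerate(values) if low <= v <= high}


def in_box_same_position(doc, lat_range, lon_range):
    """Match only if a single position satisfies both ranges."""
    lat_pos = matching_positions(doc["locations__LAT"], *lat_range)
    lon_pos = matching_positions(doc["locations__LON"], *lon_range)
    return bool(lat_pos & lon_pos)


# The real "home" location still matches.
home_hit = in_box_same_position(person, (39.0, 41.0), (-71.0, -69.0))

# Work latitude combined with home longitude no longer matches,
# because the two hits sit at different positions.
phantom_hit = in_box_same_position(person, (9.0, 11.0), (-71.0, -69.0))
```

In this model `home_hit` is true and `phantom_hit` is false. The cost is that the query has to carry position information, which is the part a position-aware span query would need to handle.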

-Grant