lucene-solr-user mailing list archives

From Steven A Rowe <sar...@syr.edu>
Subject RE: One item, multiple fields, and range queries
Date Mon, 29 Mar 2010 20:27:54 GMT
Hi David,

On 03/29/2010 at 3:36 PM, David Smiley (@MITRE.org) wrote:
> I'm not sure what to make of "or index using a heterogeneous field
> schema, grouping the different doc type instances with a unique key
> (the one) to form a composite doc"

Lucene is schema-free: you can mix and match different document types in a single index.
You could emulate this in Solr by merging the two document types into one schema and leaving
blank the fields that are inapplicable to a given instance.  E.g.:

Address-doc-type: 
	Field: Unique-key
	Field: Street
	Field: City
	...
	
Everything-else-doc-type:
	Field: Unique-key
	Field: Blob-o'-text
	...

Doc1: Unique-key: 1; Blob-o'-text: blobbedy-blah-blob; ...
Doc2: Unique-key: 1; Street: 12 Main St; City: Somewheresville; ...
Doc3: Unique-key: 1; Street: 243 13th St; City: Bogdownton; ...
...
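Lucene itself does no joining here: the grouping by unique key happens in application code after the matching docs come back. A minimal Python sketch of that step, using the field names from the Doc1..Doc3 example above; the helpers `group_composite` and `has_address` are hypothetical, and treating "has a Street field" as the address-type marker is an assumption:

```python
from collections import defaultdict

# Heterogeneous docs mirroring Doc1..Doc3 above; each dict holds only the
# fields that apply to its document type.
docs = [
    {"Unique-key": "1", "Blob-o'-text": "blobbedy-blah-blob"},
    {"Unique-key": "1", "Street": "12 Main St", "City": "Somewheresville"},
    {"Unique-key": "1", "Street": "243 13th St", "City": "Bogdownton"},
]

def group_composite(docs, key="Unique-key"):
    """Merge docs sharing a unique key into one composite per key (hypothetical helper).

    Address-type docs (identified here by having a "Street" field) are kept
    as separate sub-records so each address's fields stay paired; fields of
    the other doc type are copied onto the composite directly.
    """
    composites = defaultdict(lambda: {"addresses": []})
    for doc in docs:
        comp = composites[doc[key]]
        fields = {k: v for k, v in doc.items() if k != key}
        if "Street" in doc:
            comp["addresses"].append(fields)
        else:
            comp.update(fields)
    return dict(composites)

def has_address(comp, street, city):
    """True only if a single address sub-record matches both street and city."""
    return any(a.get("Street") == street and a.get("City") == city
               for a in comp["addresses"])

composite = group_composite(docs)["1"]
print(has_address(composite, "243 13th St", "Bogdownton"))  # True
print(has_address(composite, "12 Main St", "Bogdownton"))   # False: fields from different addresses
```

Because each address stays its own document until grouping, a street from one address never pairs with a city from another.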

> I could use the scheme you mention provided with the spanNear query but
> it conflates different fields into one indexed field which will mess
> with the scoring and make queries like range queries if there are dates
> involved next to impossible.

I agree: dimensional reduction can be an issue, though I'm sure there are use cases where
the attendant scoring distortion would be acceptable, e.g. non-scoring filters.  (Stuffing
a variable number of addresses into a single document will also "mess with the scoring" unless
you turn off norms, which is of course another form of scoring-messing.)
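To make the norms point concrete, here is a small worked calculation, assuming the classic Lucene DefaultSimilarity length norm of 1/sqrt(number of terms in the field) and ignoring boosts and the one-byte norm encoding. The figure of 4 tokens per address is a hypothetical simplification:

```python
import math

def length_norm(num_terms):
    # Assumed classic Lucene DefaultSimilarity length norm: 1 / sqrt(numTerms).
    # Real Lucene also folds in field/document boosts and encodes the result
    # in a single byte, so stored values are coarser than these.
    return 1.0 / math.sqrt(num_terms)

# Hypothetical: each address adds 4 tokens to one multi-valued address field.
TOKENS_PER_ADDRESS = 4

norms = {n: length_norm(n * TOKENS_PER_ADDRESS) for n in (1, 2, 4)}
for n, norm in norms.items():
    print(f"{n} address(es): norm {norm:.3f}")
# 1 address(es): norm 0.500
# 2 address(es): norm 0.354
# 4 address(es): norm 0.250
```

Under these assumptions, the length-norm factor for the same matching address is halved in a four-address document compared with a single-address one. Setting `omitNorms="true"` on the Solr field removes that penalty, but it also drops length normalization for every document in that field.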

I've seen a couple of different mentions of private SpanRangeQuery implementations on the
mailing lists, so range queries likely wouldn't stay a problem for long, should the need become
general.
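The following is a toy Python model, not Lucene code and not any of those private SpanRangeQuery implementations, of what a span-aware range match has to do over a flattened field. Assumptions: every address's tokens go into one field, a "date:" prefix marks which conflated sub-field a token came from, and a large position gap (analogous to Solr's `positionIncrementGap`) separates addresses. Slop is simplified to a maximum position distance, and all names are hypothetical:

```python
# Toy model only, not Lucene code: each sub-record's tokens go into ONE
# flattened field, and a position gap separates consecutive sub-records.
POSITION_GAP = 100  # hypothetical value, analogous to Solr's positionIncrementGap

def index_positions(records, gap=POSITION_GAP):
    """Return (token, position) pairs for a list of per-record token lists."""
    postings, pos = [], 0
    for tokens in records:
        for tok in tokens:
            postings.append((tok, pos))
            pos += 1
        pos += gap  # jump ahead so the next record starts far away
    return postings

def span_near_range(postings, term, lo, hi, slop):
    """True if `term` and a "date:" token within [lo, hi] are <= `slop` positions apart.

    ISO-8601 dates compare correctly as strings, so string comparison
    stands in for a term-range match.
    """
    term_pos = [p for t, p in postings if t == term]
    date_pos = [p for t, p in postings
                if t.startswith("date:") and lo <= t[len("date:"):] <= hi]
    return any(abs(a - b) <= slop for a in term_pos for b in date_pos)

# Hypothetical address sub-records, each carrying one date.
records = [
    ["12", "main", "st", "somewheresville", "date:2003-06-01"],
    ["243", "13th", "st", "bogdownton", "date:2008-05-01"],
]
postings = index_positions(records)
print(span_near_range(postings, "main", "2008-01-01", "2008-12-31", slop=10))        # False
print(span_near_range(postings, "bogdownton", "2008-01-01", "2008-12-31", slop=10))  # True
# Without the gap, the 2008 date from the second address leaks onto "main":
print(span_near_range(index_positions(records, gap=0), "main", "2008-01-01", "2008-12-31", slop=10))  # True
```

The gap has to exceed the slop. Otherwise, as the last line shows, a date from one address can satisfy a query about another.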

> This "solution" is really a hack workaround to a limitation in
> Lucene/Solr.  I was hoping to start a conversation to a more
> truer resolution to this problem rather than these workarounds
> which aren't always satisfactory.

Limitation: Solr/Lucene is not a database.  

"Solutions":
	1. Hack workaround
	2. Rewrite Solr/Lucene to be a database
	3. ? (fill in "more truer resolution" here)

Good luck,
Steve

