lucene-solr-user mailing list archives

From Dennis Gearon <gear...@sbcglobal.net>
Subject RE: A schema inside a Solr Schema (Schema in a can)
Date Fri, 17 Dec 2010 18:22:37 GMT
Quite a bit of this is over my head at this point.

I should NOT have duplicate fields in the column. I wonder how that affects things.


Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn
from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Fri, 12/17/10, Dyer, James <James.Dyer@ingrambook.com> wrote:

> From: Dyer, James <James.Dyer@ingrambook.com>
> Subject: RE: A schema inside a Solr Schema (Schema in a can)
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Friday, December 17, 2010, 9:43 AM
> There's also one "gotcha" we've
> experienced when searching across multi-valued
> fields:  SOLR will match across field occurrences. 
> In the example below, if you were to search
> q=contrib_name:(james AND smith), you will get this record
> back.  It matches one name from one contributor and
> another name from a different contributor.  This is not
> what our users want.
> 
> As a work-around, I am converting these to phrase queries
> with slop:  "james smith"~50 ... Just use a slop #
> smaller than your positionIncrementGap and bigger than the #
> of terms entered.  This will prevent the cross-field
> matches yet allow the words to occur in any order.  
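
A minimal sketch of the slop workaround described above, in Python. The function name, its parameters, and the gap value of 100 are assumptions for illustration; the thread does not say what positionIncrementGap the schema actually uses.

```python
def build_sloppy_phrase(field, user_query, slop=50, position_increment_gap=100):
    """Turn 'james smith' into contrib_name:"james smith"~50.

    position_increment_gap is an assumed value; it must mirror whatever
    the field type in the real schema declares.
    """
    # Drop double quotes so the terms cannot break out of the phrase.
    terms = user_query.replace('"', " ").split()
    if not terms:
        raise ValueError("empty query")
    if len(terms) == 1:
        # A single term needs no phrase; cross-value matching cannot occur.
        return f"{field}:{terms[0]}"
    # The slop must exceed the number of terms (so any order can match)
    # and stay below the gap (so a match cannot span two field values).
    if not (len(terms) < slop < position_increment_gap):
        raise ValueError("slop must be > number of terms and < positionIncrementGap")
    return f'{field}:"{" ".join(terms)}"~{slop}'


print(build_sloppy_phrase("contrib_name", "james smith"))
```

The range check is the whole point of the approach: if the slop were as large as the gap, a phrase could again match one word from each contributor.
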
> 
> The problem with this approach is that Lucene doesn't
> support wildcards in phrases.  Unlucky for us, because
> our app automatically adds a wildcard to every term entered
> in Contributor searching.  So when we convert to SOLR
> we will have to disable this "feature" for multi-word
> queries.  I experimented with the double metaphone
> filter (too many false positive matches) and edge n-gram
> filter (could make the index very big) to alleviate this
> loss of functionality.  Currently I have it set up to
> index each name as the full name plus the first
> initial.  (so "j dyer" would match but not "ja dyer")
> If this is considered not-good-enough, we can probably see
> about doing the edge n-grams several characters out... 
> 
> 
> If anyone else has any other ideas I should try, please do
> speak up.  Thank you.
> 
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> 
> 
> -----Original Message-----
> From: Dyer, James 
> Sent: Friday, December 17, 2010 10:59 AM
> To: solr-user@lucene.apache.org
> Subject: RE: A schema inside a Solr Schema (Schema in a
> can)
> 
> Dennis,
> 
> I may be misunderstanding your question, but I think I've
> just worked through something similar.  We're indexing
> book metadata, and a book can have more than one
> Contributor.  We want to store both the contributor's
> name, their Role and their id (from our rel db).  With
> our old system, we had to do something like this:
> 
> contrib:  dyer, james|author|123
> contrib:  smith, sam|editor|456
> 
> But Lucene/Solr will guarantee that multivalued fields
> return in exactly the same order you put them in.  So
> with SOLR we can do this:
> 
> contrib_name: dyer, james
> contrib_name: smith, sam
> contrib_role: author
> contrib_role: editor
> contrib_id:123
> contrib_id:456
> 
> The trick is to be very careful you put everything in the
> same order (it's easy if it is all from the same SQL query
> from a relational database).  If one of the data
> elements is a NULL you have to use a placeholder (like an
> empty string or a zero).
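
A rough sketch of the parallel-field approach above, in Python. The helper names are hypothetical, and the rows stand in for whatever the SQL query returns; the placeholder choices (empty string, zero) follow the suggestion in the message.

```python
def to_parallel_fields(rows):
    """Build the three multivalued fields from (name, role, id) rows.

    Every row appends exactly one value to every field, so position i in
    each list always refers to the same contributor.
    """
    doc = {"contrib_name": [], "contrib_role": [], "contrib_id": []}
    for name, role, contrib_id in rows:
        # NULLs become placeholders so the lists never fall out of step.
        doc["contrib_name"].append(name if name is not None else "")
        doc["contrib_role"].append(role if role is not None else "")
        doc["contrib_id"].append(contrib_id if contrib_id is not None else 0)
    return doc


def from_parallel_fields(doc):
    """Reassemble the contributors by position when reading a document back."""
    return list(zip(doc["contrib_name"], doc["contrib_role"], doc["contrib_id"]))


rows = [("dyer, james", "author", 123), ("smith, sam", None, 456)]
doc = to_parallel_fields(rows)
print(from_parallel_fields(doc))
```

Skipping a NULL instead of writing a placeholder would shift every later value in that field by one position, which is why the placeholder matters.
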
> 
> Another option is to use a dynamic field:
> 
> contrib_123: dyer, james
> contrib_456: smith, sam
> 
> The problem here is if you want to display and use a
> fieldlist (fl=), you cannot use wildcards (ex: fl=contrib_*
> doesn't work).  Same for searching (q=, qf=).  You
> can only use dynamic fields if you know, at runtime,
> the exact field name you need to deal with.
> 
> Both of these options might be more work for your app to
> deal with than the delimiter approach.  And, in our case, we
> could stick with the delimiter field and store it and then
> have a separate indexed field that just has the name (as
> this is all we search on).  You could even just have 1
> field if you used a fancy analysis sequence that would only
> index the element(s) you wanted indexed...
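
One way the "stored delimiter field plus separate indexed name field" idea could look on the application side, sketched in Python. The function names are hypothetical, and the sketch assumes the pipe never appears inside a name or role.

```python
def split_contrib(value):
    """Split 'dyer, james|author|123' into its three parts."""
    name, role, contrib_id = value.split("|")
    return {"name": name.strip(), "role": role.strip(), "id": int(contrib_id)}


def build_doc(contribs):
    """Keep the delimited values for display; copy only the names for search."""
    return {
        # Stored as-is and parsed by the app when rendering results.
        "contrib": list(contribs),
        # Indexed for searching, since the name is all that gets queried.
        "contrib_name": [split_contrib(c)["name"] for c in contribs],
    }


doc = build_doc(["dyer, james|author|123", "smith, sam|editor|456"])
print(doc["contrib_name"])
```
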
> 
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> 
> 
> -----Original Message-----
> From: Dennis Gearon [mailto:gearond@sbcglobal.net]
> 
> Sent: Friday, December 17, 2010 12:43 AM
> To: solr-user@lucene.apache.org
> Subject: A schema inside a Solr Schema (Schema in a can)
> 
> Is it possible to put name value pairs of any type in a
> native Solr Index field type? Like JSON/XML/YML?
> 
> The reason that I ask, since you asked, is I want my main
> index schema to be a base object, and another multivalue
> column to be the attributes of base object inherited
> descendants. 
> 
> Is there any other way to do this?
> 
> What are the limitations in searching and indexing
> documents with multivalue fields?
> 
> Dennis Gearon
> 
