lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Recap on derived objects in Solr Index, 'schema in a can'
Date Wed, 22 Dec 2010 21:45:04 GMT
A dynamic field just means that the schema allows any field with a
name matching the wildcard. That's all.
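
As a rough illustration of that matching rule, here is a minimal Python
sketch. It is not Solr's own code; the field names and patterns are
hypothetical, and it assumes patterns with a single leading or trailing "*".

```python
from fnmatch import fnmatchcase

# Hypothetical schema contents, for illustration only.
EXPLICIT_FIELDS = {"id", "title"}
DYNAMIC_PATTERNS = ["*_en", "*_i"]


def schema_accepts(field_name):
    """Return True if the name is declared explicitly or matches a dynamic pattern."""
    if field_name in EXPLICIT_FIELDS:
        return True
    # A dynamic field only widens the set of names the schema will accept.
    return any(fnmatchcase(field_name, pattern) for pattern in DYNAMIC_PATTERNS)
```

Under these assumptions "body_en" is accepted and "body_fr" is rejected,
just as an undeclared field would be.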

There is no support for referring to all of the existing fields that
match the wildcard. That is, there is no support for "*_en:word" as a
field search. Nor is there any kind of grouping for facets. The
per-field form of some parameters, which addresses one particular
field, does not support wildcards either. If you add wildcard fields,
you have to remember what they are.
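
One client-side way to live with this is to keep your own list of the
field names you have indexed and expand the wildcard yourself into an
explicit OR query. A minimal Python sketch, assuming a hypothetical
helper name, a field list maintained by the application, and a search
word that is a single term with no query-syntax characters:

```python
from fnmatch import fnmatchcase


def expand_field_query(pattern, word, known_fields):
    """Build an explicit 'f1:word OR f2:word' query from a wildcard field pattern.

    known_fields is the application's own record of indexed field names;
    nothing here asks Solr for it.
    """
    matching = sorted(f for f in known_fields if fnmatchcase(f, pattern))
    if not matching:
        raise ValueError("no known field matches " + pattern)
    # The word is used as-is; escaping special characters is left out of this sketch.
    return " OR ".join(f"{field}:{word}" for field in matching)


query = expand_field_query("*_en", "word", ["title_en", "body_en", "price_i"])
print(query)  # body_en:word OR title_en:word
```

The resulting string can be sent as an ordinary q parameter, since it
names every field explicitly.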

On Wed, Dec 22, 2010 at 11:04 AM, Dennis Gearon <gearond@sbcglobal.net> wrote:
> I'm open to cores, if that's the faster way (indexing/querying/keeping mentally
> straight) to do things.
>
> But from what you say below, the eventual goal of the site would mean either 100
> extra 'generic' fields, or 1,000s to 100,000s of cores.
> Cores are probably easier to administer for security and give more accurate
> querying?
>
> What is the relationship between dynamic fields and the schema?
>
>  Dennis Gearon
>
>
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is usually a better
> idea to learn from others’ mistakes, so you do not have to make them yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>
>
> ----- Original Message ----
> From: Erick Erickson <erickerickson@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Wed, December 22, 2010 10:44:27 AM
> Subject: Re: Recap on derived objects in Solr Index, 'schema in a can'
>
> No, one cannot ignore the schema. If you try to add a field not in the
> schema you get an error. One could, however, use any arbitrary subset
> of the fields defined in the schema for any particular #document# in the
> index. Say your schema had fields f1, f2, f3...f10. You could have fields
> f1-f5 in one doc, and fields f6-f10 in another and f1, f4, f9 in another
> and.....
>
> The only field(s) that #must# be in a document are the required="true"
> fields.
>
> There's no real penalty for omitting fields from particular documents.
> This allows you to store "special" documents that aren't part of normal
> searches.
>
> You could, for instance, use a document to store meta-information about
> your index that had whatever meaning you wanted in a field(s) that *no*
> other document had. Your app could then read that "special" document and
> make use of that info. Searches on "normal" documents wouldn't return
> that doc, etc.
>
> You could effectively have N indexes contained in one index where a
> document in each logical sub-index had fields disjoint from the other
> logical sub-indexes. Why you'd do something like that rather than use
> cores is a very good question, but you #could# do it that way...
>
> All this is much different from a database where there are penalties for
> defining a large number of unused fields.
>
> Whether doing this is wise or not given the particular problem you're
> trying to solve is another discussion <G>..
>
> Best
> Erick
>
> On Mon, Dec 20, 2010 at 11:03 PM, Dennis Gearon <gearond@sbcglobal.net> wrote:
>
>> Based on more searches and manual consolidation, I've put together some of
>> the ideas for this already suggested in a summary below. The last item in
>> the summary seems to be an interesting, low-technical-cost way of doing it.
>>
>> Basically, it treats the index like a 'BigTable', a la "No SQL".
>>
>> Erick Erickson pointed out:
>> "...but there's absolutely no requirement
>> that all documents in SOLR have the same fields..."
>>
>> I guess I don't have the right understanding of what goes into a Document
>> in Solr. Is it just a set of fields, each with its own independent field
>> type declaration/id, its name, and its content?
>>
>> So even though there's a schema for an index, one could ignore it and
>> just throw any other named fields and types and content at document
>> addition time?
>>
>> So if I wanted to search on a base set, all documents having it, I could
>> then additionally filter based on the (might be wrong use of this) dynamic
>> fields?
>>
>>
>>
>>
>>
>>
>> Original thread that I started:
>> ----------------------------------------
>>
>> http://lucene.472066.n3.nabble.com/A-schema-inside-a-Solr-Schema-Schema-in-a-can-tt2103260.html
>>
>>
>> ------------------------------------------------------------------------
>> Repeat of the problem (not actual ratios or numbers, i.e. it could be WORSE!):
>> ------------------------------------------------------------------------
>>
>>
>> 1/ Base object of some kind, x number of fields
>> 2/ Derived objects representing divisions in a company, different customer
>>    bases, etc., each having 2 additional, unique fields.
>> 3/ Assume 1000 such derived object types
>> 4/ A 'flattened' index would have the x base object fields,
>>    ****and 2000**** additional fields
>>
>>
>> ================================================
>> Solutions Posited
>> -----------------------
>>
>> A/ First thought: multi-valued columns as key pairs.
>>      1/ Difficult to access individual items of more than one 'word' length
>>         for querying in multivalued fields.
>>      2/ All sorts of statistical stuff probably wouldn't apply?
>>      3/ (James Dayer said:) There's also one "gotcha" we've experienced
>>         when searching across multi-valued fields: SOLR will match across
>>         field occurrences. In the example below, if you were to search
>>         q=contrib_name:(james AND smith), you will get this record back.
>>         It matches one name from one contributor and another name from a
>>         different contributor. This is not what our users want.
>>
>>         As a work-around, I am converting these to phrase queries with
>>         slop: "james smith"~50 ... Just use a slop # smaller than your
>>         positionIncrementGap and bigger than the # of terms entered. This
>>         will prevent the cross-field matches yet allow the words to occur
>>         in any order.
>>
>>         The problem with this approach is that Lucene doesn't support
>>         wildcards in phrases.
>> B/ Dynamic fields were suggested, but I am not sure exactly how they
>>    work, and the person who suggested them was not sure they would work,
>>    either.
>> C/ Different field naming conventions were suggested if field types were
>>    similar. I can't predict that.
>> D/ Found this old thread, and it had other suggestions:
>>       1/ Use multiple cores, one for each record type/schema, and aggregate
>>          them during the query.
>>       2/ Use a fixed number of additional fields X 2. Each additional
>>          field is actually a pair of fields. The first of the pair gives
>>          the column name, the second gives the data.
>>            a) Although I like this, I wonder how many extra fields to use.
>>            b) It was pointed out that relevancy and other statistical
>>               criteria for queries might suffer.
>>       3/ Index the different objects exactly as they are, i.e. as Erick
>>          Erickson said:
>>          "I'm not entirely sure this is germane, but there's absolutely no
>>          requirement that all documents in SOLR have the same fields. So
>>          it's possible for you to index the "wildly different content" in
>>          "wildly different fields" <G>. Then searching for screen:LCD would
>>          be straightforward."...
>> Dennis Gearon
>>
>>
>>
>>
>
>



-- 
Lance Norskog
goksron@gmail.com
