lucene-solr-user mailing list archives

From Charlie Hull <char...@flax.co.uk>
Subject Re: Solr 5 options
Date Wed, 15 Jul 2015 08:13:25 GMT
On 14/07/2015 17:04, Erick Erickson wrote:
> Well, Shawn I for one am in your corner.
>
> Schemaless is great for getting things running, but it's
> not an AI. And it can get into trouble guessing. Say
> it guesses a field should be an int because the first one
> it sees is 123 but it's really a part number. Then when
> a part number 123-456 comes through the doc will fail
> to index with "illegal number format".

This is my issue with 'schemaless' - it makes far too many assumptions 
about data types. Elasticsearch suffers from this as well: 
https://orchestrate.io/blog/2014/09/30/improved-elasticsearch-indexing/
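
To make Erick's part-number example concrete, here is a toy model of
first-value type guessing. It is only a sketch and not Solr's actual code:
the GuessingSchema class and guess_type function are hypothetical names
invented for illustration, and a Python ValueError stands in for the
"illegal number format" failure.

```python
# Toy model of first-value type guessing; this is NOT Solr's implementation.

def guess_type(value):
    """Guess a field type from a single sample value."""
    try:
        int(value)
        return "int"
    except ValueError:
        return "string"


class GuessingSchema:
    """Hypothetical schema that locks in a type on the first value it sees."""

    def __init__(self):
        self.fields = {}

    def index(self, doc):
        for name, value in doc.items():
            # The first value seen fixes the type for every later document.
            field_type = self.fields.setdefault(name, guess_type(value))
            if field_type == "int":
                int(value)  # raises ValueError for a value like "123-456"


schema = GuessingSchema()
schema.index({"part_number": "123"})  # guessed as int

try:
    schema.index({"part_number": "123-456"})
    failed = False
except ValueError:
    failed = True  # the second document is rejected
```

Declaring part_number as a string up front avoids the failure, which is the
argument for defining the schema explicitly.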

Charlie
>
> bq: Also, does the fact that I intend to use a data import handler to run feeds
> from large numbers of oracle schemas have any impact on the above?
>
> Yes. You have to map the DB schemas into Solr
> somehow. Schemaless will try to guess, but as above it doesn't
> have any real understanding of the data. Dynamic fields are certainly
> a viable option, though you'll be assigning columns to fields for each schema
> variant.
>
> Best,
> Erick
>
> On Tue, Jul 14, 2015 at 6:15 AM, Shawn Heisey <apache@elyograg.org> wrote:
>> On 7/14/2015 4:44 AM, spleenboy wrote:
>>> Many Thanks to those who helped me on my last post: I'm almost there.
>>> So here is the doc I need to index:
>>> {
>>>    "doc":
>>>    {
>>>      "id":"2",
>>>      "cus_name_s":"Paul Brown",
>>>      "cus_email_t":["paul.brown@here.net"],
>>>      "com_id_i":201,
>>>      "com_name_s":"Berenices",
>>>      "url_s":"domain.net/integration/"}}
>>>
>>> I only need to be able to search on email.
>>> My plan was to use classic, as I was going to run this on a single node.
>>> I am happy to use dynamic fields to define the structure of the doc, so I
>>> don't think I need a schema.xml: I think this is classic/schemaless (?)
>>> I am still a little confused between schemaless and managed schema.
>>> Do I implement this using the right combination of parameters in my bin/solr
>>> create_core command?
>>> Also, does the fact that I intend to use a data import handler to run feeds
>>> from large numbers of oracle schemas have any impact on the above?
>>
>> The "schemaless" mode isn't really schemaless ... it just means that
>> Solr will automatically guess what fieldType to use for a field that has
>> never been seen before, and then modify the schema to include that field
>> with the guessed fieldType.  It's sort of like the managed schema,
>> except it's managed automatically instead of by the admin.
>>
>> I personally would not want Solr to guess on the schema, I would want to
>> explicitly define Solr's behavior ... but not everyone does things the
>> same way that I do.
>>
>> Thanks,
>> Shawn
>>
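
As a rough sketch of what "assigning columns to fields" could look like with
dynamic fields, the snippet below derives a Solr field name from a database
column by appending a type suffix, following the _s / _t / _i naming used in
the document above. The mapping table, the Oracle type names chosen, and the
solr_field_name helper are all assumptions made for illustration; a real
mapping depends on the dynamicField rules in your own schema.

```python
# Hypothetical mapping from Oracle column types to dynamic-field suffixes.
# The suffixes mirror the document in this thread; the mapping is an assumption.
SUFFIX_BY_COLUMN_TYPE = {
    "VARCHAR2": "_s",  # untokenized string
    "CLOB": "_t",      # tokenized text
    "NUMBER": "_i",    # integer (assumes the column holds whole numbers)
}


def solr_field_name(column_name, column_type):
    """Build a dynamic-field name such as 'cus_name_s' from a DB column."""
    suffix = SUFFIX_BY_COLUMN_TYPE.get(column_type.upper())
    if suffix is None:
        raise ValueError(f"no dynamic-field suffix for column type {column_type!r}")
    return column_name.lower() + suffix


columns = [("CUS_NAME", "VARCHAR2"), ("COM_ID", "NUMBER")]
field_names = [solr_field_name(name, col_type) for name, col_type in columns]
```

Each schema variant would need its own column list, but the type decision
stays with you rather than being guessed from the first value indexed.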


-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
