lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: schemaless vs schema based core
Date Sat, 23 Jan 2016 02:22:30 GMT
And, more generally, schemaless makes a series of assumptions, any
of which may be wrong.

You _must_ hand-tweak your schema to squeeze all the performance out of Solr
that you can. If your collection isn't big enough that you need to squeeze,
don't bother....

FWIW,
Erick
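
One of the "assumptions" discussed in the quoted exchange below is numeric widening: guessing long rather than int and double rather than float. The following is a minimal sketch of that idea in plain Java. It is not Solr's actual guessing code; the class name, method names, and returned type labels are hypothetical and chosen only for illustration.

```java
public class TypeGuessSketch {

    /**
     * Hypothetical guesser: with only a raw string to go on, it picks the
     * widest type that can hold the value (long, then double), because it
     * cannot know whether later documents will carry larger values.
     */
    public static String guessFieldType(String raw) {
        if (raw.equalsIgnoreCase("true") || raw.equalsIgnoreCase("false")) {
            return "boolean";
        }
        try {
            Long.parseLong(raw);
            return "long";
        } catch (NumberFormatException ignored) {
            // not an integer value, try floating point next
        }
        try {
            Double.parseDouble(raw);
            return "double";
        } catch (NumberFormatException ignored) {
            // not numeric at all
        }
        return "text";
    }

    /** True when a non-zero value would silently become 0.0 if stored as a float. */
    public static boolean underflowsAsFloat(String raw) {
        double d = Double.parseDouble(raw);
        float f = (float) d;
        return d != 0.0 && f == 0.0f;
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"42", "3.14", "book", "true"}) {
            System.out.println(raw + " -> " + guessFieldType(raw));
        }
        // Raw value widths only; the effect on real index size depends on
        // how the values are encoded and compressed.
        System.out.println("int bytes=" + Integer.BYTES + ", long bytes=" + Long.BYTES);
        System.out.println("float bytes=" + Float.BYTES + ", double bytes=" + Double.BYTES);
        System.out.println("1e-50 underflows as float: " + underflowsAsFloat("1e-50"));
    }
}
```

A hand-written schema can choose the narrower type when the data is known to fit, which is the kind of tweak a guesser cannot safely make on its own.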

On Fri, Jan 22, 2016 at 11:19 AM, Steve Rowe <sarowe@gmail.com> wrote:
> Yes, and also underflow in the case of double/float.
>
> --
> Steve
> www.lucidworks.com
>
>> On Jan 22, 2016, at 12:25 PM, Shyam R <shyam.remella@gmail.com> wrote:
>>
>> I think schema-less mode might allocate double instead of float, long
>> instead of int to guard against overflow, which increases index size. Is my
>> assumption valid?
>>
>> Thanks
>>
>>
>>
>>
>> On Thu, Jan 21, 2016 at 10:48 PM, Erick Erickson <erickerickson@gmail.com>
>> wrote:
>>
>>> I guess it's all about whether schemaless really supports
>>> 1> all the docs you index.
>>> 2> all the use-cases for search.
>>> 3> the assumptions it makes scale to your needs.
>>>
>>> If you've established rigorous tests and schemaless does all of the
>>> above, I'm all for shortening the cycle by using schemaless.
>>>
>>> But if it's just being sloppy and "success" is "I managed to index 50
>>> docs and get some results back by searching", expect to find some
>>> "interesting" issues down the road.
>>>
>>> And finally, if it's "we use schemaless to quickly try things in the
>>> UI and for the _real_ prod environment we need to be more rigorous
>>> about the schema", well, shortening development time is A Good Thing.
>>> Part of moving to prod could be taking the schema generated by
>>> schemaless and tweaking it, for instance.
>>>
>>> Best,
>>> Erick
>>>
>>> On Thu, Jan 21, 2016 at 8:54 AM, Shawn Heisey <apache@elyograg.org> wrote:
>>>> On 1/21/2016 2:22 AM, Prateek Jain J wrote:
>>>>> Thanks Erick,
>>>>>
>>>>> Yes, I took same approach as suggested by you. The issue is some
>>> developers started with schemaless configuration and now they have started
>>> liking it and avoiding restrictions (including increased time to deploy
>>> application, in managed enterprise environment). I was more concerned about
>>> pushing best practices around this in team, because allowing anyone to add new
>>> attributes will become overhead in terms of management, security and
>>> maintainability. Regarding your concern about not storing documents on
>>> separate disk; we are storing them in solr but not as backup copies. One
>>> doubt still remains in mind w.r.t auto-detection of types in  solr:
>>>>>
>>>>> Is there a performance benefit of using defined types (schema based)
>>> vs un-defined types while adding documents? Does "solrj" ship this
>>> meta-information like type of attributes to solr, because code looks
>>> something like this?
>>>>>
>>>>> SolrInputDocument doc = new SolrInputDocument();
>>>>>      doc.addField("category", "book"); // String
>>>>>      doc.addField("id", 1234); // Integer (an int literal autoboxes to Integer, not Long)
>>>>>      doc.addField("name", "Trying solrj"); //String
>>>>>
>>>>> In my opinion, any auto-detector code will have some overhead vs the
>>> other; any thoughts around this?
>>>>
>>>> Although the true reality may be more complex, you should consider that
>>>> everything Solr receives from SolrJ will be text -- as if you had sent
>>>> the JSON or XML indexing format manually, which has no type information.
>>>>
>>>> When you are building a document with SolrInputDocument, SolrJ has no
>>>> knowledge of the schema in Solr.  It doesn't know whether the target
>>>> field is numeric, string, date, or something else.
>>>>
>>>> Using different object types for input to SolrJ just gives you general
>>>> Java benefits -- things like detecting certain programming errors at
>>>> compile time.
>>>>
>>>> Thanks,
>>>> Shawn
>>>>
>>>
>>
>>
>>
>> --
>> Ph: 9845704792
>
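
Shawn's point above, that values should be treated as arriving without type information, can be illustrated with a small sketch. It does not use SolrJ at all: it builds a `<doc>` element in the style of Solr's XML update format by hand, and the class and method names are hypothetical. The idea is that a Long and a String holding the same digits become the same text once serialized, so the receiving side needs a schema (or a guess) to decide the field type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UntypedWireSketch {

    /** Minimal XML escaping for element text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders the fields as a <doc> element; every value is written as plain text. */
    public static String toXmlDoc(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("<doc>");
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(String.valueOf(e.getValue())))
              .append("</field>");
        }
        return sb.append("</doc>").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> asLong = new LinkedHashMap<>();
        asLong.put("id", 1234L);      // Java type: Long

        Map<String, Object> asString = new LinkedHashMap<>();
        asString.put("id", "1234");   // Java type: String

        // Both documents serialize to identical text; the Java type is gone.
        System.out.println(toXmlDoc(asLong));
        System.out.println(toXmlDoc(asLong).equals(toXmlDoc(asString)));
    }
}
```

As Shawn notes, the true picture may be more complex for other wire formats, so this should be read as a picture of the text-based case only.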
