lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: schemaless vs schema based core
Date Sat, 23 Jan 2016 02:39:22 GMT
Yo. That is the truth. You can get stuff indexed with an automatic schema, but if you want
to make your customers happy, tune it.

wunder 
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 22, 2016, at 6:22 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> And, more generally, schemaless makes a series of assumptions, any
> of which may be wrong.
> 
> You _must_ hand-tweak your schema to squeeze all the performance out of Solr
> that you can. If your collection isn't big enough that you need to squeeze,
> don't bother....
> 
> FWIW,
> Erick
> 
> On Fri, Jan 22, 2016 at 11:19 AM, Steve Rowe <sarowe@gmail.com> wrote:
>> Yes, and also underflow in the case of double/float.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Jan 22, 2016, at 12:25 PM, Shyam R <shyam.remella@gmail.com> wrote:
>>> 
>>> I think schemaless mode might choose double instead of float, and long
>>> instead of int, to guard against overflow, which increases index size. Is my
>>> assumption valid?
>>> 
>>> Thanks
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Jan 21, 2016 at 10:48 PM, Erick Erickson <erickerickson@gmail.com>
>>> wrote:
>>> 
>>>> I guess it's all about whether schemaless really supports
>>>> 1> all the docs you index.
>>>> 2> all the use-cases for search.
>>>> 3> the assumptions it makes scale to your needs.
>>>> 
>>>> If you've established rigorous tests and schemaless does all of the
>>>> above, I'm all for shortening the cycle by using schemaless.
>>>> 
>>>> But if it's just being sloppy and "success" is "I managed to index 50
>>>> docs and get some results back by searching", expect to find some
>>>> "interesting" issues down the road.
>>>> 
>>>> And finally, if it's "we use schemaless to quickly try things in the
>>>> UI and for the _real_ prod environment we need to be more rigorous
>>>> about the schema", well, shortening development time is A Good Thing.
>>>> Part of moving to prod could be taking the schema generated by
>>>> schemaless and tweaking it, for instance.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>> On Thu, Jan 21, 2016 at 8:54 AM, Shawn Heisey <apache@elyograg.org> wrote:
>>>>> On 1/21/2016 2:22 AM, Prateek Jain J wrote:
>>>>>> Thanks Erick,
>>>>>> 
>>>>>> Yes, I took the same approach you suggested. The issue is that some
>>>>>> developers started with a schemaless configuration, and now they have come
>>>>>> to like it and want to avoid restrictions (including the increased time to
>>>>>> deploy an application in a managed enterprise environment). I was more
>>>>>> concerned about pushing best practices around this in the team, because
>>>>>> allowing anyone to add new attributes will become an overhead in terms of
>>>>>> management, security and maintainability. Regarding your concern about not
>>>>>> storing documents on a separate disk: we are storing them in Solr, but not
>>>>>> as backup copies. One question still remains regarding auto-detection of
>>>>>> types in Solr:
>>>>>> 
>>>>>> Is there a performance benefit to using defined types (schema based)
>>>>>> vs. undefined types while adding documents? Does SolrJ ship
>>>>>> meta-information, such as the types of attributes, to Solr? The code
>>>>>> looks something like this:
>>>>>> 
>>>>>>     SolrInputDocument doc = new SolrInputDocument();
>>>>>>     doc.addField("category", "book");     // String
>>>>>>     doc.addField("id", 1234L);            // Long (a bare 1234 would box to Integer)
>>>>>>     doc.addField("name", "Trying solrj"); // String
>>>>>> 
>>>>>> In my opinion, any auto-detection code will have some overhead compared
>>>>>> with a defined schema; any thoughts on this?
>>>>> 
>>>>> Although the true reality may be more complex, you should consider that
>>>>> everything Solr receives from SolrJ will be text -- as if you had sent
>>>>> the JSON or XML indexing format manually, which has no type information.
>>>>> 
>>>>> When you are building a document with SolrInputDocument, SolrJ has no
>>>>> knowledge of the schema in Solr.  It doesn't know whether the target
>>>>> field is numeric, string, date, or something else.
>>>>> 
>>>>> Using different object types for input to SolrJ just gives you general
>>>>> Java benefits -- things like detecting certain programming errors at
>>>>> compile time.
>>>>> 
>>>>> Thanks,
>>>>> Shawn
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Ph: 9845704792
>> 
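
Shawn's point that Solr should be treated as receiving untyped text can be
illustrated with a small, self-contained sketch. It does not use SolrJ and is
not SolrJ's actual request format; UntypedWireDemo, toWireText and wireForId
are hypothetical names made up for this illustration. It only shows that once
values are rendered as text, a Long and a String with the same digits are
indistinguishable, so the receiving side has to decide the type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UntypedWireDemo {
    /**
     * Renders fields as a flat JSON-like object in which every value is
     * written as quoted text. Hypothetical format, for illustration only.
     */
    static String toWireText(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (sb.length() > 1) {
                sb.append(",");
            }
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
        }
        return sb.append("}").toString();
    }

    /** Convenience helper: a one-field document whose "id" is the given value. */
    static String wireForId(Object id) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        return toWireText(doc);
    }

    public static void main(String[] args) {
        // Both lines print {"id":"1234"}: the Java type is gone on the wire.
        System.out.println(wireForId(1234L));
        System.out.println(wireForId("1234"));
    }
}
```

With a defined schema, the server already knows what "id" is; in schemaless
mode it has to infer a type from text like the above.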

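The overflow and underflow concern raised by Shyam and Steve can be checked in
plain Java. This is a minimal sketch of the arithmetic only, not of how Solr's
schemaless mode picks field types; NumericWidthDemo and its method names are
hypothetical. It shows why a type guesser that has seen only a few values
might prefer long and double: the narrower types fail for values outside
their range.

```java
public class NumericWidthDemo {
    /** True if the value cannot be represented as a 32-bit int. */
    static boolean overflowsInt(long value) {
        return value > Integer.MAX_VALUE || value < Integer.MIN_VALUE;
    }

    /** True if narrowing to float turns a finite double into infinity. */
    static boolean overflowsFloat(double value) {
        return !Double.isInfinite(value) && Float.isInfinite((float) value);
    }

    /** True if narrowing to float flushes a non-zero double to zero. */
    static boolean underflowsFloat(double value) {
        return value != 0.0 && (float) value == 0.0f;
    }

    public static void main(String[] args) {
        System.out.println(overflowsInt(3_000_000_000L)); // true
        System.out.println(overflowsFloat(1e39));         // true
        System.out.println(underflowsFloat(1e-46));       // true
    }
}
```

If you know your data stays within int and float range, declaring the narrower
types by hand is one of the tweaks Erick describes.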
