lucene-solr-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: schemaless vs schema based core
Date Fri, 22 Jan 2016 19:19:48 GMT
Yes, and also underflow in the case of double/float.
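
As a rough illustration of the overflow and underflow mentioned above, here is a minimal plain-Java sketch. It does not use Solr or SolrJ, and the class and method names are hypothetical, chosen only for this example. It shows why a type guesser that cannot see future values might prefer the wider type.

```java
// Illustrative only: these names are hypothetical and not part of Solr or SolrJ.
class NumericRangeDemo {
    // 32-bit addition wraps around silently on overflow.
    static int sumAsInt(int a, int b) {
        return a + b;
    }

    // Widening to 64 bits before adding avoids the wraparound.
    static long sumAsLong(int a, int b) {
        return (long) a + b;
    }

    // Narrowing a very small double to float underflows to zero.
    static float asFloat(double d) {
        return (float) d;
    }

    public static void main(String[] args) {
        System.out.println(sumAsInt(Integer.MAX_VALUE, 1));   // wraps to Integer.MIN_VALUE
        System.out.println(sumAsLong(Integer.MAX_VALUE, 1));  // 2147483648
        System.out.println(asFloat(1e-50));                   // 0.0 (below float's range)
        System.out.println(1e-50 > 0.0);                      // true (still nonzero as a double)
    }
}
```

The wider types cost more bytes per value, which is the index-size trade-off raised in the question below.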

--
Steve
www.lucidworks.com

> On Jan 22, 2016, at 12:25 PM, Shyam R <shyam.remella@gmail.com> wrote:
> 
> I think schemaless mode might allocate double instead of float, and long
> instead of int, to guard against overflow, which increases index size. Is
> my assumption valid?
> 
> Thanks
> 
> 
> 
> 
> On Thu, Jan 21, 2016 at 10:48 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
> 
>> I guess it's all about whether schemaless really supports
>> 1> all the docs you index.
>> 2> all the use-cases for search.
>> 3> the assumptions it makes scale to your needs.
>> 
>> If you've established rigorous tests and schemaless does all of the
>> above, I'm all for shortening the cycle by using schemaless.
>> 
>> But if it's just being sloppy and "success" is "I managed to index 50
>> docs and get some results back by searching", expect to find some
>> "interesting" issues down the road.
>> 
>> And finally, if it's "we use schemaless to quickly try things in the
>> UI and for the _real_ prod environment we need to be more rigorous
>> about the schema", well shortening development time is A Good Thing.
>> Part of moving to prod could be taking the schema generated by
>> schemaless and tweaking it, for instance.
>> 
>> Best,
>> Erick
>> 
>> On Thu, Jan 21, 2016 at 8:54 AM, Shawn Heisey <apache@elyograg.org> wrote:
>>> On 1/21/2016 2:22 AM, Prateek Jain J wrote:
>>>> Thanks Erick,
>>>> 
>>>> Yes, I took the same approach you suggested. The issue is that some
>>>> developers started with a schemaless configuration, and now they have
>>>> come to like it and want to avoid restrictions (including increased
>>>> time to deploy the application in a managed enterprise environment). I
>>>> was more concerned about pushing best practices around this in the
>>>> team, because allowing anyone to add new attributes will become an
>>>> overhead in terms of management, security and maintainability.
>>>> Regarding your concern about not storing documents on a separate disk:
>>>> we are storing them in Solr, but not as backup copies. One doubt still
>>>> remains w.r.t. auto-detection of types in Solr:
>>>> 
>>>> Is there a performance benefit to using defined types (schema based)
>>>> vs. undefined types while adding documents? Does SolrJ ship this
>>>> meta-information, such as the type of each attribute, to Solr? I ask
>>>> because the code looks something like this:
>>>> 
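>>>> import org.apache.solr.common.SolrInputDocument;
>>>> 
>>>> SolrInputDocument doc = new SolrInputDocument();
>>>>      doc.addField("category", "book"); // String
>>>>      doc.addField("id", 1234); // Integer (an int literal; use 1234L for a Long)
>>>>      doc.addField("name", "Trying solrj"); // String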
>>>> 
>>>> In my opinion, any auto-detection code will have some overhead
>>>> compared with explicitly defined types; any thoughts on this?
>>> 
>>> Although the reality may be more complex, you should consider that
>>> everything Solr receives from SolrJ will be text -- as if you had sent
>>> the JSON or XML indexing format manually, which has no type information.
>>> 
>>> When you are building a document with SolrInputDocument, SolrJ has no
>>> knowledge of the schema in Solr.  It doesn't know whether the target
>>> field is numeric, string, date, or something else.
>>> 
>>> Using different object types for input to SolrJ just gives you general
>>> Java benefits -- things like detecting certain programming errors at
>>> compile time.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>> 
> 
> 
> 
> -- 
> Ph: 9845704792

