lucene-solr-user mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject Re: schemaless slow indexing
Date Sun, 22 Mar 2015 19:29:37 GMT
I took a quick look at the stock schemaless configs... unfortunately
they contain a performance trap.
There's a copyField by default that copies *all* fields to a catch-all
field called "_text".

IMO, that's not a great default.  It doubles the index size (well, the
"index" portion of it at least... not the stored fields) and slows
indexing.
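
A quick way to check whether a given schema carries this kind of catch-all
rule is to scan it for copyField entries whose source is the bare wildcard.
The sketch below is only illustrative: the schema fragment is a made-up
sample (not the actual stock file), and the helper name
find_catch_all_copy_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up schema fragment for illustration only; the real stock
# schemaless config may differ in field types and attributes.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*" dest="_text"/>
  <copyField source="title" dest="title_str"/>
</schema>
"""


def find_catch_all_copy_fields(schema_xml):
    """Return the dest of every copyField whose source is exactly '*'."""
    root = ET.fromstring(schema_xml.strip())
    return [
        rule.get("dest")
        for rule in root.iter("copyField")
        if rule.get("source") == "*"
    ]


if __name__ == "__main__":
    # Lists the destination fields that receive a copy of every field.
    print(find_catch_all_copy_fields(SAMPLE_SCHEMA))
```

Any destination this reports gets a second indexed copy of every incoming
field, which is where the extra index size and indexing time come from.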

The other unfortunate thing is the name.  Nowhere else in Solr (that
I know of) do we have a single underscore field name.  _text looks
more like a dynamicField pattern.  Our other fields with underscores
look like _version_ and _root_.  If we're going to start a new naming
convention (or expand the naming conventions) we need to have some
consistency and logic behind it.

-Yonik

On Sun, Mar 22, 2015 at 12:32 PM, Mike Murphy <mmurphy3141@gmail.com> wrote:
> I start up solr schemaless and index a bunch of data, and it takes a
> lot longer to finish indexing.
> No configuration changes, just straight schemaless.
>
> --Mike
>
> On Sun, Mar 22, 2015 at 12:27 PM, Erick Erickson
> <erickerickson@gmail.com> wrote:
>> Please review: http://wiki.apache.org/solr/UsingMailingLists
>>
>> You haven't quantified the slowdown. Or given any details on how
>> you're measuring the "slowdown". Or how you've configured your setups
>> in 4.10 and 5.0. Or... As Hossman would say, "details matter".
>>
>> Best,
>> Erick
>>
>> On Sun, Mar 22, 2015 at 8:35 AM, Mike Murphy <mmurphy3141@gmail.com> wrote:
>>> I'm trying out schemaless in solr 5.0, but the indexing seems quite a
>>> bit slower than it did in the past on 4.10.  Any pointers?
>>>
>>> --Mike
