lucene-solr-user mailing list archives

From Stephen Lewis <sle...@panopto.com>
Subject Re: Updating solr schema for a collection in place
Date Tue, 21 Jun 2016 05:26:12 GMT
Oh, also, I see that when I first replied I missed addressing this:

> For instance,
> having a field defined with docValues set to false, indexing some data
> then changing that field to docValues="true" and indexing some more data
> will give you "interesting" results.


The way we update our data model is to run fields in parallel as we migrate
them through a "rebuild" we do in the background. For any change requiring
in-place updates of fields (which we've yet to do), we would have stood up a
parallel cloud and run a data migration. (We weren't actually 100% sure this
would be strictly necessary.) If I understand you right, we could use the
managed schema factory
<https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Definition+in+SolrConfig>
and perform an atomic, in-place update to a field in the schema like this one.
That could make testing and migration even quicker for us. I think in prod we
would be able to make good use of this too, though we would probably still
want to run a parallel cloud in isolation while doing this update if there
were a risk of degraded write throughput or a heavy performance hit at peak
time.
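
A rough sketch of what such an in-place update might look like, assuming the collection uses the managed schema and the Schema API is reachable at /solr/<collection>/schema. The host, port, field name, and field attributes are placeholders, and build_replace_field_request is a hypothetical helper, not part of any Solr client library:

```python
import json
import urllib.request


def build_replace_field_request(base_url, collection, field):
    """Build (but do not send) a Schema API request that replaces a field.

    `field` is a dict holding the full field definition (name, type, ...).
    """
    url = f"{base_url.rstrip('/')}/solr/{collection}/schema"
    body = json.dumps({"replace-field": field}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and field definition, assumed for illustration only.
request = build_replace_field_request(
    "http://localhost:8983",
    "my_collection",
    {"name": "title_s", "type": "string", "stored": True, "docValues": True},
)

# To actually apply the change, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Replacing the definition this way does not rewrite documents that are already indexed, so the docValues caveat quoted above still applies until the data is reindexed.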

In my test environment noodling, I noticed that even when using a managed
schema, I could update the solrconfig.xml through a reload. Is it generally
safe to switch between schema factories through schema reloads, or is this
getting on the "cavalier" side of things? :)
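
The reload in question is the Collections API RELOAD call shown in the quoted message below. A minimal sketch of building that URL in Python, where reload_url is a hypothetical helper and the host and port are placeholders:

```python
from urllib.parse import urlencode


def reload_url(base_url, collection):
    """Return the Collections API URL that reloads `collection`."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url.rstrip('/')}/solr/admin/collections?{query}"


# Placeholder host; substitute the address of any node in the cloud.
url = reload_url("http://localhost:8983", "my_collection")
print(url)
```

Issuing a GET against the resulting URL (for example with urllib.request.urlopen) triggers the reload.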

On Mon, Jun 20, 2016 at 9:51 PM, Stephen Lewis <slewis@panopto.com> wrote:

> Thanks for the advice! I haven't encountered those nuances yet so it's
> great to be aware of them now.
>
> I manage our solr clouds through an OO python package which models our
> search stack. We use this package to deploy to stacks which are isolated
> and configurable, but otherwise identical. We push config updates first to
> our test environment for a test pass, then to our production clouds in
> parallel, and finally we flight them to users. It's been a pretty good
> system so far, and generally I haven't had many issues using solr 6.0. We
> were using 4.9 until relatively recently, and we did have some troubles
> with the collections API. In those cases, we resolved by recreating the
> collection. So far 6.0 seems to hum along gracefully as we use the API.
>
> Thanks again for letting us know to keep a sharp eye on the details and to
> be on the lookout for interesting behavior :)
>
> Best,
> Stephen
>
> On Mon, Jun 20, 2016 at 7:56 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> Glad you found the issue. The switch to managed has tripped up
>> more people than just you!
>>
>> Do be a little cautious about changing the schema however. There
>> are some "benign" changes you can do when you already have data
>> indexed and a series of others that are not benign. For instance,
>> having a field defined with docValues set to false, indexing some data
>> then changing that field to docValues="true" and indexing some more data
>> will give you "interesting" results.
>>
>> Other operations, like adding new fieldTypes or new Fields are entirely
>> benign.
>>
>> Mostly, this is just a caution that if you are changing your schema
>> and find results wonky (e.g. facet counts not correct, docs not being
>> found when you change stemming, etc.), consider deleting/recreating the
>> collection before tearing your hair out.
>>
>> Best,
>> Erick
>>
>> On Mon, Jun 20, 2016 at 10:37 PM, Stephen Lewis <slewis@panopto.com>
>> wrote:
>> > I'm happy to say I figured out the issue. Looking through previous
>> > questions in this forum, I was able to find someone hitting the same
>> > issue I was. After upgrading versions, we unintentionally switched to
>> > the managed schema factory instead of the ClassicIndexSchemaFactory.
>> > Sorry for the bother!
>> >
>> > On Mon, Jun 20, 2016 at 7:01 PM, Stephen Lewis <slewis@panopto.com>
>> wrote:
>> >
>> >> Hello,
>> >>
>> >> I've recently set up a solr cloud using solr 6.0, and I've been having
>> >> some trouble getting our collections to pick up schema updates. Following
>> >> the docs on zkcli.sh
>> >> <https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files>
>> >> and the collections API
>> >> <https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2>,
>> >> I have uploaded the new schema by placing it onto a solr node at
>> >> /opt/solr/server/configsets/my_collection/conf/schema.xml and running
>> >>
>> >> /opt/solr/cloud-scripts/zkcli.sh \
>> >>   -zkhost zkdns.foo.bar \
>> >>   -cmd upconfig \
>> >>   -confname my_collection \
>> >>   -confdir /opt/solr/server/configsets/my_collection/conf
>> >>
>> >> and then triggering a reload of the collection by hitting
>> >>
>> >> solrNodeDns.foo.bar:<PORT>/solr/admin/collections?action=RELOAD&name=my_collection
>> >>
>> >> The action reports success.
>> >>
>> >> Afterwards, however, I see something kind of strange. If I go to the
>> >> admin page and look at the schema in /~cloud?view=tree,
>> >> the updated schema is present. However, when I go to the collections
>> >> admin page and click on schema, I do not see the new fields present.
>> >> Querying for them directly also continues to lead to 400 "bad request"
>> >> responses, suggesting that the new schema hasn't been picked up anywhere
>> >> else either.
>> >>
>> >> Is there another step that I am missing to complete the update? I found
>> >> this stack overflow post
>> >> <http://stackoverflow.com/questions/36714077/solr-reload-is-not-picking-up-the-latest-changes-from-zookeeper>
>> >> where the poster is advised to recreate each core, though this seems
>> >> like the wrong way to go to me. Any advice you have is appreciated.
>> >>
>> >> A few more notes about the cluster: I am running solr 6.0 in solr-cloud
>> >> mode with freestanding zookeeper machines running zk 3.4.6.
>> >>
>> >> Thanks!
>> >>
>> >> Stephen
>> >>
>> >> stephen-lewis.net
>> >>
>> >
>> >
>> >
>> > --
>> > Stephen
>> >
>> > (206)753-9320
>> > stephen-lewis.net
>>
>
>
>
> --
> Stephen
>
> (206)753-9320
> stephen-lewis.net
>



-- 
Stephen

(206)753-9320
stephen-lewis.net
