lucene-commits mailing list archives

From ctarg...@apache.org
Subject [lucene-solr] branch branch_8x updated: SOLR-13259: Add new section on Reindexing in Solr (#594)
Date Mon, 04 Mar 2019 18:25:52 GMT
This is an automated email from the ASF dual-hosted git repository.

ctargett pushed a commit to branch branch_8x
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/branch_8x by this push:
     new 68adeab  SOLR-13259: Add new section on Reindexing in Solr (#594)
68adeab is described below

commit 68adeab46a08fdc66c6d613e5761413f16b45c0e
Author: Cassandra Targett <cassandra.targett@lucidworks.com>
AuthorDate: Mon Mar 4 12:24:58 2019 -0600

    SOLR-13259: Add new section on Reindexing in Solr (#594)
    
    Add new reindexing.adoc page; standardize on "reindex" vs "re-index"
---
 solr/solr-ref-guide/src/collections-api.adoc       |   4 +
 solr/solr-ref-guide/src/docvalues.adoc             |   4 +-
 .../src/indexing-and-basic-data-operations.adoc    |   4 +-
 .../src/major-changes-in-solr-7.adoc               |   8 +-
 solr/solr-ref-guide/src/managed-resources.adoc     |   2 +-
 solr/solr-ref-guide/src/reindexing.adoc            | 191 +++++++++++++++++++++
 solr/solr-ref-guide/src/schema-api.adoc            |  12 +-
 .../src/shards-and-indexing-data-in-solrcloud.adoc |   2 +-
 solr/solr-ref-guide/src/solr-tutorial.adoc         |   6 +-
 .../src/updating-parts-of-documents.adoc           |   4 +-
 10 files changed, 218 insertions(+), 19 deletions(-)

diff --git a/solr/solr-ref-guide/src/collections-api.adoc b/solr/solr-ref-guide/src/collections-api.adoc
index c78db6d..e44c6ca 100644
--- a/solr/solr-ref-guide/src/collections-api.adoc
+++ b/solr/solr-ref-guide/src/collections-api.adoc
@@ -695,11 +695,15 @@ To confirm the creation of the alias, you can look in the Solr Admin
UI, under t
 
 Create an alias named "testalias" and link it to the collections named "anotherCollection"
and "testCollection".
 
+// tag::createalias-simple-example[]
+
 [source,text]
 ----
 http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=testalias&collections=anotherCollection,testCollection&wt=xml
 ----
 
+//end::createalias-simple-example[]
+
 *Output*
 
 [source,xml]
diff --git a/solr/solr-ref-guide/src/docvalues.adoc b/solr/solr-ref-guide/src/docvalues.adoc
index 12cdece..c481cf7 100644
--- a/solr/solr-ref-guide/src/docvalues.adoc
+++ b/solr/solr-ref-guide/src/docvalues.adoc
@@ -38,7 +38,7 @@ Enabling a field for docValues only requires adding `docValues="true"` to
the fi
 ----
 
 [IMPORTANT]
-If you have already indexed data into your Solr index, you will need to completely re-index
your content after changing your field definitions in `schema.xml` in order to successfully
use docValues.
+If you have already indexed data into your Solr index, you will need to completely reindex
your content after changing your field definitions in `schema.xml` in order to successfully
use docValues.
 
 DocValues are only available for specific field types. The types chosen determine the underlying
Lucene docValue type that will be used. The available Solr field types are:
 
@@ -79,7 +79,7 @@ If `docValues="true"` for a field, then DocValues will automatically be
used any
 
 Field values retrieved during search queries are typically returned from stored values. However,
non-stored docValues fields will be also returned along with other stored fields when all
fields (or pattern matching globs) are specified to be returned (e.g., "`fl=*`") for search
queries depending on the effective value of the `useDocValuesAsStored` parameter for each
field. For schema versions >= 1.6, the implicit default is `useDocValuesAsStored="true"`.
See <<field-type-definitions-and- [...]
 
-When `useDocValuesAsStored="false"`, non-stored DocValues fields can still be explicitly
requested by name in the <<common-query-parameters.adoc#fl-field-list-parameter,fl param>>,
but will not match glob patterns (`"*"`). Note that returning DocValues along with "regular"
stored fields at query time has performance implications that stored fields may not because
DocValues are column-oriented and may therefore incur additional cost to retrieve for each
returned document. Also note that w [...]
+When `useDocValuesAsStored="false"`, non-stored DocValues fields can still be explicitly
requested by name in the <<common-query-parameters.adoc#fl-field-list-parameter,fl param>>,
but will not match glob patterns (`"*"`). Note that returning DocValues along with "regular"
stored fields at query time has performance implications that stored fields may not because
DocValues are column-oriented and may therefore incur additional cost to retrieve for each
returned document. Also note that w [...]
 
 In cases where the query is returning _only_ docValues fields performance may improve since
returning stored fields requires disk reads and decompression whereas returning docValues
fields in the fl list only requires memory access.
 
diff --git a/solr/solr-ref-guide/src/indexing-and-basic-data-operations.adoc b/solr/solr-ref-guide/src/indexing-and-basic-data-operations.adoc
index 40b5f3f..71873608 100644
--- a/solr/solr-ref-guide/src/indexing-and-basic-data-operations.adoc
+++ b/solr/solr-ref-guide/src/indexing-and-basic-data-operations.adoc
@@ -1,5 +1,5 @@
 = Indexing and Basic Data Operations
-:page-children: introduction-to-solr-indexing, post-tool, uploading-data-with-index-handlers,
uploading-data-with-solr-cell-using-apache-tika, uploading-structured-data-store-data-with-the-data-import-handler,
updating-parts-of-documents, detecting-languages-during-indexing, de-duplication, content-streams
+:page-children: introduction-to-solr-indexing, post-tool, uploading-data-with-index-handlers,
uploading-data-with-solr-cell-using-apache-tika, uploading-structured-data-store-data-with-the-data-import-handler,
updating-parts-of-documents, detecting-languages-during-indexing, de-duplication, content-streams,
reindexing
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
@@ -39,6 +39,8 @@ This section describes how Solr adds data to its index. It covers the following
 
 * *<<content-streams.adoc#content-streams,Content Streams>>*: Information about
streaming content to Solr Request Handlers.
 
+* *<<reindexing.adoc#reindexing,Reindexing>>*: Details about when reindexing
is required or recommended, and some strategies for completely reindexing your documents.
+
 == Indexing Using Client APIs
 
 Using client APIs, such as <<using-solrj.adoc#using-solrj,SolrJ>>, from your
applications is an important option for updating Solr indexes. See the <<client-apis.adoc#client-apis,Client
APIs>> section for more information.
diff --git a/solr/solr-ref-guide/src/major-changes-in-solr-7.adoc b/solr/solr-ref-guide/src/major-changes-in-solr-7.adoc
index 5efa399..2dc9555 100644
--- a/solr/solr-ref-guide/src/major-changes-in-solr-7.adoc
+++ b/solr/solr-ref-guide/src/major-changes-in-solr-7.adoc
@@ -26,9 +26,9 @@ There are many hundreds of changes in Solr 7, however, so a thorough review
of t
 
 You should also consider all changes that have been made to Solr in any version you have
not upgraded to already. For example, if you are currently using Solr 6.2, you should review
changes made in all subsequent 6.x releases in addition to changes for 7.0.
 
-Re-indexing your data is considered the best practice and you should try to do so if possible.
However, if re-indexing is not feasible, keep in mind you can only upgrade one major version
at a time. Thus, Solr 6.x indexes will be compatible with Solr 7 but Solr 5.x indexes will
not be.
+<<reindexing.adoc#upgrades,Reindexing>> your data is considered the best practice
and you should try to do so if possible. However, if reindexing is not feasible, keep in mind
you can only upgrade one major version at a time. Thus, Solr 6.x indexes will be compatible
with Solr 7 but Solr 5.x indexes will not be.
 
-If you do not re-index now, keep in mind that you will need to either re-index your data
or upgrade your indexes before you will be able to move to Solr 8 when it is released in the
future. See the section <<indexupgrader-tool.adoc#indexupgrader-tool,IndexUpgrader Tool>>
for more details on how to upgrade your indexes.
+If you do not reindex now, keep in mind that you will need to either reindex your data or
upgrade your indexes before you will be able to move to Solr 8 when it is released in the
future. See the section <<indexupgrader-tool.adoc#indexupgrader-tool,IndexUpgrader Tool>>
for more details on how to upgrade your indexes.
 
 See also the section <<upgrading-a-solr-cluster.adoc#upgrading-a-solr-cluster,Upgrading
a Solr Cluster>> for details on how to upgrade a SolrCloud cluster.
 
@@ -131,7 +131,7 @@ The `qt` parameter is still used as a SolrJ special parameter that specifies
the
 === Point Fields Are Default Numeric Types
 Solr has implemented \*PointField types across the board, to replace Trie* based numeric
fields. All Trie* fields are now considered deprecated, and will be removed in Solr 8.
 
-If you are using Trie* fields in your schema, you should consider moving to PointFields as
soon as feasible. Changing to the new PointField types will require you to re-index your data.
+If you are using Trie* fields in your schema, you should consider moving to PointFields as
soon as feasible. Changing to the new PointField types will require you to reindex your data.
 
 === Spatial Fields
 
@@ -187,7 +187,7 @@ Note again that this is not a complete list of all changes that may impact
your
 
 * The Solr contribs map-reduce, morphlines-core and morphlines-cell have been removed.
 * JSON Facet API now uses hyper-log-log for numBuckets cardinality calculation and calculates
cardinality before filtering buckets by any `mincount` greater than 1.
-* If you use historical dates, specifically on or before the year 1582, you should re-index
for better date handling.
+* If you use historical dates, specifically on or before the year 1582, you should reindex
for better date handling.
 * If you use the JSON Facet API (json.facet) with `method=stream`, you must now set `sort='index
asc'` to get the streaming behavior; otherwise it won't stream. Reminder: `method` is a hint
that doesn't change defaults of other parameters.
 * If you use the JSON Facet API (json.facet) to facet on a numeric field and if you use `mincount=0`
or if you set the prefix, you will now get an error as these options are incompatible with
numeric faceting.
 * Solr's logging verbosity at the INFO level has been greatly reduced, and you may need to
update the log configs to use the DEBUG level to see all the logging messages you used to
see at INFO level before.
diff --git a/solr/solr-ref-guide/src/managed-resources.adoc b/solr/solr-ref-guide/src/managed-resources.adoc
index 0d5f372..f1b17bc 100644
--- a/solr/solr-ref-guide/src/managed-resources.adoc
+++ b/solr/solr-ref-guide/src/managed-resources.adoc
@@ -218,7 +218,7 @@ However, the intent of this API implementation is that changes will be
applied u
 
 [IMPORTANT]
 ====
-Changing things like stop words and synonym mappings typically require re-indexing existing
documents if being used by index-time analyzers. The RestManager framework does not guard
you from this, it simply makes it possible to programmatically build up a set of stop words,
synonyms, etc.
+Changing things like stop words and synonym mappings typically require reindexing existing
documents if being used by index-time analyzers. The RestManager framework does not guard
you from this, it simply makes it possible to programmatically build up a set of stop words,
synonyms, etc. See the section <<reindexing.adoc#reindexing,Reindexing>> for more
information about reindexing your documents.
 ====
 
 == RestManager Endpoint
diff --git a/solr/solr-ref-guide/src/reindexing.adoc b/solr/solr-ref-guide/src/reindexing.adoc
new file mode 100644
index 0000000..efd7573
--- /dev/null
+++ b/solr/solr-ref-guide/src/reindexing.adoc
@@ -0,0 +1,191 @@
+= Reindexing
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+There are several types of changes to Solr configuration that require you to reindex your
data.
+
+These changes include editing properties of fields or field types; adding fields, field types,
or copy field rules;
+upgrading Solr; and some system configuration properties.
+
+It's important to know which changes require reindexing, because failing to reindex
+can have negative consequences for Solr as a system, or for the ability of your users to find what they are looking for.
+
+There is no process in Solr for programmatically reindexing data. When we say "reindex",
we mean, literally,
+"index it again". However you got the data into the index the first time, you will run that
process again.
+It is strongly recommended that Solr users index their data in a repeatable, consistent way,
so that the process can be
+easily repeated when the need for reindexing arises.
+
+Reindexing is recommended during major upgrades, so in addition to covering what types of
configuration changes should trigger a reindex, this section will also cover strategies for
reindexing.
+
+== Changes that Require Reindex
+
+=== Schema Changes
+
+All changes to a collection's schema require reindexing. This is because many of the available
options are only
+applied during the indexing process. Solr simply has no way to implement the desired change
without reindexing
+the data.
+
+To understand the general reason why reindexing is ever required, it's helpful to understand
the relationship between
+Solr's schema and the underlying Lucene index. Lucene does not use a schema; the schema is a Solr-only concept. When you delete
+a field from Solr's schema, it does not modify Lucene's index in any way. When you add a
field to Solr's schema, the
+field does not exist in Lucene's index until a document that contains the field is indexed.
+
+This means that there are many types of schema changes that cannot be reflected in the index
simply by modifying
+Solr's schema. This is different from most database models where schemas are used. With regard
to indexing, Solr's
+schema acts like a rulebook for indexing documents by telling Lucene how to interpret the
data being sent. Once the
+documents are in Lucene, Solr's schema has no control over the underlying data structure.
+
+In addition to the types of schema changes described in the following sections, changing
the schema `version` property
+is equivalent to changing field type properties. This type of change is usually only made
during or because of a major upgrade.
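
For context, the `version` property referred to here is an attribute on the root element of the schema file; a minimal sketch (schema name hypothetical):

[source,xml]
----
<schema name="example" version="1.6">
  <!-- field and field type definitions -->
</schema>
----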
+
+==== Adding or Deleting Fields
+
+If you add a field to or delete a field from Solr's schema, it's strongly recommended that you reindex.
+
+When you add a field, you generally do so with the intent to use the field in some way.
+Since documents were indexed before the field was added, the index will not hold any references
to the field for earlier documents.
+If you want to use the new field for faceting, for example, facet counts on the new field will not include documents indexed before the field was added.
+
+There is a slightly different situation when deleting a field.
+In this case, since simply removing the field from the schema doesn't change anything about
the index, the field will still be in the index until the documents are reindexed.
+In fact, Lucene may keep a reference to a deleted field _forever_ (see also https://issues.apache.org/jira/browse/LUCENE-1761[LUCENE-1761]).
+This may only be an issue for your environment if you try to add a field that has the same
name as a deleted field,
+but it can also be an issue for dynamic field rules that are later removed.
+
+==== Changing Field and Field Type Properties
+
+Solr has two ways of defining field properties.
+
+The first is to define properties on a field type. These properties are then applied to all fields of that type unless they are explicitly overridden.
+
+The second is to override a property inherited from the field type directly on the field itself.
+
+If a property has been defined for a field type but the property is not overridden by defining
a different value for the
+property for a field, then changing the property on the field type is equivalent to changing
it on the field itself.
+
+Changes to *any* field/field type property described in <<field-type-definitions-and-properties.adoc#field-type-properties,Field
Type Properties>> require reindexing in order for the change to be reflected in all
documents.
+The list of changes that require reindexing includes:
+
+* Changing a field from stored to not stored, and vice versa.
+* Changing a field from indexed to not indexed, and vice versa.
+* Changing a field from multi-valued to single-valued, and vice versa.
+* <<Changing Field Analysis>>
+
+Be sure to reference the Field Type Properties section linked above for the complete list
of properties that would require a reindex.
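
As an illustration of the kind of property change meant here, toggling `multiValued` on a field definition (field name hypothetical) is enough to require a reindex:

[source,xml]
----
<!-- before: single-valued -->
<field name="category" type="string" indexed="true" stored="true" multiValued="false"/>
<!-- after: multi-valued; previously indexed documents must be reindexed -->
<field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
----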
+
+In some cases, a changed field/field type property value will apply only to documents indexed _after_ the change. This is not recommended, as it produces inconsistent behavior, but it may be acceptable for your use case as a temporary condition until a full reindex can be scheduled.
+
+==== Changing Field Analysis
+
+Beyond specific field-level properties, <<analyzers.adoc#analyzers,analysis chains>>
are also configured on field types, and are applied at index and/or query time.
+
+It's possible to define separate analysis chains for indexing and query events, or you can
define a single chain
+that is applied to both event types.
+
+If you change the analysis chain that applies to indexing events, it is strongly recommended
that you reindex.
+This is because all of the changes that occur due to the chain configuration are applied
to documents as they are
+being indexed, and only reindexing will allow your changes to take effect on documents.
+
+While reindexing after analyzer changes is not required, be aware that not reindexing can
cause unexpected
+query results in many cases.
+
+For example, if you indexed a number of documents and then decide you'd like to use the `LowerCaseTokenizerFactory`
+to ensure all text is converted to lower case, you will have a mix of entries in the field:
some in their original
+case ("iPhone"), and newer documents in all lower-case ("iphone"). If you do not reindex
the original set of documents,
+a query such as "iphone" will not match documents with "iPhone", because the schema rules
enforce lower case on the
+query, but that's not what is in the index.
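
To sketch the example above, an analysis chain using `LowerCaseTokenizerFactory` might be declared on a field type like this (field type name hypothetical):

[source,xml]
----
<fieldType name="text_lower" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
----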
+
+The only time you do not have to reindex when changing a field type's analysis chain is when
the changes impact
+queries *only* (and you know that you do not need to make corresponding changes to the index
analysis).
+
+=== Solrconfig Changes
+
+Only one parameter change to Solr's `solrconfig.xml` requires reindexing. That parameter
is the `luceneMatchVersion`,
+which controls the compatibility of Solr with Lucene changes. Since this parameter can change
the rules for analysis behind the scenes, it's always recommended to reindex when changing
this value. Usually, however, this is only changed in conjunction with a major upgrade.
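
For reference, the parameter appears near the top of `solrconfig.xml`; the value shown here is illustrative:

[source,xml]
----
<luceneMatchVersion>8.0.0</luceneMatchVersion>
----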
+
+However, if you make a change to Solr's <<update-request-processors.adoc#update-request-processors,Update
Request Processors>>, it's generally because you want to change something about how
_update requests_ (documents) are _processed_ (indexed). In this case, you can decide based
on the change if you want to reindex your documents to implement the changes you've made.
+
+Similarly, if you change the `codecFactory` parameter in `solrconfig.xml`, it is again strongly
recommended that you
+plan to reindex your documents to avoid unintended behavior.
+
+== Upgrades
+
+When upgrading between major versions (for example, from a 7.x release to 8.0 or 8.x), a
best practice
+is to always reindex your data.
+The reason for this is that subtle changes may occur in default field type definitions or
the underlying code.
+
+[NOTE]
+If you have *not* changed your schema as part of an upgrade from one minor release to another (such as from 7.x to a later 7.x release), you can often skip reindexing your documents.
+However, when upgrading to a major release, you should plan to reindex your documents because
of the likelihood of
+changes that break back-compatibility.
+
+== Reindexing Strategies
+
+There are a few approaches available to perform the reindex.
+
+The strategies described below ensure that the Lucene index is completely dropped and recreated to accommodate your changes, without leaving Lucene segments that contain stale data.
+
+=== Delete All Documents
+
+The best approach is to first delete everything from the index, and then index your data
again.
+You can delete all documents with a "delete-by-query", such as this:
+
+[source,bash]
+curl -X POST -H 'Content-Type: application/json' --data-binary '{"delete":{"query":"*:*"}}' http://localhost:8983/solr/my_collection/update
+
+It's important to verify that *all* documents have been deleted, as that ensures the Lucene
index segments have been
+deleted as well.
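
One way to verify (using the collection name from the example above) is a match-all query after a commit; `numFound` should be `0`:

[source,text]
----
http://localhost:8983/solr/my_collection/select?q=*:*&rows=0
----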
+
+To verify that there are no segments in your index, look in the data directory and confirm
it is empty.
+Since the data directory can be customized, see the section <<datadir-and-directoryfactory-in-solrconfig.adoc#specifying-a-location-for-index-data-with-the-datadir-parameter,Specifying
a Location for Index Data with the dataDir Parameter>>
+for where to look to find the index files.
+
+Note that in a cluster you will need to verify the indexes have been removed from every replica of every shard on every node.
+
+Once the indexes have been cleared, you can start reindexing by re-running the original index
process.
+
+=== Index to Another Collection
+
+In cases where you cannot take a production collection offline to delete all the documents,
one option is to use Solr's <<collections-api.adoc#createalias,collection alias>>
feature.
+
+This option is only available for Solr installations running in SolrCloud mode.
+
+With this approach, you will index your documents into a newly created collection and once
everything is completed,
+create an alias for the collection and point your front-end at the collection alias. Queries
will be routed
+to the new collection seamlessly.
+
+Here is an example of creating an alias that points to a single collection:
+
+[source,bash]
+http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=myData&collections=newCollection
+
+Once the alias is in place and you are satisfied you no longer need the old data, you can
delete the old collection with the <<collections-api.adoc#delete,DELETE command>>
of the Collections API:
+
+[source,bash]
+http://localhost:8983/solr/admin/collections?action=DELETE&name=oldCollection
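
If you want to confirm where an alias points before deleting the old collection, the Collections API's LISTALIASES action returns all aliases and their target collections:

[source,text]
----
http://localhost:8983/solr/admin/collections?action=LISTALIASES
----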
+
+== Changes that Do Not Require Reindex
+
+Changes that do not impact the index itself do not require (or strongly indicate) reindexing.
+
+Creating or modifying request handlers, search components, and other elements of `solrconfig.xml`
don't require reindexing.
+
+Cluster and core management actions, such as adding nodes, replicas, or new cores, or splitting
shards, also don't require reindexing.
diff --git a/solr/solr-ref-guide/src/schema-api.adoc b/solr/solr-ref-guide/src/schema-api.adoc
index 173391f..68f865a 100644
--- a/solr/solr-ref-guide/src/schema-api.adoc
+++ b/solr/solr-ref-guide/src/schema-api.adoc
@@ -33,14 +33,16 @@ The reason that this is discouraged is because hand-edits of the schema
may be l
 
 The API allows two output modes for all calls: JSON or XML. When requesting the complete
schema, there is another output mode which is XML modeled after the managed-schema file itself,
which is in XML format.
 
-When modifying the schema with the API, a core reload will automatically occur in order for
the changes to be available immediately for documents indexed thereafter. Previously indexed
documents will *not* be automatically updated - they *must* be re-indexed if existing index
data uses schema elements that you changed.
+When modifying the schema with the API, a core reload will automatically occur in order for
the changes to be available immediately for documents indexed thereafter. Previously indexed
documents will *not* be automatically updated - they *must* be reindexed if existing index
data uses schema elements that you changed.
 
-.Re-index after schema modifications!
+.Reindex after schema modifications!
 [IMPORTANT]
 ====
-If you modify your schema, you will likely need to re-index all documents. If you do not,
you may lose access to documents, or not be able to interpret them properly, e.g., after replacing
a field type.
+If you modify your schema, you will likely need to reindex all documents. If you do not,
you may lose access to documents, or not be able to interpret them properly, e.g., after replacing
a field type.
 
-Modifying your schema will never modify any documents that are already indexed. You must
re-index documents in order to apply schema changes to them. Queries and updates made after
the change may encounter errors that were not present before the change. Completely deleting
the index and rebuilding it is usually the only option to fix such errors.
+Modifying your schema will never modify any documents that are already indexed. You must
reindex documents in order to apply schema changes to them. Queries and updates made after
the change may encounter errors that were not present before the change. Completely deleting
the index and rebuilding it is usually the only option to fix such errors.
+
+See the section <<reindexing.adoc#reindexing,Reindexing>> for more information
about reindexing.
 ====
 
 ////
@@ -89,7 +91,7 @@ These commands can be issued in separate POST requests or in the same POST
reque
 
 In each case, the response will include the status and the time to process the request, but
will not include the entire schema.
 
-When modifying the schema with the API, a core reload will automatically occur in order for
the changes to be available immediately for documents indexed thereafter. Previously indexed
documents will *not* be automatically handled - they *must* be re-indexed if they used schema
elements that you changed.
+When modifying the schema with the API, a core reload will automatically occur in order for
the changes to be available immediately for documents indexed thereafter. Previously indexed
documents will *not* be automatically handled - they *must* be reindexed if they used schema
elements that you changed.
 
 === Add a New Field
 
diff --git a/solr/solr-ref-guide/src/shards-and-indexing-data-in-solrcloud.adoc b/solr/solr-ref-guide/src/shards-and-indexing-data-in-solrcloud.adoc
index b899c5f..a3c7cb8 100644
--- a/solr/solr-ref-guide/src/shards-and-indexing-data-in-solrcloud.adoc
+++ b/solr/solr-ref-guide/src/shards-and-indexing-data-in-solrcloud.adoc
@@ -112,7 +112,7 @@ If you created the collection and defined the "implicit" router at the
time of c
 
 == Shard Splitting
 
-When you create a collection in SolrCloud, you decide on the initial number shards to be
used. But it can be difficult to know in advance the number of shards that you need, particularly
when organizational requirements can change at a moment's notice, and the cost of finding
out later that you chose wrong can be high, involving creating new cores and re-indexing all
of your data.
+When you create a collection in SolrCloud, you decide on the initial number of shards to be
used. But it can be difficult to know in advance the number of shards that you need, particularly
when organizational requirements can change at a moment's notice, and the cost of finding
out later that you chose wrong can be high, involving creating new cores and reindexing all
of your data.
 
 The ability to split shards is in the Collections API. It currently allows splitting a shard
into two pieces. The existing shard is left as-is, so the split action effectively makes two
copies of the data as new shards. You can delete the old shard at a later time when you're
ready.
 
diff --git a/solr/solr-ref-guide/src/solr-tutorial.adoc b/solr/solr-ref-guide/src/solr-tutorial.adoc
index 5ce22ac..f9a4ef2 100644
--- a/solr/solr-ref-guide/src/solr-tutorial.adoc
+++ b/solr/solr-ref-guide/src/solr-tutorial.adoc
@@ -585,7 +585,7 @@ First, we are using a "managed schema", which is configured to only be
modified
 
 Second, we are using "field guessing", which is configured in the `solrconfig.xml` file (and
includes most of Solr's various configuration settings). Field guessing is designed to allow
us to start using Solr without having to define all the fields we think will be in our documents
before trying to index them. This is why we call it "schemaless", because you can start quickly
and let Solr create fields for you as it encounters them in documents.
 
-Sounds great! Well, not really, there are limitations. It's a bit brute force, and if it
guesses wrong, you can't change much about a field after data has been indexed without having
to re-index. If we only have a few thousand documents that might not be bad, but if you have
millions and millions of documents, or, worse, don't have access to the original data anymore,
this can be a real problem.
+Sounds great! Well, not really, there are limitations. It's a bit brute force, and if it
guesses wrong, you can't change much about a field after data has been indexed without having
to reindex. If we only have a few thousand documents that might not be bad, but if you have
millions and millions of documents, or, worse, don't have access to the original data anymore,
this can be a real problem.
 
 For these reasons, the Solr community does not recommend going to production without a schema
that you have defined yourself. By this we mean that the schemaless features are fine to start
with, but you should still always make sure your schema matches your expectations for how
you want your data indexed and how users are going to query it.
 
@@ -936,7 +936,7 @@ Go ahead and edit any of the existing example data files, change some
of the dat
 
 === Deleting Data
 
-If you need to iterate a few times to get your schema right, you may want to delete documents
to clear out the collection and try again. Note, however, that merely removing documents doesn't
change the underlying field definitions. Essentially, this will allow you to re-index your
data after making changes to fields for your needs.
+If you need to iterate a few times to get your schema right, you may want to delete documents
to clear out the collection and try again. Note, however, that merely removing documents doesn't
change the underlying field definitions. Essentially, this will allow you to reindex your
data after making changes to fields for your needs.
 
 You can delete data by POSTing a delete command to the update URL and specifying the value
of the document's unique key field, or a query that matches multiple documents (be careful
with that one!). We can use `bin/post` to delete documents also if we structure the request
properly.
 
@@ -960,7 +960,7 @@ Jump ahead to the overall <<Wrapping Up,wrap up>> when you're
ready to stop Solr
 
 Solr has sophisticated geospatial support, including searching within a specified distance
range of a given location (or within a bounding box), sorting by distance, or even boosting
results by the distance.
 
-Some of the example techproducts documents we indexed in Exercise 1 have locations associated
with them to illustrate the spatial capabilities. To re-index this data, see <<index-the-techproducts-data,Exercise
1>>.
+Some of the example techproducts documents we indexed in Exercise 1 have locations associated
with them to illustrate the spatial capabilities. To reindex this data, see <<index-the-techproducts-data,Exercise
1>>.
 
 Spatial queries can be combined with any other types of queries, such as in this example
of querying for "ipod" within 10 kilometers from San Francisco:
 
diff --git a/solr/solr-ref-guide/src/updating-parts-of-documents.adoc b/solr/solr-ref-guide/src/updating-parts-of-documents.adoc
index 2d15410..acaedd9 100644
--- a/solr/solr-ref-guide/src/updating-parts-of-documents.adoc
+++ b/solr/solr-ref-guide/src/updating-parts-of-documents.adoc
@@ -18,7 +18,7 @@
 
 Once you have indexed the content you need in your Solr index, you will want to start thinking
about your strategy for dealing with changes to those documents. Solr supports three approaches
to updating documents that have only partially changed.
 
-The first is _<<Atomic Updates,atomic updates>>_. This approach allows changing
only one or more fields of a document without having to re-index the entire document.
+The first is _<<Atomic Updates,atomic updates>>_. This approach allows changing
only one or more fields of a document without having to reindex the entire document.
 
 The second approach is known as _<<In-Place Updates,in-place updates>>_. This
approach is similar to atomic updates (is a subset of atomic updates in some sense), but can
be used only for updating single valued non-indexed and non-stored docValue-based numeric
fields.
 
@@ -105,7 +105,7 @@ The resulting document in our collection will be:
 
 == In-Place Updates
 
-In-place updates are very similar to atomic updates; in some sense, this is a subset of atomic
updates. In regular atomic updates, the entire document is re-indexed internally during the
application of the update. However, in this approach, only the fields to be updated are affected
and the rest of the documents are not re-indexed internally. Hence, the efficiency of updating
in-place is unaffected by the size of the documents that are updated (i.e., number of fields,
size of fields, etc [...]
+In-place updates are very similar to atomic updates; in some sense, this is a subset of atomic
updates. In regular atomic updates, the entire document is reindexed internally during the
application of the update. However, in this approach, only the fields to be updated are affected
and the rest of the documents are not reindexed internally. Hence, the efficiency of updating
in-place is unaffected by the size of the documents that are updated (i.e., number of fields,
size of fields, etc.) [...]
 
 An atomic update operation is performed using this approach only when the fields to be updated
meet these three conditions:
 

