lucene-commits mailing list archives

From ctarg...@apache.org
Subject [26/37] lucene-solr:branch_6x: squash merge jira/solr-10290 into master
Date Fri, 12 May 2017 14:05:34 GMT
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/field-properties-by-use-case.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/field-properties-by-use-case.adoc b/solr/solr-ref-guide/src/field-properties-by-use-case.adoc
new file mode 100644
index 0000000..bd014b9
--- /dev/null
+++ b/solr/solr-ref-guide/src/field-properties-by-use-case.adoc
@@ -0,0 +1,34 @@
+= Field Properties by Use Case
+:page-shortname: field-properties-by-use-case
+:page-permalink: field-properties-by-use-case.html
+
+Here is a summary of common use cases, and the attributes the fields or field types should
have to support the case. An entry of true or false in the table indicates that the option
must be set to the given value for the use case to function correctly. If no entry is provided,
the setting of that attribute has no impact on the case.
+
+// NOTE: not currently using footnoteref here because:
+//  - it has issues with tables in the PDF
+//  - citing the same footnote with multiple refs causes it to generate invalid HTML (dup ids)
+
+[width="100%",cols="16%,12%,12%,12%,12%,12%,12%,12%",options="header",]
+|===
+|Use Case |indexed |stored |multiValued |omitNorms |termVectors |termPositions |docValues
+|search within field |true | | | | | |
+|retrieve contents | |true^<<fpbuc_8,8>>^ | | | | |true^<<fpbuc_8,8>>^
+|use as unique key |true | |false | | | |
+|sort on field |true^<<fpbuc_7,7>>^ | |false |true^<<fpbuc_1,1>>^ | | |true^<<fpbuc_7,7>>^
+|highlighting |true^<<fpbuc_4,4>>^ |true | | |true^<<fpbuc_2,2>>^ |true^<<fpbuc_3,3>>^ |
+|faceting ^<<fpbuc_5,5>>^ |true^<<fpbuc_7,7>>^ | | | | | |true^<<fpbuc_7,7>>^
+|add multiple values, maintaining order | | |true | | | |
+|field length affects doc score | | | |false | | |
+|MoreLikeThis ^<<fpbuc_5,5>>^ | | | | |true ^<<fpbuc_6,6>>^ | |
+|===
+
+Notes:
+
+1. [[fpbuc_1,1]] Recommended but not necessary.
+2. [[fpbuc_2,2]] Will be used if present, but not necessary.
+3. [[fpbuc_3,3]] (if termVectors=true)
+4. [[fpbuc_4,4]] A tokenizer must be defined for the field, but it doesn't need to be indexed.
+5. [[fpbuc_5,5]] Described in <<understanding-analyzers-tokenizers-and-filters.adoc#understanding-analyzers-tokenizers-and-filters,Understanding
Analyzers, Tokenizers, and Filters>>.
+6. [[fpbuc_6,6]] Term vectors are not mandatory here. If `termVectors` is not true, the stored field content is analyzed instead. Term vectors are therefore recommended, but only required if `stored=false`.
+7. [[fpbuc_7,7]] For most field types, either `indexed` or `docValues` must be true, but
both are not required. <<docvalues.adoc#docvalues,DocValues>> can be more efficient
in many cases. For `[Int/Long/Float/Double/Date]PointFields`, `docValues=true` is required.
+8. [[fpbuc_8,8]] Stored content will be used by default, but docValues can alternatively
be used. See <<docvalues.adoc#docvalues,DocValues>>.
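+
+As a rough illustration of the table, the `schema.xml` fragment below sketches fields for three of the use cases. The field names (`id`, `price`, `category`) and the field type names (`string`, `pdouble`) are hypothetical and assume matching `fieldType` definitions exist elsewhere in the schema.
+
+[source,xml]
+----
+<!-- use as unique key: indexed and single-valued -->
+<field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
+<uniqueKey>id</uniqueKey>
+<!-- sort on field: single-valued, docValues enabled (see note 7) -->
+<field name="price" type="pdouble" indexed="true" stored="true" multiValued="false" docValues="true"/>
+<!-- faceting: docValues enabled, so indexed is not required (see note 7) -->
+<field name="category" type="string" indexed="false" stored="true" docValues="true"/>
+----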

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc b/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc
new file mode 100644
index 0000000..df06657
--- /dev/null
+++ b/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc
@@ -0,0 +1,115 @@
+= Field Type Definitions and Properties
+:page-shortname: field-type-definitions-and-properties
+:page-permalink: field-type-definitions-and-properties.html
+
+A field type defines the analysis that will occur on a field when documents are indexed or
queries are sent to the index.
+
+A field type definition can include four types of information:
+
+* The name of the field type (mandatory).
+* An implementation class name (mandatory).
+* If the field type is `TextField`, a description of the field analysis for the field type.
+* Field type properties - depending on the implementation class, some properties may be mandatory.
+
+[[FieldTypeDefinitionsandProperties-FieldTypeDefinitionsinschema.xml]]
+== Field Type Definitions in `schema.xml`
+
+Field types are defined in `schema.xml`. Each field type is defined between `fieldType` elements.
They can optionally be grouped within a `types` element. Here is an example of a field type
definition for a type called `text_general`:
+
+[source,xml,subs="verbatim,callouts"]
+----
+<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> --<1>
+  <analyzer type="index"> --<2>
+    <tokenizer class="solr.StandardTokenizerFactory"/>
+    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
+    <!-- in this example, we will only use synonyms at query time
+    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
+    -->
+    <filter class="solr.LowerCaseFilterFactory"/>
+  </analyzer>
+  <analyzer type="query">
+    <tokenizer class="solr.StandardTokenizerFactory"/>
+    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
+    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
+    <filter class="solr.LowerCaseFilterFactory"/>
+  </analyzer>
+</fieldType>
+----
+
+<1> The first line in the example above contains the field type name, `text_general`,
and the name of the implementing class, `solr.TextField`.
+<2> The rest of the definition is about field analysis, described in <<understanding-analyzers-tokenizers-and-filters.adoc#understanding-analyzers-tokenizers-and-filters,Understanding
Analyzers, Tokenizers, and Filters>>.
+
+The implementing class is responsible for making sure the field is handled correctly. In
the class names in `schema.xml`, the string `solr` is shorthand for `org.apache.solr.schema`
or `org.apache.solr.analysis`. Therefore, `solr.TextField` is really `org.apache.solr.schema.TextField`.
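+
+For instance, the two declarations below are intended to refer to the same implementing class. The type names `text_short` and `text_long` are made up for this sketch.
+
+[source,xml]
+----
+<!-- shorthand form of the class name -->
+<fieldType name="text_short" class="solr.TextField"/>
+<!-- fully qualified form of the same class -->
+<fieldType name="text_long" class="org.apache.solr.schema.TextField"/>
+----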
+
+== Field Type Properties
+
+The field type `class` determines most of the behavior of a field type, but optional properties
can also be defined. For example, the following definition of a date field type defines two
properties, `sortMissingLast` and `omitNorms`.
+
+[source,xml]
+----
+<fieldType name="date" class="solr.TrieDateField"
+           sortMissingLast="true" omitNorms="true"/>
+----
+
+The properties that can be specified for a given field type fall into three major categories:
+
+* Properties specific to the field type's class.
+* <<General Properties>> Solr supports for any field type.
+* <<Field Default Properties>> that can be specified on the field type that will
be inherited by fields that use this type instead of the default behavior.
+
+=== General Properties
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,40,30",options="header"]
+|===
+|Property |Description |Values
+|name |The name of the fieldType. This value gets used in field definitions, in the "type"
attribute. It is strongly recommended that names consist of alphanumeric or underscore characters
only and not start with a digit. This is not currently strictly enforced. |
+|class |The class name that gets used to store and index the data for this type. Note that
you may prefix included class names with "solr." and Solr will automatically figure out which
packages to search for the class - so `solr.TextField` will work. If you are using a third-party
class, you will probably need to have a fully qualified class name. The fully qualified equivalent
for `solr.TextField` is `org.apache.solr.schema.TextField`. |
+|positionIncrementGap |For multivalued fields, specifies a distance between multiple values,
which prevents spurious phrase matches |integer
+|autoGeneratePhraseQueries |For text fields. If true, Solr automatically generates phrase
queries for adjacent terms. If false, terms must be enclosed in double-quotes to be treated
as phrases. |true or false
+|enableGraphQueries |For text fields, applicable when querying with <<the-standard-query-parser.adoc#TheStandardQueryParser-StandardQueryParserParameters,`sow=false`>>.
Use `true` (the default) for field types with query analyzers including graph-aware filters,
e.g. <<filter-descriptions.adoc#FilterDescriptions-SynonymGraphFilter,Synonym Graph
Filter>> and <<filter-descriptions.adoc#FilterDescriptions-WordDelimiterGraphFilter,Word
Delimiter Graph Filter>>. Use `false` for field types with query analyzers including
filters that can match docs when some tokens are missing, e.g., <<filter-descriptions.adoc#FilterDescriptions-ShingleFilter,Shingle
Filter>>. |true or false
+|[[FieldTypeDefinitionsandProperties-docValuesFormat]]docValuesFormat |Defines a custom `DocValuesFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory`, has been configured in `solrconfig.xml`. |n/a
+|postingsFormat |Defines a custom `PostingsFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory`, has been configured in `solrconfig.xml`. |n/a
+|===
+
+[NOTE]
+====
+Lucene index back-compatibility is only supported for the default codec. If you choose to
customize the `postingsFormat` or `docValuesFormat` in your schema.xml, upgrading to a future
version of Solr may require you to either switch back to the default codec and optimize your
index to rewrite it into the default codec before upgrading, or re-build your entire index
from scratch after upgrading.
+====
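+
+A minimal sketch of the configuration these two properties depend on follows. The `codecFactory` element belongs in `solrconfig.xml`. The field type name and the format name `Memory` are assumptions for illustration; the formats actually available depend on the Lucene version in use.
+
+[source,xml]
+----
+<!-- solrconfig.xml: enable a schema-aware codec -->
+<codecFactory class="solr.SchemaCodecFactory"/>
+
+<!-- schema.xml: hypothetical field type using a custom DocValuesFormat -->
+<fieldType name="string_in_mem_dv" class="solr.StrField" docValues="true" docValuesFormat="Memory"/>
+----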
+
+=== Field Default Properties
+
+These are properties that can be specified either on the field types, or on individual fields
to override the values provided by the field types.
+
+The default values for each property depend on the underlying `FieldType` class, which in
turn may depend on the `version` attribute of the `<schema/>`. The table below includes
the default value for most `FieldType` implementations provided by Solr, assuming a `schema.xml`
that declares `version="1.6"`.
+
+// TODO: SOLR-10655 BEGIN: refactor this into a 'field-default-properties.include.adoc' file for reuse
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,40,20,20",options="header"]
+|===
+|Property |Description |Values |Implicit Default
+|indexed |If true, the value of the field can be used in queries to retrieve matching documents.
|true or false |true
+|stored |If true, the actual value of the field can be retrieved by queries. |true or false
|true
+|docValues |If true, the value of the field will be put in a column-oriented <<docvalues.adoc#docvalues,DocValues>>
structure. |true or false |false
+|sortMissingFirst sortMissingLast |Control the placement of documents when a sort field is
not present. |true or false |false
+|multiValued |If true, indicates that a single document might contain multiple values for
this field type. |true or false |false
+|omitNorms |If true, omits the norms associated with this field (this disables length normalization for the field, and saves some memory). *Defaults to true for all primitive (non-analyzed) field types, such as int, float, date, bool, and string.* Only full-text fields generally need norms. |true or false |*
+|omitTermFreqAndPositions |If true, omits term frequency, positions, and payloads from postings for this field. This can be a performance boost for fields that don't require that information. It also reduces the storage space required for the index. Queries that rely on position information and are issued against a field with this option enabled will silently fail to find documents. *This property defaults to true for all field types that are not text fields.* |true or false |*
+|omitPositions |Similar to `omitTermFreqAndPositions` but preserves term frequency information.
|true or false |*
+|termVectors termPositions termOffsets termPayloads |These options instruct Solr to maintain
full term vectors for each document, optionally including position, offset and payload information
for each term occurrence in those vectors. These can be used to accelerate highlighting and
other ancillary functionality, but impose a substantial cost in terms of index size. They
are not necessary for typical uses of Solr. |true or false |false
+|required |Instructs Solr to reject any attempts to add a document which does not have a
value for this field. This property defaults to false. |true or false |false
+|useDocValuesAsStored |If the field has <<docvalues.adoc#docvalues,docValues>>
enabled, setting this to true would allow the field to be returned as if it were a stored
field (even if it has `stored=false`) when matching "`*`" in an <<common-query-parameters.adoc#CommonQueryParameters-Thefl_FieldList_Parameter,fl
parameter>>. |true or false |true
+|large |Large fields are always lazy loaded and will only take up space in the document cache
if the actual value is < 512KB. This option requires `stored="true"` and `multiValued="false"`.
It's intended for fields that might have very large values so that they don't get cached in
memory. |true or false |false
+|===
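+
+To illustrate how a field can override a default set on its type, consider the sketch below. The names `string_dv` and `sku` are hypothetical.
+
+[source,xml]
+----
+<!-- the type enables docValues and disables stored for every field that uses it -->
+<fieldType name="string_dv" class="solr.StrField" docValues="true" stored="false"/>
+<!-- this field inherits docValues="true" from the type but overrides stored -->
+<field name="sku" type="string_dv" indexed="true" stored="true"/>
+----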
+
+// TODO: SOLR-10655 END
+
+[[FieldTypeDefinitionsandProperties-FieldTypeSimilarity]]
+== Field Type Similarity
+
+A field type may optionally specify a `<similarity/>` that will be used when scoring
documents that refer to fields with this type, as long as the "global" similarity for the
collection allows it.
+
+By default, any field type that does not define a similarity uses `BM25Similarity`. For more details, and examples of configuring both global and per-type similarities, see <<other-schema-elements.adoc#OtherSchemaElements-Similarity,Other Schema Elements>>.
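+
+As a sketch, a per-type similarity might be declared as follows. The type name `text_bm25` is hypothetical, and the `k1` and `b` values are simply the commonly cited BM25 defaults. A declaration like this is expected to take effect only when the global similarity permits it, for example `solr.SchemaSimilarityFactory`.
+
+[source,xml]
+----
+<fieldType name="text_bm25" class="solr.TextField">
+  <analyzer>
+    <tokenizer class="solr.StandardTokenizerFactory"/>
+  </analyzer>
+  <!-- per-type similarity with explicit BM25 parameters -->
+  <similarity class="solr.BM25SimilarityFactory">
+    <float name="k1">1.2</float>
+    <float name="b">0.75</float>
+  </similarity>
+</fieldType>
+----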

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/field-types-included-with-solr.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/field-types-included-with-solr.adoc b/solr/solr-ref-guide/src/field-types-included-with-solr.adoc
new file mode 100644
index 0000000..8aaef00
--- /dev/null
+++ b/solr/solr-ref-guide/src/field-types-included-with-solr.adoc
@@ -0,0 +1,40 @@
+= Field Types Included with Solr
+:page-shortname: field-types-included-with-solr
+:page-permalink: field-types-included-with-solr.html
+
+The following table lists the field types that are available in Solr. The `org.apache.solr.schema`
package includes all the classes listed in this table.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="25,75",options="header"]
+|===
+|Class |Description
+|BinaryField |Binary data.
+|BoolField |Contains either true or false. Values of "1", "t", or "T" in the first character
are interpreted as true. Any other values in the first character are interpreted as false.
+|CollationField |Supports Unicode collation for sorting and range queries. ICUCollationField
is a better choice if you can use ICU4J. See the section <<language-analysis.adoc#LanguageAnalysis-UnicodeCollation,Unicode
Collation>>.
+|CurrencyField |Supports currencies and exchange rates. See the section <<working-with-currencies-and-exchange-rates.adoc#working-with-currencies-and-exchange-rates,Working
with Currencies and Exchange Rates>>.
+|DateRangeField |Supports indexing date ranges, to include point in time date instances as
well (single-millisecond durations). See the section <<working-with-dates.adoc#working-with-dates,Working
with Dates>> for more detail on using this field type. Consider using this field type
even if it's just for date instances, particularly when the queries typically fall on UTC
year/month/day/hour, etc., boundaries.
+|ExternalFileField |Pulls values from a file on disk. See the section <<working-with-external-files-and-processes.adoc#working-with-external-files-and-processes,Working
with External Files and Processes>>.
+|EnumField |Allows defining an enumerated set of values which may not be easily sorted by
either alphabetic or numeric order (such as a list of severities, for example). This field
type takes a configuration file, which lists the proper order of the field values. See the
section <<working-with-enum-fields.adoc#working-with-enum-fields,Working with Enum Fields>>
for more information.
+|ICUCollationField |Supports Unicode collation for sorting and range queries. See the section
<<language-analysis.adoc#LanguageAnalysis-UnicodeCollation,Unicode Collation>>.
+|LatLonPointSpatialField |<<spatial-search.adoc#spatial-search,Spatial Search>>:
a latitude/longitude coordinate pair; possibly multi-valued for multiple points. Usually it's
specified as "lat,lon" order with a comma.
+|LatLonType |(deprecated) <<spatial-search.adoc#spatial-search,Spatial Search>>:
a single-valued latitude/longitude coordinate pair. Usually it's specified as "lat,lon" order
with a comma.
+|PointType |<<spatial-search.adoc#spatial-search,Spatial Search>>: A single-valued
n-dimensional point. It's both for sorting spatial data that is _not_ lat-lon, and for some
more rare use-cases. (NOTE: this is _not_ related to the "Point" based numeric fields)
+|PreAnalyzedField |Provides a way to send to Solr serialized token streams, optionally with
independent stored values of a field, and have this information stored and indexed without
any additional text processing. Configuration and usage of PreAnalyzedField is documented
on the <<working-with-external-files-and-processes.adoc#WorkingwithExternalFilesandProcesses-ThePreAnalyzedFieldType,Working
with External Files and Processes>> page.
+|RandomSortField |Does not contain a value. Queries that sort on this field type will return
results in random order. Use a dynamic field to use this feature.
+|SpatialRecursivePrefixTreeFieldType |(RPT for short) <<spatial-search.adoc#spatial-search,Spatial
Search>>: Accepts latitude comma longitude strings or other shapes in WKT format.
+|StrField |String (UTF-8 encoded string or Unicode). Strings are intended for small fields
and are _not_ tokenized or analyzed in any way. They have a hard limit of slightly less than
32K.
+|TextField |Text, usually multiple words or tokens.
+|TrieDateField |Date field. Represents a point in time with millisecond precision. See the
section <<working-with-dates.adoc#working-with-dates,Working with Dates>>. `precisionStep="0"`
minimizes index size; `precisionStep="8"` (the default) enables more efficient range queries.
For single valued fields, use `docValues="true"` for efficient sorting.
+|TrieDoubleField |Double field (64-bit IEEE floating point). `precisionStep="0"` minimizes
index size; `precisionStep="8"` (the default) enables more efficient range queries. For single
valued fields, use `docValues="true"` for efficient sorting.
+|TrieFloatField |Floating point field (32-bit IEEE floating point). `precisionStep="0"` enables efficient numeric sorting and minimizes index size; `precisionStep="8"` (the default) enables efficient range queries. For single valued fields, use `docValues="true"` for efficient sorting.
+|TrieIntField |Integer field (32-bit signed integer). `precisionStep="0"` enables efficient
numeric sorting and minimizes index size; `precisionStep="8"` (the default) enables efficient
range queries. For single valued fields, use `docValues="true"` for efficient sorting.
+|TrieLongField |Long field (64-bit signed integer). `precisionStep="0"` minimizes index size;
`precisionStep="8"` (the default) enables more efficient range queries. For single valued
fields, use `docValues="true"` for efficient sorting.
+|TrieField |If this field type is used, a "type" attribute must also be specified. Valid values are `integer`, `long`, `float`, `double`, and `date`. Using this field type is the same as using any of the Trie fields mentioned above.
+|DatePointField |Date field. Represents a point in time with millisecond precision. See the
section <<working-with-dates.adoc#working-with-dates,Working with Dates>>. This
class functions similarly to TrieDateField, but using a "Dimensional Points" based data structure
instead of indexed terms, and doesn't require configuration of a precision step. For single
valued fields, `docValues="true"` must be used to enable sorting.
+|DoublePointField |Double field (64-bit IEEE floating point). This class functions similarly
to TrieDoubleField, but using a "Dimensional Points" based data structure instead of indexed
terms, and doesn't require configuration of a precision step. For single valued fields, `docValues="true"`
must be used to enable sorting.
+|FloatPointField |Floating point field (32-bit IEEE floating point). This class functions
similarly to TrieFloatField, but using a "Dimensional Points" based data structure instead
of indexed terms, and doesn't require configuration of a precision step. For single valued
fields, `docValues="true"` must be used to enable sorting.
+|IntPointField |Integer field (32-bit signed integer). This class functions similarly to
TrieIntField, but using a "Dimensional Points" based data structure instead of indexed terms,
and doesn't require configuration of a precision step. For single valued fields, `docValues="true"`
must be used to enable sorting.
+|LongPointField |Long field (64-bit signed integer). This class functions similarly to TrieLongField,
but using a "Dimensional Points" based data structure instead of indexed terms, and doesn't
require configuration of a precision step. For single valued fields, `docValues="true"` must
be used to enable sorting.
+|UUIDField |Universally Unique Identifier (UUID). Pass in a value of "NEW" and Solr will
create a new UUID. *Note*: configuring a UUIDField instance with a default value of "NEW"
is not advisable for most users when using SolrCloud (and not possible if the UUID value is
configured as the unique key field) since the result will be that each replica of each document
will get a unique UUID value. Using UUIDUpdateProcessorFactory to generate UUID values when
documents are added is recommended instead.
+|===
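+
+A few of the classes above might be declared as shown below. The type names and the dynamic field pattern are illustrative choices, not requirements.
+
+[source,xml]
+----
+<!-- Trie-based integer: precisionStep="0" minimizes index size -->
+<fieldType name="int" class="solr.TrieIntField" precisionStep="0" docValues="true"/>
+<!-- Points-based integer: no precisionStep; docValues is needed for sorting -->
+<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
+<!-- RandomSortField is used through a dynamic field -->
+<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
+<dynamicField name="random_*" type="random"/>
+----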

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/files-screen.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/files-screen.adoc b/solr/solr-ref-guide/src/files-screen.adoc
new file mode 100644
index 0000000..4d134e6
--- /dev/null
+++ b/solr/solr-ref-guide/src/files-screen.adoc
@@ -0,0 +1,23 @@
+= Files Screen
+:page-shortname: files-screen
+:page-permalink: files-screen.html
+
+The Files screen lets you browse and view the various configuration files (such as `solrconfig.xml` and the schema file) for the collection you selected.
+
+.The Files Screen
+image::images/files-screen/files-screen.png[image,height=400]
+
+If you are using <<solrcloud.adoc#solrcloud,SolrCloud>>, the files displayed are the configuration files for this collection stored in ZooKeeper. In a standalone Solr installation, all files in the `conf` directory are displayed.
+
+While `solrconfig.xml` defines the behavior of Solr as it indexes content and responds to
queries, the Schema allows you to define the types of data in your content (field types),
the fields your documents will be broken into, and any dynamic fields that should be generated
based on patterns of field names in the incoming documents. Any other configuration files
are used depending on how they are referenced in either `solrconfig.xml` or your schema.
+
+Configuration files cannot be edited with this screen, so a text editor of some kind must
be used.
+
+This screen is related to the <<schema-browser-screen.adoc#schema-browser-screen,Schema
Browser Screen>>, in that they both can display information from the schema, but the
Schema Browser provides a way to drill into the analysis chain and displays linkages between
field types, fields, and dynamic field rules.
+
+Many of the options defined in these configuration files are described throughout the rest
of this Guide. In particular, you will want to review these sections:
+
+* <<indexing-and-basic-data-operations.adoc#indexing-and-basic-data-operations,Indexing
and Basic Data Operations>>
+* <<searching.adoc#searching,Searching>>
+* <<the-well-configured-solr-instance.adoc#the-well-configured-solr-instance,The Well-Configured
Solr Instance>>
+* <<documents-fields-and-schema-design.adoc#documents-fields-and-schema-design,Documents,
Fields, and Schema Design>>

