lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Defining a phonetic analyzer and searcher via the schema API
Date Mon, 12 Mar 2018 17:00:01 GMT
bq: which you aren't supposed to edit directly.

Well, kind of. Here's why it's "discouraged":
https://lucene.apache.org/solr/guide/6_6/schema-api.html.

But as long as you don't mix and match hand-editing with the schema
API, you can hand-edit it freely. You're then responsible for pushing
it to ZK yourself and for reloading the collections that use it,
however.

As a side note, even if I _never_ hand-edited it, I'd make it a
practice to regularly pull it from ZK and put it in some VCS ;)
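
A rough sketch of that workflow, assuming the stock bin/solr script and
the Collections API RELOAD action. The ZK address, configset name,
directory, and collection name are hypothetical placeholders, and the
helper functions are made up for illustration. The code only builds the
commands and the URL; it does not run or call anything.

```python
from urllib.parse import urlencode


def downconfig_command(zk_host, config_name, target_dir):
    """Command line that pulls a configset from ZK into a local directory
    (for example, a working copy under version control)."""
    return ["bin/solr", "zk", "downconfig",
            "-z", zk_host, "-n", config_name, "-d", target_dir]


def upconfig_command(zk_host, config_name, source_dir):
    """Command line that pushes a hand-edited configset back to ZK."""
    return ["bin/solr", "zk", "upconfig",
            "-z", zk_host, "-n", config_name, "-d", source_dir]


def reload_url(solr_base, collection):
    """Collections API URL that reloads one collection so it picks up
    the configset that was just uploaded."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{solr_base}/admin/collections?{query}"


if __name__ == "__main__":
    # Hypothetical values; substitute your own.
    zk, name, path = "localhost:2181", "myconfig", "./myconfig"
    print(" ".join(downconfig_command(zk, name, path)))
    print(" ".join(upconfig_command(zk, name, path)))
    print(reload_url("http://localhost:8983/solr", "mycollection"))
```

Every collection that uses the configset needs its own reload after the
upload.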

Best,
Erick

On Mon, Mar 12, 2018 at 9:51 AM, Christopher Schultz
<chris@christopherschultz.net> wrote:
> All,
>
> I'd like to add a new synthesized field that uses a phonetic analyzer
> such as Beider-Morse. I'm using Solr 7.2.
>
> When I request the current schema via the schema API, I get a list of
> existing fields, dynamic fields, and analyzers, none of which appear
> to be what I'm looking for.
>
> Conceptually, I think I'd like to do something like this:
>
> add-field: { name: phoneticname, type: phonetic, multiValued: true }
>
> ... but how do I define what type of data "phonetic" should be?
>
> I can see the example XML definition in this document:
> https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#FilterDescriptions-Beider-MorseFilter
>
> But I'm not sure how to add an analyzer to the schema using the schema
> API: https://lucene.apache.org/solr/guide/7_2/schema-api.html
>
> Under "Add a new field type", it says that new analyzers can be
> defined, but I'm not entirely sure how to do that ... the API docs
> refer to the field type definitions page[1] which just shows what XML
> you'd have to put into your schema XML -- which you aren't supposed to
> edit directly.
>
> When looking at the JSON version of my schema, I can see for example
> this:
>
>     "fieldTypes":[{
>         "name":"ancestor_path",
>         "class":"solr.TextField",
>         "indexAnalyzer":{
>           "tokenizer":{
>             "class":"solr.KeywordTokenizerFactory"}},
>         "queryAnalyzer":{
>           "tokenizer":{
>             "class":"solr.PathHierarchyTokenizerFactory",
>             "delimiter":"/"}}},
>
> So should I create a new field type like this?
>
> "add-field-type" : {
>   "name" : "phonetic",
>   "class" : "solr.TextField",
>
>   "analyzer" : {
>     "tokenizer": { "class" : "solr.StandardTokenizerFactory" },
>
>     "filters" : [{
>       "class": "solr.BeiderMorseFilterFactory",
>       "nameType": "GENERIC",
>       "ruleType": "APPROX",
>       "concat": "true",
>       "languageSet": "auto"
>     }]
>   }
> }
>
> Then, use copy-field as "usual":
>
>   "add-field":{
>      "name":"phonetic",
>      "type":"phonetic",
>      "multiValued":true,
>      "stored":false },
>
>   "add-copy-field":{
>      "source":"first_name",
>      "dest":"phonetic" },
>
>   "add-copy-field":{
>      "source":"last_name",
>      "dest":"phonetic" },
>
> This seems to work but I wanted to know if I was doing it the right way.
>
> Thanks,
> -chris
>
> [1]
> https://lucene.apache.org/solr/guide/7_2/field-type-definitions-and-properties.html#field-type-definitions-and-properties

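For reference, the three commands quoted above could be combined into a
single Schema API request along these lines. This is only a sketch: the
base URL and collection name are hypothetical, the helper functions are
made up for illustration, and it assumes the Schema API accepts a
repeated command as a JSON array (a Python dict cannot hold two
"add-copy-field" keys). The field and type definitions are taken from
the quoted message unchanged.

```python
import json
import urllib.request


def build_schema_commands():
    """Schema API commands from the quoted message: a phonetic field
    type, a field of that type, and two copy-field rules."""
    return {
        "add-field-type": {
            "name": "phonetic",
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [{
                    "class": "solr.BeiderMorseFilterFactory",
                    "nameType": "GENERIC",
                    "ruleType": "APPROX",
                    "concat": "true",
                    "languageSet": "auto",
                }],
            },
        },
        "add-field": {
            "name": "phonetic",
            "type": "phonetic",
            "multiValued": True,
            "stored": False,
        },
        # Array form, assumed equivalent to repeating the command key.
        "add-copy-field": [
            {"source": "first_name", "dest": "phonetic"},
            {"source": "last_name", "dest": "phonetic"},
        ],
    }


def build_request(solr_base, collection, commands):
    """Prepare (but do not send) the POST to the collection's /schema."""
    body = json.dumps(commands).encode("utf-8")
    return urllib.request.Request(
        f"{solr_base}/{collection}/schema",
        data=body,
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_request("http://localhost:8983/solr", "mycollection",
                        build_schema_commands())
    print(req.get_method(), req.full_url)
    print(json.dumps(build_schema_commands(), indent=2))
    # To actually send it: urllib.request.urlopen(req)
```

Once the schema is managed this way, later changes should also go
through the Schema API rather than hand edits, per the reply above.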