lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Defining a phonetic analyzer and searcher via the schema API
Date Mon, 12 Mar 2018 17:14:24 GMT
People may discourage hand-editing, but we use only hand-edited schema and solrconfig
files. Those are checked into version control. I wrote some Python to load them into
Zookeeper and reload the cluster.

This allows us to use the same configs in dev, test, and prod. We can actually test things
before putting them in prod.
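
A minimal sketch of that kind of deploy step, not the actual script mentioned above. It assumes the stock `bin/solr zk upconfig` command and the Collections API `RELOAD` action; the function names, host names, paths, and collection name are all hypothetical.

```python
import subprocess
from urllib.parse import urlencode
from urllib.request import urlopen


def upconfig_command(zk_host, config_name, config_dir, solr_bin="bin/solr"):
    """Build the CLI call that uploads a local config directory to Zookeeper."""
    return [solr_bin, "zk", "upconfig",
            "-z", zk_host, "-n", config_name, "-d", config_dir]


def reload_url(solr_base, collection):
    """Build the Collections API URL that reloads one collection."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{solr_base.rstrip('/')}/admin/collections?{query}"


def deploy(zk_host, config_name, config_dir, solr_base, collection):
    """Upload the version-controlled configs, then reload the collection."""
    # check=True raises if the upload fails, so we never reload stale configs.
    subprocess.run(upconfig_command(zk_host, config_name, config_dir), check=True)
    with urlopen(reload_url(solr_base, collection)) as response:
        return response.status
```

Calling `deploy("zk1:2181", "people_conf", "conf/", "http://localhost:8983/solr", "people")` with per-environment values would push the same checked-in files to dev, test, or prod.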

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 12, 2018, at 10:00 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> bq: which you aren't supposed to edit directly.
> 
> Well, kind of. Here's why it's "discouraged":
> https://lucene.apache.org/solr/guide/6_6/schema-api.html.
> 
> But as long as you don't mix-and-match hand-editing with using the
> schema API you can hand edit it freely. You're then in charge of
> pushing it to ZK and reloading your collections that use it yourself
> however.
> 
> As a side note, even if I _never_ hand-edited it I'd make it a
> practice to regularly pull it from ZK and put it in some VCS system ;)
> 
> Best,
> Erick
> 
> On Mon, Mar 12, 2018 at 9:51 AM, Christopher Schultz
> <chris@christopherschultz.net> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>> 
>> All,
>> 
>> I'd like to add a new synthesized field that uses a phonetic analyzer
>> such as Beider-Morse. I'm using Solr 7.2.
>> 
>> When I request the current schema via the schema API, I get a list of
>> existing fields, dynamic fields, and analyzers, none of which appear
>> to be what I'm looking for.
>> 
>> Conceptually, I think I'd like to do something like this:
>> 
>> add-field: { name: phoneticname, type: phonetic, multiValued: true }
>> 
>> ... but how do I define what type of data "phonetic" should be?
>> 
>> I can see the example XML definition in this document:
>> https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#FilterDescriptions-Beider-MorseFilter
>> 
>> But I'm not sure how to add an analyzer to the schema using the schema
>> API: https://lucene.apache.org/solr/guide/7_2/schema-api.html
>> 
>> Under "Add a new field type", it says that new analyzers can be
>> defined, but I'm not entirely sure how to do that ... the API docs
>> refer to the field type definitions page[1] which just shows what XML
>> you'd have to put into your schema XML -- which you aren't supposed to
>> edit directly.
>> 
>> When looking at the JSON version of my schema, I can see for example this:
>> 
>>    "fieldTypes":[{
>>        "name":"ancestor_path",
>>        "class":"solr.TextField",
>>        "indexAnalyzer":{
>>          "tokenizer":{
>>            "class":"solr.KeywordTokenizerFactory"}},
>>        "queryAnalyzer":{
>>          "tokenizer":{
>>            "class":"solr.PathHierarchyTokenizerFactory",
>>            "delimiter":"/"}}},
>> 
>> So should I create a new field type like this?
>> 
>> "add-field-type" : {
>>  "name" : "phonetic",
>>  "class" : "solr.TextField",
>> 
>>  "analyzer" : {
>>    "tokenizer": { "class" : "solr.StandardTokenizerFactory" },
>> 
>>    "filters" : [{
>>      "class": "solr.BeiderMorseFilterFactory",
>>      "nameType": "GENERIC",
>>      "ruleType": "APPROX",
>>      "concat": "true",
>>      "languageSet": "auto"
>>    }]
>>  }
>> }
>> 
>> Then, use copy-field as "usual":
>> 
>>  "add-field":{
>>     "name":"phonetic",
>>     "type":"phonetic",
>>     "multiValued":true,
>>     "stored":false },
>> 
>>  "add-copy-field":{
>>     "source":"first_name",
>>     "dest":"phonetic" },
>> 
>>  "add-copy-field":{
>>     "source":"last_name",
>>     "dest":"phonetic" },
>> 
>> This seems to work but I wanted to know if I was doing it the right way.
>> 
>> Thanks,
>> - -chris
>> 
>> [1]
>> https://lucene.apache.org/solr/guide/7_2/field-type-definitions-and-properties.html#field-type-definitions-and-properties
>> -----BEGIN PGP SIGNATURE-----
>> Comment: GPGTools - http://gpgtools.org
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>> 
>> iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqmsC4dHGNocmlzQGNo
>> cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFjZWRAAisee5Ya+5dyix91A
>> cGpwgZtFpcVldhd0wDG8qwihq9528vBZCdDSM3yotojMd+Y9dYLm+Q+oM/RT/zoO
>> IXVfRRc352GqG00++hYKpZONUp9Eb3RNjl64+TCufz7vSpr3U/TsJL4wwIMQAY3r
>> eItN/v6TWvvb6jd0z/zL1eITeheOm7bFGjZhGRNv2A7LaQbqTLs6N+SgYphUv7mr
>> E6oQZD5VsdNDqmQdpXVA+Z+eiHweST5JHm1T2ePPz2S7lYunmAcGkAhCmTn2Kwew
>> H3C8+h+mD14YlfYK5J0VcQ2WMZtOkgNNvBiUGIUoEGoqu82dX81408cS49/ZYD/3
>> c9/p41nfzz2V9M3HwgYqbQTI9vV5HP33t44BsWIQr34x86yAPfnMIH3Yv5iEfXTk
>> aGAyeQjkfmMfJbiKTtmVu8Z7q/AiacgzUFUh3yMzGnoDQKz/OWw0A3JkdJ0TT/vY
>> Y6ZiwarooO1tuhG+wm4h+6rUQpoueJS7K8cdWi7LfVb9LGLgj7NCaOQtyIn9QAmk
>> 1UxaJjIOiyO1hsV31nC0kXfKW2A/gkN444gitSi51106QuzIXpEtCeAc4QmqjJt9
>> yeI61DFbQRnr76oVCiyYQwEmOj+C0bOkZqkLU7ZvMonWLLjgX0ydrpNSfm0fDDNv
>> tdfbE/POTM+uJlgX0UEEJhN7qz0=
>> =bgGi
>> -----END PGP SIGNATURE-----
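
For reference, the schema API commands from the quoted message can be sketched as a small Python script. This is an illustration under stated assumptions, not a confirmed recipe: it sends each command as its own POST to the collection's `/schema` endpoint to avoid repeated JSON keys, and the function names, base URL, and collection name are hypothetical. The field type, field, and copy-field values are taken from the message above.

```python
import json
from urllib.request import Request, urlopen


def phonetic_schema_commands(field="phonetic", sources=("first_name", "last_name")):
    """Return one schema API command per step: field type, field, copy fields."""
    commands = [
        {"add-field-type": {
            "name": "phonetic",
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [{
                    "class": "solr.BeiderMorseFilterFactory",
                    "nameType": "GENERIC",
                    "ruleType": "APPROX",
                    "concat": "true",
                    "languageSet": "auto",
                }],
            },
        }},
        {"add-field": {"name": field, "type": "phonetic",
                       "multiValued": True, "stored": False}},
    ]
    # One add-copy-field command per source field.
    commands += [{"add-copy-field": {"source": s, "dest": field}} for s in sources]
    return commands


def schema_request(solr_base, collection, command):
    """Build (but do not send) the POST request for a single command."""
    url = f"{solr_base.rstrip('/')}/{collection}/schema"
    body = json.dumps(command).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-type": "application/json"})


def apply_schema(solr_base, collection):
    """Send the commands in order; the field type must exist before the field."""
    for command in phonetic_schema_commands():
        with urlopen(schema_request(solr_base, collection, command)) as response:
            response.read()
```

Keeping the commands in a function like this also makes them easy to check into version control alongside the rest of the configuration.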

