lucene-solr-user mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Defining a phonetic analyzer and searcher via the schema API
Date Mon, 12 Mar 2018 17:14:54 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Erick,

On 3/12/18 1:00 PM, Erick Erickson wrote:
> bq: which you aren't supposed to edit directly.
> 
> Well, kind of. Here's why it's "discouraged": 
> https://lucene.apache.org/solr/guide/6_6/schema-api.html.
> 
> But as long as you don't mix-and-match hand-editing with using the
> schema API, you can hand-edit it freely. You're then in charge of
> pushing it to ZK and reloading the collections that use it
> yourself, however.
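The manual "push to ZK, then reload" step Erick describes can be sketched roughly as follows. The upload itself is usually done with `bin/solr zk upconfig`; the Python below only builds the reload URLs. It assumes Solr at `http://localhost:8983/solr` and a hypothetical collection (or core) named `people`. The Collections API `RELOAD` action applies in SolrCloud (ZooKeeper) mode; a standalone node reloads a core through the CoreAdmin API instead.

```python
from urllib.parse import urlencode


def reload_collection_url(base_url: str, collection: str) -> str:
    """Collections API RELOAD URL (SolrCloud / ZooKeeper mode)."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def reload_core_url(base_url: str, core: str) -> str:
    """CoreAdmin RELOAD URL (standalone mode, no ZooKeeper)."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"
```

Issuing a GET against the returned URL (with any HTTP client) triggers the reload; the Schema API does this automatically, which is part of why hand-editing and API edits should not be mixed.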

No ZooKeeper (yet), but I suspect I'll end up there. I'm mostly
toying around with it right now, but it won't be long before I'll want
to go live with it, and having a single Solr instance isn't going to
help me sleep well at night. I'm sure I'll end up with two instances
to begin with, which requires ZK, right?

> As a side note, even if I _never_ hand-edited it I'd make it a 
> practice to regularly pull it from ZK and put it in some VCS system
> ;)

Actually, I have the script that builds the schema in VCS, so it's
roughly the same.
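A schema-building script of that kind can stay quite small. The sketch below (Python, standard library only) reuses the commands from the request quoted further down in this thread; the collection name `people` and the default Solr URL are assumptions. Because that request repeats `add-copy-field` inside one JSON object (and reportedly works), while a Python `dict` cannot hold duplicate keys, the body is serialized from a list of pairs.

```python
import json
import urllib.request

# Commands from the quoted request, as (command name, payload) pairs so
# that the same command name can appear more than once.
PHONETIC_COMMANDS = [
    ("add-field-type", {
        "name": "phonetic",
        "class": "solr.TextField",
        "analyzer": {
            "tokenizer": {"class": "solr.StandardTokenizerFactory"},
            "filters": [{
                "class": "solr.BeiderMorseFilterFactory",
                "nameType": "GENERIC",
                "ruleType": "APPROX",
                "concat": "true",
                "languageSet": "auto",
            }],
        },
    }),
    ("add-field", {"name": "phonetic", "type": "phonetic",
                   "multiValued": True, "stored": False}),
    ("add-copy-field", {"source": "first_name", "dest": "phonetic"}),
    ("add-copy-field", {"source": "last_name", "dest": "phonetic"}),
]


def schema_commands_body(commands):
    """Serialize (command, payload) pairs into one JSON object, keeping repeats."""
    members = [f"{json.dumps(name)}: {json.dumps(payload)}"
               for name, payload in commands]
    return "{" + ", ".join(members) + "}"


def post_schema(base_url, collection, body):
    """POST the body to the collection's Schema API endpoint; return the parsed reply."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/{collection}/schema",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would be `post_schema("http://localhost:8983/solr", "people", schema_commands_body(PHONETIC_COMMANDS))`. Re-running it against a schema that already contains these names is expected to fail, since `add-field-type` and `add-field` do not overwrite; the Schema API's `replace-field-type` and `replace-field` commands cover that case.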

As for the schema modifications... did I get those right?
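One way to check whether the modifications took effect is to read them back. The sketch below only builds URLs: the Schema API's single-field-type and copy-field endpoints, the implicit `/analysis/field` handler (to see which tokens the Beider-Morse filter emits for a sample value), and a plain query against the `phonetic` field. The collection name `people` and the sample name are hypothetical; fetch the URLs with any HTTP client.

```python
from urllib.parse import quote, urlencode


def _base(base_url, collection):
    return f"{base_url.rstrip('/')}/{quote(collection)}"


def field_type_url(base_url, collection, type_name):
    # GET returns the stored definition of one field type.
    return f"{_base(base_url, collection)}/schema/fieldtypes/{quote(type_name)}"


def copy_fields_url(base_url, collection, dest):
    # dest.fl restricts the listing to copy rules that target `dest`.
    params = urlencode({"dest.fl": dest})
    return f"{_base(base_url, collection)}/schema/copyfields?{params}"


def analysis_url(base_url, collection, type_name, value):
    # Shows the tokens produced by the field type's analyzer chain for `value`.
    params = urlencode({"analysis.fieldtype": type_name,
                        "analysis.fieldvalue": value})
    return f"{_base(base_url, collection)}/analysis/field?{params}"


def phonetic_query_url(base_url, collection, name):
    params = urlencode({"q": f"phonetic:{name}"})
    return f"{_base(base_url, collection)}/select?{params}"
```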

Thanks,
- -chris

> On Mon, Mar 12, 2018 at 9:51 AM, Christopher Schultz
> <chris@christopherschultz.net> wrote:
>
> All,
> 
> I'd like to add a new synthesized field that uses a phonetic
> analyzer such as Beider-Morse. I'm using Solr 7.2.
> 
> When I request the current schema via the schema API, I get a list
> of existing fields, dynamic fields, and analyzers, none of which
> appear to be what I'm looking for.
> 
> Conceptually, I think I'd like to do something like this:
> 
> add-field: { name: phoneticname, type: phonetic, multiValued: true }
> 
> ... but how do I define what type of data "phonetic" should be?
> 
> I can see the example XML definition in this document:
> https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#FilterDescriptions-Beider-MorseFilter
> 
> But I'm not sure how to add an analyzer to the schema using the
> schema API:
> https://lucene.apache.org/solr/guide/7_2/schema-api.html
> 
> Under "Add a new field type", it says that new analyzers can be
> defined, but I'm not entirely sure how to do that. The API docs
> refer to the field type definitions page [1], which just shows what
> XML you'd have to put into your schema XML -- which you aren't
> supposed to edit directly.
> 
> When looking at the JSON version of my schema, I can see, for
> example, this:
> 
> "fieldTypes":[{
>     "name":"ancestor_path",
>     "class":"solr.TextField",
>     "indexAnalyzer":{
>       "tokenizer":{
>         "class":"solr.KeywordTokenizerFactory"}},
>     "queryAnalyzer":{
>       "tokenizer":{
>         "class":"solr.PathHierarchyTokenizerFactory",
>         "delimiter":"/"}}},
> 
> So should I create a new field type like this?
> 
> "add-field-type" : {
>   "name" : "phonetic",
>   "class" : "solr.TextField",
>   "analyzer" : {
>     "tokenizer" : { "class" : "solr.StandardTokenizerFactory" },
>     "filters" : [{
>       "class" : "solr.BeiderMorseFilterFactory",
>       "nameType" : "GENERIC",
>       "ruleType" : "APPROX",
>       "concat" : "true",
>       "languageSet" : "auto" }] } }
> 
> Then, use copy-field as "usual":
> 
> "add-field":{ "name":"phonetic", "type":"phonetic", "multiValued":true, "stored":false },
> "add-copy-field":{ "source":"first_name", "dest":"phonetic" },
> "add-copy-field":{ "source":"last_name", "dest":"phonetic" },
> 
> This seems to work but I wanted to know if I was doing it the right
> way.
> 
> Thanks, -chris
> 
> [1]
> https://lucene.apache.org/solr/guide/7_2/field-type-definitions-and-properties.html#field-type-definitions-and-properties
> 
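The `fieldTypes` JSON quoted above distinguishes `indexAnalyzer` from `queryAnalyzer`, whereas the proposed `phonetic` type uses a single `analyzer` key, which Solr applies at both index and query time. A small Python sketch that reports the tokenizer class per analyzer role, assuming the `schema` object has already been extracted from a `GET /schema` response (the sample below is the quoted excerpt, wrapped to make it valid JSON):

```python
import json

# The quoted "fieldTypes" excerpt, wrapped in an object so it parses.
SAMPLE = '''{"fieldTypes":[{"name":"ancestor_path","class":"solr.TextField",
  "indexAnalyzer":{"tokenizer":{"class":"solr.KeywordTokenizerFactory"}},
  "queryAnalyzer":{"tokenizer":{"class":"solr.PathHierarchyTokenizerFactory","delimiter":"/"}}}]}'''


def tokenizers_by_type(schema):
    """Map each field type name to {analyzer role: tokenizer class}."""
    result = {}
    for field_type in schema.get("fieldTypes", []):
        roles = {}
        # "analyzer" covers both index and query time; the other two split them.
        for role in ("analyzer", "indexAnalyzer", "queryAnalyzer"):
            analyzer = field_type.get(role)
            if analyzer and "tokenizer" in analyzer:
                roles[role] = analyzer["tokenizer"]["class"]
        result[field_type["name"]] = roles
    return result
```

For the sample, `tokenizers_by_type(json.loads(SAMPLE))` maps `ancestor_path` to a `KeywordTokenizerFactory` index tokenizer and a `PathHierarchyTokenizerFactory` query tokenizer.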
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqmtY4dHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFhdIA/9GkZ/yimVmkwB725L
uS4kcy4YJowyYw+eMtvurpIq/ZV/U8H4hFJY/ddsT+bdrjeZMsTdc7B9Tdlha8xt
dmuj1VcvDn3uyIUGooTOob6ZvZwjeJEZIJrbwUM5gNq7uJW8xpCU0/3+iP6Km7OY
1Nia5uCuwarLWcsRFdtjCvR3M7ZppBYHec3kVGGOUL637AC6ISgpxhuzOnuTHAss
wCjuR1y6AdTjRbHpis3MJdiVIjEENfyzGpEnqvumsu1e+0F/A0DNbhU9nAPv+73d
aOLfOW9Fs6jjnq96qzIBAkHLWkqU1GHKYNYHql7/59x8rFcjGkGC7ziSY69lKc+f
ivrIEqLH1Go7kawz+1og3dPyl/n0CFWE3UK+wj5QeTY5XLduq0x6EmFKW6D790BS
ywmFuqr4cmvKbs3N6BbxHz5QVbjgRsWO4jp4kJi3KDCepd8vKW+2xwHfX/zAcBKY
rSDuVkM3KtxQal8xgm4tsvyU3g1dXpNEVa7PFXYJzd3uA2yij9OU6s83NS9LHK3N
2zssPfNDj7QddAEhYan0O4r4wSUN2UNT9nMhBVXXYRpoD6WzrhC5TdRUDh66rkOB
AvhAUKsV0rfjct+MUBpQA9W+SUG7i911wNSBJJmB58MYbyxMAJb8NKGk1yEs1MyH
FQHEgiEEFRCD9ZFd/fqwfuPyKQo=
=Vqz6
-----END PGP SIGNATURE-----
