From: "Alex Murzaku"
To: "'Lucene Users List'"
Subject: RE: Support for russian morphology in Lucene
Date: Thu, 7 Mar 2002 08:50:00 -0500

Real morphology (finding the root for all the forms of a word) might not be that easy in Russian, because Russian words are inflected with both prefixes (aspect) and suffixes (case, number, conjugation). However, there are already efforts to write stemmers (suffix strippers) for Russian following Porter's model. SNOWBALL (named after SNOBOL) is a formal language that has found its main use in writing stemmers for different languages.
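To make the idea of a Porter-style suffix stripper concrete, here is a minimal sketch in Java (the language the SNOWBALL compiler emits). It is not the SNOWBALL Russian algorithm. The class name and the seven endings are illustrative choices, with the endings taken as a small subset of the noun endings in the Russian script attached to this message (stem.sbl). What it does copy from that script is the region logic of mark_regions: an ending may only be removed from the part of the word after the first vowel (pV), and a second mark, p2, is placed by skipping past a further non-vowel, vowel and non-vowel in turn.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of a Porter-style suffix stripper for Russian; not the SNOWBALL
 * algorithm. Regions follow mark_regions in the attached stem.sbl, and the endings
 * are a small subset of that script's noun endings.
 */
final class TinyRussianStemmer {
    // The script's vowel grouping v (a e i o u y e` iu ia) as Unicode Cyrillic.
    private static final String VOWELS =
        "\u0430\u0435\u0438\u043E\u0443\u044B\u044D\u044E\u044F";

    // Illustrative noun endings: -ami, -iami, -akh, -ov, -a, -y, -u.
    private static final List<String> ENDINGS = Arrays.asList(
        "\u0430\u043C\u0438", "\u044F\u043C\u0438", "\u0430\u0445",
        "\u043E\u0432", "\u0430", "\u044B", "\u0443");

    private static boolean isVowel(char c) {
        return VOWELS.indexOf(c) >= 0;
    }

    // Like SNOWBALL's gopast: index just past the first char at or after 'from'
    // whose vowel status equals 'vowel', or -1 if there is no such char.
    private static int goPast(String w, int from, boolean vowel) {
        for (int i = from; i < w.length(); i++) {
            if (isVowel(w.charAt(i)) == vowel) {
                return i + 1;
            }
        }
        return -1;
    }

    // Returns {pV, p2}; a mark stays at the word length when its step fails, as in mark_regions.
    static int[] markRegions(String w) {
        int pV = w.length();
        int p2 = w.length();
        int c = goPast(w, 0, true);              // gopast v
        if (c >= 0) {
            pV = c;                              // setmark pV
            c = goPast(w, c, false);             // gopast non-v
            if (c >= 0) c = goPast(w, c, true);  // gopast v
            if (c >= 0) c = goPast(w, c, false); // gopast non-v
            if (c >= 0) p2 = c;                  // setmark p2
        }
        return new int[] {pV, p2};
    }

    // Removes the longest listed ending that lies entirely at or after pV.
    static String stem(String w) {
        int pV = markRegions(w)[0];
        String best = "";
        for (String e : ENDINGS) {
            boolean fits = w.endsWith(e) && w.length() - e.length() >= pV;
            if (fits && e.length() > best.length()) {
                best = e;
            }
        }
        return w.substring(0, w.length() - best.length());
    }
}
```

With these rules, книгами (knigami) becomes книг (knig) and столов (stolov) becomes стол (stol), while да (da) is left alone because its -а would start before pV. The real script also tries perfective gerund, reflexive, adjectival and verb endings, and uses p2 for its derivational -ост/-ость step.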
So far there are rule sets for Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Russian, Spanish and Swedish. Some time ago, somebody posted a French stemmer built with SNOWBALL. It seems straightforward to convert all these stemmers for use in Lucene and perhaps include them in the package. The site for SNOWBALL is snowball.sf.net, and the latest version of its compiler outputs Java code. I am attaching the Russian SNOWBALL file and its corresponding Java output. This is just the stemmer, though; it does not include the code needed to interface with Lucene.

Best,
Alex

-----Original Message-----
From: Philipp Chudinov [mailto:morpheus@basko.ru]
Sent: Thursday, March 07, 2002 1:21 AM
To: Lucene Users List
Subject: Re: Support for russian morphology in Lucene

It's me :) I have no ideas about morphology, but a great wish to use Lucene for Russian. Nice to see you here. Maybe we should try to do things together.

----- Original Message -----
From: "Vadim Solonovich"
To: "Lucene Developers List"
Cc: "Lucene Users List"
Sent: Thursday, March 07, 2002 6:40 AM
Subject: Support for russian morphology in Lucene

> Hi all!
>
> Does anybody have any ideas about implementing Russian
> morphology in Lucene?
> Please let me know.
>
> Thanks in advance.
>
> Vadim Solonovich,
> mailto:vsolon@park.ru
> http://www.park.ru
> http://garant.park.ru

----- Attachment: russian.java -----

// This file was generated automatically by the snowball to Java converter

package net.sf.snowball.ext;
import net.sf.snowball.SnowballProgram;
import net.sf.snowball.Among;

/**
 * Generated class implementing code defined by a snowball script.
 */
public class extends SnowballProgram {

    private Among a_0[] = {
        new Among ( "\u00D7\u00DB\u00C9", -1, 1, "", this),
        new Among ( "\u00C9\u00D7\u00DB\u00C9", 0, 2, "", this),
        new Among ( "\u00D9\u00D7\u00DB\u00C9", 0, 2, "", this),
        new Among ( "\u00D7", -1, 1, "", this),
        new Among ( "\u00C9\u00D7", 3, 2, "", this),
        new Among ( "\u00D9\u00D7", 3, 2, "", this),
        new Among ( "\u00D7\u00DB\u00C9\u00D3\u00D8", -1, 1, "", this),
        new Among ( "\u00C9\u00D7\u00DB\u00C9\u00D3\u00D8", 6, 2, "", this),
        new Among ( "\u00D9\u00D7\u00DB\u00C9\u00D3\u00D8", 6, 2, "", this)
    };

    private Among a_1[] = {
        new Among ( "\u00C0\u00C0", -1, 1, "", this),
        new Among ( "\u00C5\u00C0", -1, 1, "", this),
        new Among ( "\u00CF\u00C0", -1, 1, "", this),
        new Among ( "\u00D5\u00C0", -1, 1, "", this),
        new Among ( "\u00C5\u00C5", -1, 1, "", this),
        new Among ( "\u00C9\u00C5", -1, 1, "", this),
        new Among ( "\u00CF\u00C5", -1, 1, "", this),
        new Among ( "\u00D9\u00C5", -1, 1, "", this),
        new Among ( "\u00C9\u00C8", -1, 1, "", this),
        new Among ( "\u00D9\u00C8", -1, 1, "", this),
        new Among ( "\u00C9\u00CD\u00C9", -1, 1, "", this),
        new Among ( "\u00D9\u00CD\u00C9", -1, 1, "", this),
        new Among ( "\u00C5\u00CA", -1, 1, "", this),
        new Among ( "\u00C9\u00CA", -1, 1, "", this),
        new Among ( "\u00CF\u00CA", -1, 1, "", this),
        new Among ( "\u00D9\u00CA", -1, 1, "", this),
        new Among ( "\u00C5\u00CD", -1, 1, "", this),
        new Among ( "\u00C9\u00CD", -1, 1, "", this),
        new Among ( "\u00CF\u00CD", -1, 1, "", this),
        new Among ( "\u00D9\u00CD", -1, 1, "", this),
        new Among ( "\u00C5\u00C7\u00CF", -1, 1, "", this),
        new Among ( "\u00CF\u00C7\u00CF", -1, 1, "", this),
        new Among ( "\u00C1\u00D1", -1, 1, "", this),
        new Among ( "\u00D1\u00D1", -1, 1, "", this),
        new Among ( "\u00C5\u00CD\u00D5", -1, 1, "", this),
        new Among ( "\u00CF\u00CD\u00D5", -1, 1, "", this)
    };

    private Among a_2[] = {
        new Among ( "\u00C5\u00CD", -1, 1, "", this),
        new Among ( "\u00CE\u00CE", -1, 1, "", this),
        new Among ( "\u00D7\u00DB", -1, 1, "", this),
        new Among ( "\u00C9\u00D7\u00DB", 2, 2, "", this),
        new Among ( "\u00D9\u00D7\u00DB", 2, 2, "", this),
        new Among ( "\u00DD", -1, 1, "", this),
        new Among ( "\u00C0\u00DD", 5, 1, "", this),
        new Among ( "\u00D5\u00C0\u00DD", 6, 2, "", this)
    };

    private Among a_3[] = {
        new Among ( "\u00D3\u00D1", -1, 1, "", this),
        new Among ( "\u00D3\u00D8", -1, 1, "", this)
    };

    private Among a_4[] = {
        new Among ( "\u00C0", -1, 2, "", this),
        new Among ( "\u00D5\u00C0", 0, 2, "", this),
        new Among ( "\u00CC\u00C1", -1, 1, "", this),
        new Among ( "\u00C9\u00CC\u00C1", 2, 2, "", this),
        new Among ( "\u00D9\u00CC\u00C1", 2, 2, "", this),
        new Among ( "\u00CE\u00C1", -1, 1, "", this),
        new Among ( "\u00C5\u00CE\u00C1", 5, 2, "", this),
        new Among ( "\u00C5\u00D4\u00C5", -1, 1, "", this),
        new Among ( "\u00C9\u00D4\u00C5", -1, 2, "", this),
        new Among ( "\u00CA\u00D4\u00C5", -1, 1, "", this),
        new Among ( "\u00C5\u00CA\u00D4\u00C5", 9, 2, "", this),
        new Among ( "\u00D5\u00CA\u00D4\u00C5", 9, 2, "", this),
        new Among ( "\u00CC\u00C9", -1, 1, "", this),
        new Among ( "\u00C9\u00CC\u00C9", 12, 2, "", this),
        new Among ( "\u00D9\u00CC\u00C9", 12, 2, "", this),
        new Among ( "\u00CA", -1, 1, "", this),
        new Among ( "\u00C5\u00CA", 15, 2, "", this),
        new Among ( "\u00D5\u00CA", 15, 2, "", this),
        new Among ( "\u00CC", -1, 1, "", this),
        new Among ( "\u00C9\u00CC", 18, 2, "", this),
        new Among ( "\u00D9\u00CC", 18, 2, "", this),
        new Among ( "\u00C5\u00CD", -1, 1, "", this),
        new Among ( "\u00C9\u00CD", -1, 2, "", this),
        new Among ( "\u00D9\u00CD", -1, 2, "", this),
        new Among ( "\u00CE", -1, 1, "", this),
        new Among ( "\u00C5\u00CE", 24, 2, "", this),
        new Among ( "\u00CC\u00CF", -1, 1, "", this),
        new Among ( "\u00C9\u00CC\u00CF", 26, 2, "", this),
        new Among ( "\u00D9\u00CC\u00CF", 26, 2, "", this),
        new Among ( "\u00CE\u00CF", -1, 1, "", this),
        new Among ( "\u00C5\u00CE\u00CF", 29, 2, "", this),
        new Among ( "\u00CE\u00CE\u00CF", 29, 1, "", this),
        new Among ( "\u00C0\u00D4", -1, 1, "", this),
        new Among ( "\u00D5\u00C0\u00D4", 32, 2, "", this),
        new Among ( "\u00C5\u00D4", -1, 1, "", this),
        new Among ( "\u00D5\u00C5\u00D4", 34, 2, "", this),
        new Among ( "\u00C9\u00D4", -1, 2, "", this),
        new Among ( "\u00D1\u00D4", -1, 2, "", this),
        new Among ( "\u00D9\u00D4", -1, 2, "", this),
        new Among ( "\u00D4\u00D8", -1, 1, "", this),
        new Among ( "\u00C9\u00D4\u00D8", 39, 2, "", this),
        new Among ( "\u00D9\u00D4\u00D8", 39, 2, "", this),
        new Among ( "\u00C5\u00DB\u00D8", -1, 1, "", this),
        new Among ( "\u00C9\u00DB\u00D8", -1, 2, "", this),
        new Among ( "\u00CE\u00D9", -1, 1, "", this),
        new Among ( "\u00C5\u00CE\u00D9", 44, 2, "", this)
    };

    private Among a_5[] = {
        new Among ( "\u00C0", -1, 1, "", this),
        new Among ( "\u00C9\u00C0", 0, 1, "", this),
        new Among ( "\u00D8\u00C0", 0, 1, "", this),
        new Among ( "\u00C1", -1, 1, "", this),
        new Among ( "\u00C5", -1, 1, "", this),
        new Among ( "\u00C9\u00C5", 4, 1, "", this),
        new Among ( "\u00D8\u00C5", 4, 1, "", this),
        new Among ( "\u00C1\u00C8", -1, 1, "", this),
        new Among ( "\u00D1\u00C8", -1, 1, "", this),
        new Among ( "\u00C9\u00D1\u00C8", 8, 1, "", this),
        new Among ( "\u00C9", -1, 1, "", this),
        new Among ( "\u00C5\u00C9", 10, 1, "", this),
        new Among ( "\u00C9\u00C9", 10, 1, "", this),
        new Among ( "\u00C1\u00CD\u00C9", 10, 1, "", this),
        new Among ( "\u00D1\u00CD\u00C9", 10, 1, "", this),
        new Among ( "\u00C9\u00D1\u00CD\u00C9", 14, 1, "", this),
        new Among ( "\u00CA", -1, 1, "", this),
        new Among ( "\u00C5\u00CA", 16, 1, "", this),
        new Among ( "\u00C9\u00C5\u00CA", 17, 1, "", this),
        new Among ( "\u00C9\u00CA", 16, 1, "", this),
        new Among ( "\u00CF\u00CA", 16, 1, "", this),
        new Among ( "\u00C1\u00CD", -1, 1, "", this),
        new Among ( "\u00C5\u00CD", -1, 1, "", this),
        new Among ( "\u00C9\u00C5\u00CD", 22, 1, "", this),
        new Among ( "\u00CF\u00CD", -1, 1, "", this),
        new Among ( "\u00D1\u00CD", -1, 1, "", this),
        new Among ( "\u00C9\u00D1\u00CD", 25, 1, "", this),
        new Among ( "\u00CF", -1, 1, "", this),
        new Among ( "\u00D1", -1, 1, "", this),
        new Among ( "\u00C9\u00D1", 28, 1, "", this),
        new Among ( "\u00D8\u00D1", 28, 1, "", this),
        new Among ( "\u00D5", -1, 1, "", this),
        new Among ( "\u00C5\u00D7", -1, 1, "", this),
        new Among ( "\u00CF\u00D7", -1, 1, "", this),
        new Among ( "\u00D8", -1, 1, "", this),
        new Among ( "\u00D9", -1, 1, "", this)
    };

    private Among a_6[] = {
        new Among ( "\u00CF\u00D3\u00D4", -1, 1, "", this),
        new Among ( "\u00CF\u00D3\u00D4\u00D8", -1, 1, "", this)
    };

    private Among a_7[] = {
        new Among ( "\u00C5\u00CA\u00DB\u00C5", -1, 1, "", this),
        new Among ( "\u00CE", -1, 2, "", this),
        new Among ( "\u00D8", -1, 3, "", this),
        new Among ( "\u00C5\u00CA\u00DB", -1, 1, "", this)
    };

    private static final char g_v[] = {35, 130, 34, 18 };

    private int I_p2;
    private int I_pV;

    private void copy_from( other) {
        I_p2 = other.I_p2;
        I_pV = other.I_pV;
        super.copy_from(other);
    }

    private boolean r_mark_regions() {
        int v_1;
        // (, line 57
        I_pV = limit;
        I_p2 = limit;
        // do, line 61
        v_1 = cursor;
        lab0: do {
            // (, line 61
            // gopast, line 62
            golab1: while(true)
            {
                lab2: do {
                    if (!(in_grouping(g_v, 192, 220)))
                    {
                        break lab2;
                    }
                    break golab1;
                } while (false);
                if (cursor >= limit)
                {
                    break lab0;
                }
                cursor++;
            }
            // setmark pV, line 62
            I_pV = cursor;
            // gopast, line 62
            golab3: while(true)
            {
                lab4: do {
                    if (!(out_grouping(g_v, 192, 220)))
                    {
                        break lab4;
                    }
                    break golab3;
                } while (false);
                if (cursor >= limit)
                {
                    break lab0;
                }
                cursor++;
            }
            // gopast, line 63
            golab5: while(true)
            {
                lab6: do {
                    if (!(in_grouping(g_v, 192, 220)))
                    {
                        break lab6;
                    }
                    break golab5;
                } while (false);
                if (cursor >= limit)
                {
                    break lab0;
                }
                cursor++;
            }
            // gopast, line 63
            golab7: while(true)
            {
                lab8: do {
                    if (!(out_grouping(g_v, 192, 220)))
                    {
                        break lab8;
                    }
                    break golab7;
                } while (false);
                if (cursor >= limit)
                {
                    break lab0;
                }
                cursor++;
            }
            // setmark p2, line 63
            I_p2 = cursor;
        } while (false);
        cursor = v_1;
        return true;
    }

    private boolean r_R2() {
        if (!(I_p2 <= cursor))
        {
            return false;
        }
        return true;
    }

    private boolean r_perfective_gerund() {
        int among_var;
        int v_1;
        // (, line 71
        // [, line 72
        ket = cursor;
        // substring, line 72
        among_var = find_among_b(a_0, 9);
        if (among_var == 0)
        {
            return false;
        }
        // ], line 72
        bra = cursor;
        switch(among_var) {
            case 0:
                return false;
            case 1:
                // (, line 76
                // or, line 76
                lab0: do {
                    v_1 = limit - cursor;
                    lab1: do {
                        // literal, line 76
                        if (!(eq_s_b(1, "\u00C1")))
                        {
                            break lab1;
                        }
                        break lab0;
                    } while (false);
                    cursor = limit - v_1;
                    // literal, line 76
                    if (!(eq_s_b(1, "\u00D1")))
                    {
                        return false;
                    }
                } while (false);
                // delete, line 76
                slice_del();
                break;
            case 2:
                // (, line 83
                // delete, line 83
                slice_del();
                break;
        }
        return true;
    }

    private boolean r_adjective() {
        int among_var;
        // (, line 87
        // [, line 88
        ket = cursor;
        // substring, line 88
        among_var = find_among_b(a_1, 26);
        if (among_var == 0)
        {
            return false;
        }
        // ], line 88
        bra = cursor;
        switch(among_var) {
            case 0:
                return false;
            case 1:
                // (, line 97
                // delete, line 97
                slice_del();
                break;
        }
        return true;
    }

    private boolean r_adjectival() {
        int among_var;
        int v_1;
        int v_2;
        // (, line 101
        // call adjective, line 102
        if (!r_adjective())
        {
            return false;
        }
        // try, line 109
        v_1 = limit - cursor;
        lab0: do {
            // (, line 109
            // [, line 110
            ket = cursor;
            // substring, line 110
            among_var = find_among_b(a_2, 8);
            if (among_var == 0)
            {
                cursor = limit - v_1;
                break lab0;
            }
            // ], line 110
            bra = cursor;
            switch(among_var) {
                case 0:
                    cursor = limit - v_1;
                    break lab0;
                case 1:
                    // (, line 115
                    // or, line 115
                    lab1: do {
                        v_2 = limit - cursor;
                        lab2: do {
                            // literal, line 115
                            if (!(eq_s_b(1, "\u00C1")))
                            {
                                break lab2;
                            }
                            break lab1;
                        } while (false);
                        cursor = limit - v_2;
                        // literal, line 115
                        if (!(eq_s_b(1, "\u00D1")))
                        {
                            cursor = limit - v_1;
                            break lab0;
                        }
                    } while (false);
                    // delete, line 115
                    slice_del();
                    break;
                case 2:
                    // (, line 122
                    // delete, line 122
                    slice_del();
                    break;
            }
        } while (false);
        return true;
    }

    private boolean r_reflexive() {
        int among_var;
        // (, line 128
        // [, line 129
        ket = cursor;
        // substring, line 129
        among_var = find_among_b(a_3, 2);
        if (among_var == 0)
        {
            return false;
        }
        // ], line 129
        bra = cursor;
        switch(among_var) {
            case 0:
                return false;
            case 1:
                // (, line 132
                // delete, line 132
                slice_del();
                break;
        }
        return true;
    }

    private boolean r_verb() {
        int among_var;
        int v_1;
        // (, line 136
        // [, line 137
        ket = cursor;
        // substring, line 137
        among_var = find_among_b(a_4, 46);
        if (among_var == 0)
        {
            return false;
        }
        // ], line 137
        bra = cursor;
        switch(among_var) {
            case 0:
                return false;
            case 1:
                // (, line 143
                // or, line 143
                lab0: do {
                    v_1 = limit - cursor;
                    lab1: do {
                        // literal, line 143
                        if (!(eq_s_b(1, "\u00C1")))
                        {
                            break lab1;
                        }
                        break lab0;
                    } while (false);
                    cursor = limit - v_1;
                    // literal, line 143
                    if (!(eq_s_b(1, "\u00D1")))
                    {
                        return false;
                    }
                } while (false);
                // delete, line 143
                slice_del();
                break;
            case 2:
                // (, line 151
                // delete, line 151
                slice_del();
                break;
        }
        return true;
    }

    private boolean r_noun() {
        int among_var;
        // (, line 159
        // [, line 160
        ket = cursor;
        // substring, line 160
        among_var = find_among_b(a_5, 36);
        if (among_var == 0)
        {
            return false;
        }
        // ], line 160
        bra = cursor;
        switch(among_var) {
            case 0:
                return false;
            case 1:
                // (, line 167
                // delete, line 167
                slice_del();
                break;
        }
        return true;
    }

    private boolean r_derivational() {
        int among_var;
        // (, line 175
        // [, line 176
        ket = cursor;
        // substring, line 176
        among_var = find_among_b(a_6, 2);
        if (among_var == 0)
        {
            return false;
        }
        // ], line 176
        bra = cursor;
        // call R2, line 176
        if (!r_R2())
        {
            return false;
        }
        switch(among_var) {
            case 0:
                return false;
            case 1:
                // (, line 179
                // delete, line 179
                slice_del();
                break;
        }
        return true;
    }

    private boolean r_tidy_up() {
        int among_var;
        // (, line 183
        // [, line 184
        ket = cursor;
        // substring, line 184
        among_var = find_among_b(a_7, 4);
        if (among_var == 0)
        {
            return false;
        }
        // ], line 184
        bra = cursor;
        switch(among_var) {
            case 0:
                return false;
            case 1:
                // (, line 188
                // delete, line 188
                slice_del();
                // [, line 189
                ket = cursor;
                // literal, line 189
                if (!(eq_s_b(1, "\u00CE")))
                {
                    return false;
                }
                // ], line 189
                bra = cursor;
                // literal, line 189
                if (!(eq_s_b(1, "\u00CE")))
                {
                    return false;
                }
                // delete, line 189
                slice_del();
                break;
            case 2:
                // (, line 192
                // literal, line 192
                if (!(eq_s_b(1, "\u00CE")))
                {
                    return false;
                }
                // delete, line 192
                slice_del();
                break;
            case 3:
                // (, line 194
                // delete, line 194
                slice_del();
                break;
        }
        return true;
    }

    public boolean stem() {
        int v_1;
        int v_2;
        int v_3;
        int v_4;
        int v_5;
        int v_6;
        int v_7;
        int v_8;
        int v_9;
        int v_10;
        // (, line 199
        // do, line 201
        v_1 = cursor;
        lab0: do {
            // call mark_regions, line 201
            if (!r_mark_regions())
            {
                break lab0;
            }
        } while (false);
        cursor = v_1;
        // backwards, line 202
        limit_backward = cursor; cursor = limit;
        // setlimit, line 202
        v_2 = limit - cursor;
        // tomark, line 202
        if (cursor < I_pV)
        {
            return false;
        }
        cursor = I_pV;
        v_3 = limit_backward;
        limit_backward = cursor;
        cursor = limit - v_2;
        // (, line 202
        // do, line 203
        v_4 = limit - cursor;
        lab1: do {
            // (, line 203
            // or, line 204
            lab2: do {
                v_5 = limit - cursor;
                lab3: do {
                    // call perfective_gerund, line 204
                    if (!r_perfective_gerund())
                    {
                        break lab3;
                    }
                    break lab2;
                } while (false);
                cursor = limit - v_5;
                // (, line 205
                // try, line 205
                v_6 = limit - cursor;
                lab4: do {
                    // call reflexive, line 205
                    if (!r_reflexive())
                    {
                        cursor = limit - v_6;
                        break lab4;
                    }
                } while (false);
                // or, line 206
                lab5: do {
                    v_7 = limit - cursor;
                    lab6: do {
                        // call adjectival, line 206
                        if (!r_adjectival())
                        {
                            break lab6;
                        }
                        break lab5;
                    } while (false);
                    cursor = limit - v_7;
                    lab7: do {
                        // call verb, line 206
                        if (!r_verb())
                        {
                            break lab7;
                        }
                        break lab5;
                    } while (false);
                    cursor = limit - v_7;
                    // call noun, line 206
                    if (!r_noun())
                    {
                        break lab1;
                    }
                } while (false);
            } while (false);
        } while (false);
        cursor = limit - v_4;
        // try, line 209
        v_8 = limit - cursor;
        lab8: do {
            // (, line 209
            // [, line 209
            ket = cursor;
            // literal, line 209
            if (!(eq_s_b(1, "\u00C9")))
            {
                cursor = limit - v_8;
                break lab8;
            }
            // ], line 209
            bra = cursor;
            // delete, line 209
            slice_del();
        } while (false);
        // do, line 212
        v_9 = limit - cursor;
        lab9: do {
            // call derivational, line 212
            if (!r_derivational())
            {
                break lab9;
            }
        } while (false);
        cursor = limit - v_9;
        // do, line 213
        v_10 = limit - cursor;
        lab10: do {
            // call tidy_up, line 213
            if (!r_tidy_up())
            {
                break lab10;
            }
        } while (false);
        cursor = limit - v_10;
        limit_backward = v_3;
        cursor = limit_backward;
        return true;
    }

}

----- Attachment: stem.sbl -----

stringescapes {}

// the 32 Cyrillic letters:

stringdef a    hex 'C1'
stringdef b    hex 'C2'
stringdef v    hex 'D7'
stringdef g    hex 'C7'
stringdef d    hex 'C4'
stringdef e    hex 'C5'
stringdef zh   hex 'D6'
stringdef z    hex 'DA'
stringdef i    hex 'C9'
stringdef i`   hex 'CA'
stringdef k    hex 'CB'
stringdef l    hex 'CC'
stringdef m    hex 'CD'
stringdef n    hex 'CE'
stringdef o    hex 'CF'
stringdef p    hex 'D0'
stringdef r    hex 'D2'
stringdef s    hex 'D3'
stringdef t    hex 'D4'
stringdef u    hex 'D5'
stringdef f    hex 'C6'
stringdef kh   hex 'C8'
stringdef ts   hex 'C3'
stringdef ch   hex 'DE'
stringdef sh   hex 'DB'
stringdef shch hex 'DD'
stringdef "    hex 'DF'
stringdef y    hex 'D9'
stringdef '    hex 'D8'
stringdef e`   hex 'DC'
stringdef iu   hex 'C0'
stringdef ia   hex 'D1'

routines ( mark_regions R2
           perfective_gerund
           adjective
           adjectival
           reflexive
           verb
           noun
           derivational
           tidy_up
)

externals ( stem )

integers ( pV p2 )

groupings ( v )

define v '{a}{e}{i}{o}{u}{y}{e`}{iu}{ia}'

define mark_regions as (

    $pV = limit
    $p2 = limit
    do (
        gopast v  setmark pV  gopast non-v
        gopast v  gopast non-v  setmark p2
    )
)

backwardmode (

    define R2 as $p2 <= cursor

    define perfective_gerund as (
        [substring] among (
            '{v}'
            '{v}{sh}{i}'
            '{v}{sh}{i}{s}{'}'
                ('{a}' or '{ia}' delete)
            '{i}{v}'
            '{i}{v}{sh}{i}'
            '{i}{v}{sh}{i}{s}{'}'
            '{y}{v}'
            '{y}{v}{sh}{i}'
            '{y}{v}{sh}{i}{s}{'}'
                (delete)
        )
    )

    define adjective as (
        [substring] among (
            '{e}{e}' '{i}{e}' '{y}{e}' '{o}{e}' '{i}{m}{i}' '{y}{m}{i}'
            '{e}{i`}' '{i}{i`}' '{y}{i`}' '{o}{i`}' '{e}{m}' '{i}{m}'
            '{y}{m}' '{o}{m}' '{e}{g}{o}' '{o}{g}{o}' '{e}{m}{u}'
            '{o}{m}{u}' '{i}{kh}' '{y}{kh}' '{u}{iu}' '{iu}{iu}' '{a}{ia}'
            '{ia}{ia}'
                        // and -
            '{o}{iu}'   // - which is somewhat archaic
            '{e}{iu}'   // - soft form of {o}{iu}
                (delete)
        )
    )

    define adjectival as (
        adjective

        /* of the participle forms, em, vsh, ivsh, yvsh are readily removable.
           nn, {iu}shch, shch, u{iu}shch can be removed, with a small proportion of
           errors. Removing im, uem, enn creates too many errors.
        */

        try (
            [substring] among (
                '{e}{m}'                  // present passive participle
                '{n}{n}'                  // adjective from past passive participle
                '{v}{sh}'                 // past active participle
                '{iu}{shch}' '{shch}'     // present active participle
                    ('{a}' or '{ia}' delete)

     //but not '{i}{m}' '{u}{e}{m}'       // present passive participle
     //or '{e}{n}{n}'                     // adjective from past passive participle

                '{i}{v}{sh}' '{y}{v}{sh}' // past active participle
                '{u}{iu}{shch}'           // present active participle
                    (delete)
            )
        )

    )

    define reflexive as (
        [substring] among (
            '{s}{ia}'
            '{s}{'}'
                (delete)
        )
    )

    define verb as (
        [substring] among (
            '{l}{a}' '{n}{a}' '{e}{t}{e}' '{i`}{t}{e}' '{l}{i}' '{i`}'
            '{l}' '{e}{m}' '{n}' '{l}{o}' '{n}{o}' '{e}{t}' '{iu}{t}'
            '{n}{y}' '{t}{'}' '{e}{sh}{'}'

            '{n}{n}{o}'
                ('{a}' or '{ia}' delete)

            '{i}{l}{a}' '{y}{l}{a}' '{e}{n}{a}' '{e}{i`}{t}{e}'
            '{u}{i`}{t}{e}' '{i}{t}{e}' '{i}{l}{i}' '{y}{l}{i}' '{e}{i`}'
            '{u}{i`}' '{i}{l}' '{y}{l}' '{i}{m}' '{y}{m}' '{e}{n}'
            '{i}{l}{o}' '{y}{l}{o}' '{e}{n}{o}' '{ia}{t}' '{u}{e}{t}'
            '{u}{iu}{t}' '{i}{t}' '{y}{t}' '{e}{n}{y}' '{i}{t}{'}'
            '{y}{t}{'}' '{i}{sh}{'}' '{u}{iu}' '{iu}'
                (delete)
            /* note the short passive participle tests:
               '{n}{a}' '{n}' '{n}{o}' '{n}{y}'
               '{e}{n}{a}' '{e}{n}' '{e}{n}{o}' '{e}{n}{y}'
            */
        )
    )

    define noun as (
        [substring] among (
            '{a}' '{e}{v}' '{o}{v}' '{i}{e}' '{'}{e}' '{e}'
            '{i}{ia}{m}{i}' '{ia}{m}{i}' '{a}{m}{i}' '{e}{i}' '{i}{i}'
            '{i}' '{i}{e}{i`}' '{e}{i`}' '{o}{i`}' '{i}{i`}' '{i`}'
            '{i}{ia}{m}' '{ia}{m}' '{i}{e}{m}' '{e}{m}' '{a}{m}' '{o}{m}'
            '{o}' '{u}' '{a}{kh}' '{i}{ia}{kh}' '{ia}{kh}' '{y}' '{'}'
            '{i}{iu}' '{'}{iu}' '{iu}' '{i}{ia}' '{'}{ia}' '{ia}'
                (delete)
            /* the small class of neuter forms '{e}{n}{i}' '{e}{n}{e}{m}'
               '{e}{n}{a}' '{e}{n}' '{e}{n}{a}{m}' '{e}{n}{a}{m}{i}' '{e}{n}{a}{x}'
               omitted - they only occur on 12 words.
            */
        )
    )

    define derivational as (
        [substring] R2 among (
            '{o}{s}{t}'
            '{o}{s}{t}{'}'
                (delete)
        )
    )

    define tidy_up as (
        [substring] among (

            '{e}{i`}{sh}'
            '{e}{i`}{sh}{e}'  // superlative forms
               (delete
                ['{n}'] '{n}' delete
               )
            '{n}'
               ('{n}' delete) // e.g. -nno endings
            '{'}'
               (delete)       // with some slight false conflations
        )
    )
)

define stem as (

    do mark_regions
    backwards setlimit tomark pV for (
        do (
             perfective_gerund or
             ( try reflexive
               adjectival or verb or noun
             )
        )
        try([ '{i}' ] delete)
        // because noun ending -i{iu} is being treated as verb ending -{iu}

        do derivational
        do tidy_up
    )
)
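One practical point for the missing Lucene glue: the generated class does not work on Unicode Cyrillic. Its string literals are KOI8-R code values (\u00C1 for a, \u00D1 for ia, and so on, exactly as in the stringdef lines of stem.sbl), and its vowel test in_grouping(g_v, 192, 220) only covers chars 192 to 220. Token text coming from Lucene would therefore need mapping before stemming and back afterwards. The helper below is a hypothetical sketch (the class and method names are made up here): it builds the table from the 32 stringdef lines, handles lowercase letters only (no ё, no capitals, so lowercase the token first), and decodes the g_v bitmap under the assumption that in_grouping tests bit (c - 192), lowest bit first. That reading does yield exactly the nine vowels of grouping v.

```java
/**
 * Hypothetical helper (not part of the attachments) for feeding Unicode token text to
 * the generated stemmer, which works on KOI8-R code values.
 */
final class Koi8Cyrillic {
    // KOI8-R codes of the lowercase letters U+0430..U+044F in alphabetical order,
    // copied from the 32 stringdef lines of stem.sbl.
    private static final int[] KOI8 = {
        0xC1, 0xC2, 0xD7, 0xC7, 0xC4, 0xC5, 0xD6, 0xDA, // a b v g d e zh z
        0xC9, 0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF, 0xD0, // i i` k l m n o p
        0xD2, 0xD3, 0xD4, 0xD5, 0xC6, 0xC8, 0xC3, 0xDE, // r s t u f kh ts ch
        0xDB, 0xDD, 0xDF, 0xD9, 0xD8, 0xDC, 0xC0, 0xD1  // sh shch " y ' e` iu ia
    };

    // Vowel bitmap g_v copied from the generated code; it covers chars 192..220.
    private static final int[] G_V = {35, 130, 34, 18};

    // Reverse table: KOI8-R code -> Unicode letter (0 where unmapped).
    private static final char[] TO_UNICODE = new char[256];

    static {
        for (int i = 0; i < KOI8.length; i++) {
            TO_UNICODE[KOI8[i]] = (char) ('\u0430' + i);
        }
    }

    // Lowercase Cyrillic -> KOI8-R-valued chars; everything else passes through.
    static String toKoi8(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            boolean cyrillic = c >= '\u0430' && c <= '\u044F';
            out.append(cyrillic ? (char) KOI8[c - '\u0430'] : c);
        }
        return out.toString();
    }

    // Inverse of toKoi8 for the 32 mapped codes; other chars pass through.
    static String fromKoi8(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            char u = c < 256 ? TO_UNICODE[c] : 0;
            out.append(u != 0 ? u : c);
        }
        return out.toString();
    }

    // Assumed reading of in_grouping(g_v, 192, 220): bit (c - 192), lowest bit first.
    static boolean isKoi8Vowel(char c) {
        if (c < 192 || c > 220) {
            return false;
        }
        int off = c - 192;
        return (G_V[off >> 3] & (1 << (off & 7))) != 0;
    }
}
```

In a Lucene TokenFilter the flow would then be toKoi8, the generated stemmer, and fromKoi8. On the stemmer side, SnowballProgram presumably exposes setCurrent(String), stem() and getCurrent(); treat those names as an assumption and check them against the net.sf.snowball classes you actually have.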