lucene-dev mailing list archives

From "Jim Regan (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3883) Analysis for Irish
Date Tue, 20 Mar 2012 17:31:38 GMT


Jim Regan commented on LUCENE-3883:

Great :)

Regarding the initial 'h', I asked Kevin Scannell (among other feathers in his cap, he created
the dictionary used in GaelSpell, and ran an Irish-language search engine), who said:
"I looked carefully at how often initial h is a prefix vs not a while ago. I can send you
those data - non-prefixes might be more common than you'd think in running text bc of proper
names, English mixed in, etc. So upshot is it's a bad idea to strip all initial h's with
no hyphen following.
As far as h- (with hyphen) goes, it's non-standard but common enough that I'd leave it in
the stemmer. Not like there would be false positives in that case if the hyphen is there."

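To make the distinction concrete, here is a minimal sketch of the rule as I read Kevin's advice: strip "h-" only when the hyphen is present, and leave a bare initial 'h' alone. This is not the Snowball code from the attached patch; the class name HPrefixSketch and the method stripHyphenatedH are made up for illustration, and the sample words are only there to show the two cases.

```java
// Illustrative only: a hypothetical helper, not part of the LUCENE-3883 patch.
final class HPrefixSketch {
    private HPrefixSketch() {}

    /**
     * Removes a leading "h-" or "H-" (hyphen required).
     * A bare initial 'h' is never stripped, since it may belong to the word
     * itself (proper names, English mixed in, etc.).
     */
    static String stripHyphenatedH(String term) {
        if (term.length() > 2
                && (term.charAt(0) == 'h' || term.charAt(0) == 'H')
                && term.charAt(1) == '-') {
            return term.substring(2);
        }
        return term;
    }

    public static void main(String[] args) {
        // Hyphen present: the prefix is removed.
        System.out.println(stripHyphenatedH("h-athair"));
        // No hyphen: the word is returned unchanged.
        System.out.println(stripHyphenatedH("hata"));
    }
}
```

Requiring the hyphen means the check cannot touch words that merely begin with 'h', which is the false-positive case Kevin warns about.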
> Analysis for Irish
> ------------------
>                 Key: LUCENE-3883
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Jim Regan
>            Assignee: Robert Muir
>            Priority: Trivial
>              Labels: analysis, newbie
>         Attachments: LUCENE-3883.patch, LUCENE-3883.patch, irish.sbl
> Adds analysis for Irish.
> The stemmer is generated from a snowball stemmer. I've sent it to Martin Porter, who
> says it will be added during the week.
