lucene-dev mailing list archives

From "Steven Rowe (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2745) ArabicAnalyzer - the ability to recognise email addresses host names and so on
Date Sun, 07 Nov 2010 20:16:06 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929392#action_12929392 ]

Steven Rowe commented on LUCENE-2745:
-------------------------------------

bq. steven, check out the link at the bottom of that article.

Yup, did that.

bq. especially the top... it explains the use in the language, particularly to block cursive joining for prefixes, suffixes, compounds. we split on this and the affixes are in the stoplist

Um, like I said, Persian uses ZWNJ (U+200C ZERO WIDTH NON-JOINER) as a display hint, not as a word separator.

According to the [ICU web demo|http://demo.icu-project.org/icu-bin/ubrowse?go=200C], ZWNJs
have the \p{Word_Break:Extend} property, so the Lucene UAX#29-based tokenizers will *not*
split on this char.
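To make the Word_Break point concrete, here is a heavily simplified Java sketch. It is not Lucene's tokenizer and not a full UAX#29 implementation: it approximates Word_Break=Extend/Format with java.lang.Character general categories (U+200C is category Cf, i.e. Character.FORMAT) and, in the spirit of UAX#29 rule WB4, lets such characters attach to the word in progress instead of ending it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ZwnjWordSplitSketch {

    // Rough stand-in for Word_Break=Extend|Format: combining marks plus
    // format characters (general category Cf, which includes U+200C ZWNJ).
    static boolean isExtendOrFormat(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
                || type == Character.ENCLOSING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.FORMAT;
    }

    // Simplified splitter: runs of letters/digits form words; an
    // Extend/Format char inside a word is absorbed (cf. WB4) rather than
    // treated as a break. Everything else ends the current word.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(cp);
            } else if (isExtendOrFormat(cp) && current.length() > 0) {
                current.appendCodePoint(cp); // ZWNJ stays inside the word
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Persian "mi" + ZWNJ + "khaham" ("I want"), a space, then "ketab" ("book")
        String persian = "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645 \u06A9\u062A\u0627\u0628";
        System.out.println(words(persian).size()); // prints 2
    }
}
```

The first token still contains the ZWNJ, which is consistent with the ICU demo result: the character does not introduce a word boundary, so a UAX#29-style tokenizer keeps the prefix and stem together.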

What am I not getting?

> ArabicAnalyzer - the ability to recognise email addresses host names and so on
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2745
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2745
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9.2, 2.9.3, 3.0, 3.0.1, 3.0.2
>         Environment: All
>            Reporter: M Alexander
>
> The ArabicAnalyzer does not recognise email addresses, hostnames and so on. For example,
> adam@hotmail.com
> will be tokenised to [adam] [hotmail] [com]
> It would be great if the ArabicAnalyzer could tokenise this as [adam@hotmail.com]. The same applies to hostnames and so on.
> Can this be resolved? I hope so
> Thanks
> MAA
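The behaviour the reporter asks for can be illustrated with a deliberately loose, regex-based Java sketch. This is not how ArabicAnalyzer or any Lucene tokenizer works; the class name and patterns are hypothetical and are nowhere near RFC-compliant email or hostname grammars. Alternatives are tried in order, so an email address wins over a dotted hostname, which wins over a plain word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailAwareSplitSketch {

    // Hypothetical, simplified patterns (NOT RFC 5322 / RFC 1035).
    // Java tries alternatives left to right at each position.
    private static final Pattern TOKEN = Pattern.compile(
            "[\\p{L}\\p{N}._%+-]+@[\\p{L}\\p{N}-]+(?:\\.[\\p{L}\\p{N}-]+)+" // email address
            + "|[\\p{L}\\p{N}-]+(?:\\.[\\p{L}\\p{N}-]+)+"                  // dotted hostname
            + "|[\\p{L}\\p{N}]+");                                        // ordinary word

    // Returns every non-overlapping match; separators are simply skipped.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [contact, adam@hotmail.com, or, www.example.com]
        System.out.println(tokens("contact adam@hotmail.com or www.example.com"));
    }
}
```

Because \p{L} covers Arabic letters, Arabic words are still split as ordinary words. The hostname alternative is greedy, though: it will also glue together things like "3.14" or two words separated by a full stop with no space, which is why a real implementation needs a more careful grammar than this.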

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



