lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3359) Option for no Front Encoding of term compression
Date Thu, 04 Aug 2011 01:17:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079162#comment-13079162 ]

Robert Muir commented on LUCENE-3359:
-------------------------------------

No? Many European languages have the same suffix too.

But the term dictionary needs to be in sorted order for many reasons.

Things like this are better discussed on the mailing list.
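
For reference, here is a minimal sketch of front coding over a sorted term list. Each term is stored as the length of the prefix it shares with the previous term plus the remaining suffix, which is why the technique depends on sorted order. The class and method names (FrontCodingSketch, Entry, encode, decode) are hypothetical and used only for illustration. The sketch works on Java chars, so it is not Lucene's actual term dictionary code or on-disk format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of front coding; not Lucene's implementation.
public class FrontCodingSketch {

    // One encoded term: how many leading chars it shares with the
    // previous term, and the chars that follow that shared prefix.
    static final class Entry {
        final int prefixLen;
        final String suffix;

        Entry(int prefixLen, String suffix) {
            this.prefixLen = prefixLen;
            this.suffix = suffix;
        }
    }

    // Encodes terms that are assumed to be already sorted.
    static List<Entry> encode(List<String> sortedTerms) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String term : sortedTerms) {
            int max = Math.min(prev.length(), term.length());
            int shared = 0;
            while (shared < max && prev.charAt(shared) == term.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, term.substring(shared)));
            prev = term;
        }
        return out;
    }

    // Rebuilds the original terms by reusing the prefix of the previous term.
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String term = prev.substring(0, e.prefixLen) + e.suffix;
            out.add(term);
            prev = term;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms =
            Arrays.asList("index", "indexer", "indexing", "search", "searcher");
        for (Entry e : encode(terms)) {
            System.out.println(e.prefixLen + " " + e.suffix);
        }
    }
}
```

In this sample, "indexer" is stored as a shared length of 5 plus "er", while "search" shares nothing with "indexing" and is stored whole. Only a shared prefix is exploited, so a common suffix between neighbouring terms saves nothing.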

> Option for no Front Encoding of term compression
> ------------------------------------------------
>
>                 Key: LUCENE-3359
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3359
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 3.3
>            Reporter: Gang Luo
>            Priority: Minor
>              Labels: Encoding, Front, compression
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The average length of a word in English is 5.1, so front encoding for term compression
> in the index is meaningful. But the average length of a word in Chinese is 2.3. Is front
> encoding needed for a Chinese document index?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

