lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
Date Sat, 09 Jul 2011 14:09:17 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062383#comment-13062383 ]

Michael McCandless commented on LUCENE-3233:
--------------------------------------------

bq. But the lookup on the original is still faster, right?

That was before we optimized FST for this use case.

Now, from the testing above, it looks like we are faster when syns actually match; if no syns
match the two are around the same speed.

Separately: shouldn't the default text_en field type ship without any syns?  E.g., we could
still have a synonyms.txt but comment out all the rules in it.

I don't think we should keep the old one around, ie, we should [eventually] replace it with
the new one.

> HuperDuperSynonymsFilter™
> -------------------------
>
>                 Key: LUCENE-3233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3233
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch,
LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch,
LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch,
synonyms.zip
>
>
> The current synonymsfilter uses a lot of ram and cpu, especially at build time.
> I think yesterday I heard about "huge synonyms files" three times.
> So, I think we should use an FST-based structure, sharing the inputs and outputs.
> And we should be more efficient with the tokenStream api, e.g. using save/restoreState
> instead of cloneAttributes()
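
To illustrate the "sharing the inputs" idea from the description above, here is a minimal
sketch in plain Java. It is not the code from the patch and it does not use Lucene's FST
classes: it substitutes a simple trie, which shares only common input prefixes, whereas a real
FST would also share suffixes and outputs. All names here (SynonymTrieSketch, add,
longestMatch, nodeCount) are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a trie over token sequences, standing in for an FST.
final class SynonymTrieSketch {

    // One node per distinct input prefix; rules with a common prefix share nodes.
    private static final class Node {
        final Map<String, Node> next = new HashMap<>();
        final List<String> outputs = new ArrayList<>();
    }

    private final Node root = new Node();

    // Registers a rule: the token sequence 'input' maps to 'output'.
    void add(String[] input, String output) {
        Node node = root;
        for (String token : input) {
            node = node.next.computeIfAbsent(token, k -> new Node());
        }
        node.outputs.add(output);
    }

    // Returns the outputs of the longest rule matching the tokens that begin
    // at 'start', or an empty list if no rule matches there.
    List<String> longestMatch(String[] tokens, int start) {
        Node node = root;
        List<String> best = new ArrayList<>();
        for (int i = start; i < tokens.length; i++) {
            node = node.next.get(tokens[i]);
            if (node == null) {
                break; // no rule continues with this token
            }
            if (!node.outputs.isEmpty()) {
                best = node.outputs; // remember the longest match so far
            }
        }
        return new ArrayList<>(best); // copy so callers cannot mutate the trie
    }

    // Counts nodes (including the root) to show how much prefixes are shared.
    int nodeCount() {
        return count(root);
    }

    private static int count(Node node) {
        int total = 1;
        for (Node child : node.next.values()) {
            total += count(child);
        }
        return total;
    }

    public static void main(String[] args) {
        SynonymTrieSketch trie = new SynonymTrieSketch();
        trie.add(new String[] {"wi", "fi"}, "wifi");
        trie.add(new String[] {"wi", "fi", "network"}, "wlan");
        trie.add(new String[] {"wireless"}, "wifi");
        String[] tokens = {"the", "wi", "fi", "network"};
        System.out.println(trie.longestMatch(tokens, 1));
        System.out.println(trie.nodeCount());
    }
}
```

In this sketch the rules "wi fi" and "wi fi network" reuse the same two nodes for their common
prefix, which is the kind of saving that matters for huge synonyms files. A lookup that
matches nothing stops at the first missing token.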

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
