lucene-dev mailing list archives

From "Manish (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3415) Snowball filter to include original word too
Date Tue, 06 Sep 2011 10:58:09 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097890#comment-13097890 ]

Manish commented on LUCENE-3415:
--------------------------------

The index size becomes huge (in fact, it doubles).
We have two fields, both indexed and stored, one with stemming and one without. We thought
of removing stored=true from one of the fields, but then highlighting becomes a problem (field 1
won't have the original words, so term vectors won't highlight them properly).

I have an idea based on Simon's comments, though I don't know whether it will work.

1. Create a new filter factory that emits both the stemmed word and the original word.
2. Field 1 -> indexed=true, stored=true, uses the above filter.
3. Field 2 -> indexed=true, stored=false, does not use the above filter.

I can run searches against the corresponding fields. For highlighting, I can always use Field
1, and since term vectors, offsets, and positions are present for the original words too, it will
highlight properly.
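
To make step 1 concrete, here is a rough sketch of what such a filter would emit. It is plain Java and deliberately does not use the Lucene API: the class KeepOriginalStemSketch, the Token type, and toyStem are all hypothetical names, and toyStem is a trivial suffix stripper standing in for a real Snowball stemmer. The point is only that the stemmed form is stacked on the original (position increment 0) and shares its offsets, which is why highlighting can still find the original word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Lucene code: models the token stream a
// "keep original + stem" filter would produce.
class KeepOriginalStemSketch {

    /** One token: term text, position increment, and character offsets into the source text. */
    static final class Token {
        final String term;
        final int positionIncrement;
        final int startOffset;
        final int endOffset;

        Token(String term, int positionIncrement, int startOffset, int endOffset) {
            this.term = term;
            this.positionIncrement = positionIncrement;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + " (posInc=" + positionIncrement
                    + ", offsets=" + startOffset + "-" + endOffset + ")";
        }
    }

    /** Toy stand-in for a Snowball stemmer (assumption): strips a trailing "ing" or "s". */
    static String toyStem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Splits on non-alphanumerics, lowercases, and emits the original plus its stem if different. */
    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip separators, then consume one word.
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (start == i) break; // only trailing separators were left

            String original = text.substring(start, i).toLowerCase(Locale.ROOT);
            // The original word advances the position by one.
            out.add(new Token(original, 1, start, i));

            String stem = toyStem(original);
            if (!stem.equals(original)) {
                // The stem sits at the same position and keeps the original's offsets.
                out.add(new Token(stem, 0, start, i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : analyze("Searching indexed fields")) {
            System.out.println(t);
        }
    }
}
```

With this layout a query for either the original or the stemmed form matches at the same position. Searching without stemming on the same field would still need some way to tell the two kinds of token apart, which this sketch does not address.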

Do let me know your thoughts on this. 

> Snowball filter to include original word too
> --------------------------------------------
>
>                 Key: LUCENE-3415
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3415
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.3
>         Environment: All
>            Reporter: Manish
>              Labels: features
>             Fix For: 3.4, 4.0
>
>
> 1. Currently, the Snowball filter deletes the original word and adds only the stemmed word to
> the index. So, if I want to search both with and without stemming, I have to keep two fields,
> one with stemming and one without it.
> 2. Instead, if there were a configurable option to preserve the original word, it would
> solve more business problems.
> 3. Using a single field, I could search with or without stemming by changing
> the query filters.
> The same could also be done for phonetic filters.


        
