lucene-solr-dev mailing list archives

From "Yonik Seeley (JIRA)" <>
Subject [jira] Commented: (SOLR-293) Add "minPartLength" to WordDelimiterFilter
Date Mon, 09 Jul 2007 20:43:05 GMT


Yonik Seeley commented on SOLR-293:

Would it be useful to be able to configure this separately for words and numbers?

On the indexing side, it makes sense to index "A9" and not "A" or "9".

> It is recommended to use it with catenateAll

Is there anything that can be done along the same lines, when not catenating for the query
analyzer, so "foo-bar" will still become "foo bar", but "A9" would stay as "A9"?
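
One way to picture the behavior asked about here is a rule that leaves the whole token intact whenever splitting would produce a part that is too short, with separate thresholds for words and numbers. The following is only a sketch of that idea, not the WordDelimiterFilter implementation: the class name, the `analyze` method, and the `minWordLength` / `minNumberLength` parameters are all hypothetical, and case-change splitting is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Solr code: split on delimiters and on
// letter/digit transitions, but keep the token whole if any part
// would be shorter than its (hypothetical) threshold.
final class KeepShortWholeSketch {

    static List<String> analyze(String token, int minWordLength, int minNumberLength) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : token.toCharArray()) {
            boolean alnum = Character.isLetterOrDigit(c);
            // A part ends at a delimiter or where letters meet digits.
            boolean typeChange = cur.length() > 0
                    && Character.isDigit(c) != Character.isDigit(cur.charAt(cur.length() - 1));
            if ((!alnum || typeChange) && cur.length() > 0) {
                parts.add(cur.toString());
                cur.setLength(0);
            }
            if (alnum) {
                cur.append(c);
            }
        }
        if (cur.length() > 0) {
            parts.add(cur.toString());
        }

        for (String p : parts) {
            // Numbers and words get separate minimum lengths.
            int min = Character.isDigit(p.charAt(0)) ? minNumberLength : minWordLength;
            if (p.length() < min) {
                // A part is too short: emit the parts joined back together
                // (delimiters are dropped, so "A-9" would also become "A9").
                return List.of(String.join("", parts));
            }
        }
        return parts;
    }
}
```

Under these assumptions, `analyze("foo-bar", 2, 2)` yields `[foo, bar]` while `analyze("A9", 2, 2)` yields `[A9]`. Setting the number threshold to 1 lets `analyze("mp3", 2, 1)` split into `[mp, 3]`.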

> Add "minPartLength" to WordDelimiterFilter
> ------------------------------------------
>                 Key: SOLR-293
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Mike Klaas
>            Assignee: Mike Klaas
>            Priority: Minor
>             Fix For: 1.3
> WDF is handy but over-tokenizes when faced with short word parts:
> A9
> R2D2
> mp3
> This creates one- or two-character tokens, which are extremely slow to query because the doc
> freq is so high (this contributes to a significant portion of our slowest queries).
> This patch adds a "minPartLength" option that disables generation of parts below a certain
> length.  It is recommended to use it with catenateAll, so as to not lose tokens.
> I'll add factory options and tests if we decide to include this (and are happy with the
> parameter name).
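
The effect described in the issue can be illustrated with a small stand-alone sketch. This is not the patch and not the real WordDelimiterFilter: it is a simplified splitter (delimiters and letter/digit transitions only, no case-change handling), and the class and method names are made up for illustration. It drops parts shorter than `minPartLength` and, when `catenateAll` is set, also emits the concatenation of all parts, which is why the two options are recommended together.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; names are hypothetical and the real
// filter handles more cases (case changes, positions, offsets).
final class MinPartLengthSketch {

    /** Splits on non-alphanumerics and on letter/digit transitions. */
    static List<String> parts(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                flush(cur, parts);
                continue;
            }
            if (cur.length() > 0
                    && Character.isDigit(c) != Character.isDigit(cur.charAt(cur.length() - 1))) {
                flush(cur, parts);
            }
            cur.append(c);
        }
        flush(cur, parts);
        return parts;
    }

    private static void flush(StringBuilder cur, List<String> parts) {
        if (cur.length() > 0) {
            parts.add(cur.toString());
            cur.setLength(0);
        }
    }

    /** Emits parts of at least minPartLength, plus the catenation if requested. */
    static List<String> tokens(String token, int minPartLength, boolean catenateAll) {
        List<String> parts = parts(token);
        List<String> out = new ArrayList<>();
        for (String p : parts) {
            if (p.length() >= minPartLength) {
                out.add(p);
            }
        }
        if (catenateAll && parts.size() > 1) {
            out.add(String.join("", parts));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"A9", "R2D2", "mp3"}) {
            System.out.println(t + " -> " + tokens(t, 2, true));
        }
        // Prints:
        // A9 -> [A9]
        // R2D2 -> [R2D2]
        // mp3 -> [mp, mp3]
    }
}
```

In this sketch, `tokens("A9", 2, false)` returns an empty list: without catenation, every part is filtered out and the token is lost entirely.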

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
