lucene-dev mailing list archives

From "Thomas Peuss (JIRA)" <>
Subject [jira] Created: (LUCENE-1166) A tokenfilter to decompose compound words
Date Wed, 06 Feb 2008 11:09:08 GMT
A tokenfilter to decompose compound words

                 Key: LUCENE-1166
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Analysis
            Reporter: Thomas Peuss
         Attachments: CompoundTokenFilter.patch

A tokenfilter that decomposes the compound words found in many Germanic languages (such as
German, Swedish, ...) into single tokens.

An example: Donaudampfschiff would be decomposed into Donau, dampf, schiff, so that you can
find the word even when you only search for "Schiff".
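
For illustration only, the decomposition idea can be sketched in plain Java. This is not
the code from the attached patch and it does not use the FOP hyphenation classes: it is a
simplified greedy longest-match lookup against a word list. The class name
CompoundDecomposer, the minPartLength parameter and the three-word dictionary are all
assumptions made up for this example, and the parts are returned in lowercase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene or of the attached patch.
final class CompoundDecomposer {
    private final Set<String> dictionary; // expected to hold lowercase words
    private final int minPartLength;      // shortest part that may be emitted

    CompoundDecomposer(Set<String> dictionary, int minPartLength) {
        this.dictionary = dictionary;
        this.minPartLength = minPartLength;
    }

    // Splits a compound into dictionary words using greedy longest match.
    // Characters that do not start any dictionary word are skipped.
    List<String> decompose(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < lower.length()) {
            int end = -1;
            // Try the longest candidate first, then shrink it.
            for (int e = lower.length(); e >= start + minPartLength; e--) {
                if (dictionary.contains(lower.substring(start, e))) {
                    end = e;
                    break;
                }
            }
            if (end < 0) {
                start++; // no dictionary word starts here, move on
            } else {
                parts.add(lower.substring(start, end));
                start = end;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("donau", "dampf", "schiff"));
        CompoundDecomposer decomposer = new CompoundDecomposer(words, 3);
        // prints [donau, dampf, schiff]
        System.out.println(decomposer.decompose("Donaudampfschiff"));
    }
}
```

A real filter would emit these parts as additional tokens next to the original word, so
that a query for "Schiff" matches a document containing "Donaudampfschiff".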

I use the hyphenation code from the Apache XML project FOP (
to do the first step of decomposition. Currently I use the FOP jars directly. I only use a
handful of classes from the FOP project.

My question now:
Would it be OK to copy these classes over to the Lucene project (renaming the packages, of
course), or should I stick with the dependency on the FOP jars? The FOP code uses the ASF V2
license as well.

What do you think?

