lucene-java-user mailing list archives

From Jaime <>
Subject Re: Preprocess input text before tokenizing
Date Fri, 24 Jun 2016 07:54:04 GMT
Thank you very much, that seems to solve my issue.

However, I find this a little cumbersome. I need to filter the text 
before any tokenizing takes place, so I have to implement a filtered 
version of every analyzer I'm using (StandardAnalyzer and 
SpanishAnalyzer and a custom analyzer right now).
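
To make the preprocessing concrete: the step in question (some characters become word separators, others are dropped) can be sketched in plain Java as a Reader wrapper. This is only an illustration of the idea, not Lucene code. The class name CharMappingReader and its filter helper are hypothetical, and unlike a real Lucene CharFilter it does not correct token offsets.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical class: a plain-Java stand-in for the kind of rewriting
// a char filter performs before the tokenizer sees the text.
class CharMappingReader extends FilterReader {
    private final String separators; // each of these chars becomes a space
    private final String removed;    // each of these chars is dropped

    CharMappingReader(Reader in, String separators, String removed) {
        super(in);
        this.separators = separators;
        this.removed = removed;
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (removed.indexOf(c) >= 0) {
                continue; // skip dropped characters entirely
            }
            return separators.indexOf(c) >= 0 ? ' ' : c;
        }
        return -1; // end of input
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    // Convenience helper (also hypothetical) that filters a whole string.
    static String filter(String text, String separators, String removed) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new CharMappingReader(new StringReader(text), separators, removed)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

For example, CharMappingReader.filter("foo-bar_baz", "-", "_") treats "-" as a separator and removes "_". Inside Lucene the same job would be done by a char filter wired in through the analyzer, which is why each analyzer currently needs its own filtered version.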

If I need to support another analyzer in the future (which is quite 
likely), I will have to create a filtered version of that analyzer too. 
And whenever any of those analyzers changes, I will need to apply the 
changes to my copy manually.

Isn't there a better way to do this?

On 23/06/2016 at 20:28, Ahmet Arslan wrote:
> Hi,
> Zero or more CharFilters are the way to manipulate text before the tokenizer.
> I think initReader is the method where you want to plug in char filters.
> Ahmet
> On Thursday, June 23, 2016 6:47 PM, Jaime <> wrote:
> Hello,
> I want to change the input text before tokenizing. I think I just need
> to use some characters as word separators, and maybe remove some others
> completely.
> I was planning to use MappingCharFilterFactory to replace some chars
> with " " and others with "", but I feel like I'm not on the right track.
> First, I've implemented a custom analyzer to use my custom tokenizer. My
> idea was to inherit from StandardTokenizer and call
> MappingCharFilterFactory.create(reader) from within setReader.
> However, setReader is final, so I can't override it.
> Is there a better way to do this?
> In any case, how should I use MappingCharFilter if I really need it?

Jaime Pardos
Avda. de Madrid nº 120 nave 10, 28500, Arganda del Rey, MADRID
