lucene-java-user mailing list archives

From Abhishek Chauhan <abhishek.chauhan...@gmail.com>
Subject AlphaNumeric analyzer/tokenizer
Date Fri, 16 Aug 2019 09:22:57 GMT
Hi,

We have been using SimpleAnalyzer, which keeps only letters in its tokens.
This prevents us from searching strings that contain both letters and digits,
e.g. "axt1234". With SimpleAnalyzer we can only search for "axt"
successfully; search strings like "axt1", "axt123", etc. give no
results because the digits were dropped at index time.
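
To make the behaviour concrete, here is a minimal plain-Java sketch of letter-only tokenization with lowercasing, which is roughly what SimpleAnalyzer does. It does not use the Lucene API; the class name LetterOnlySketch and its tokenize helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not a Lucene class.
final class LetterOnlySketch {

    // Splits text into maximal runs of letters and lowercases each run.
    // Every non-letter (digits, underscores, spaces, ...) ends the current token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The digits never reach the index, so only "axt" is stored.
        System.out.println(tokenize("axt1234")); // prints [axt]
    }
}
```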

I could use StandardAnalyzer or WhitespaceAnalyzer, but I also want to tokenize
on underscores, which these analyzers don't do. I have also looked at
WordDelimiterFilter, which splits "axt1234" into "axt" and "1234". Even with
that, however, I cannot search for "axt12" etc.

Is there something like an alphanumeric analyzer that is very
similar to SimpleAnalyzer but also keeps digits, in addition to letters,
in its tokens? I am willing to contribute such an analyzer if one is
not available.
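
As a rough sketch of the behaviour I have in mind, here is a plain-Java version that keeps letters and digits together and splits on everything else, underscores included. It is only an illustration under my own assumptions: AlphaNumericSketch and its tokenize methods are hypothetical names, not existing Lucene classes. In Lucene itself I would expect this to map onto a CharTokenizer whose isTokenChar accepts letters and digits, followed by lowercasing, but I have not written that part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only; this is not a Lucene class.
final class AlphaNumericSketch {

    // Splits text into maximal runs of code points accepted by isTokenChar
    // and lowercases each run.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Alphanumeric variant: letters and digits stay in the token,
    // anything else (underscore, space, punctuation) is a separator.
    static List<String> tokenize(String text) {
        return tokenize(text, Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Foo_bar axt1234")); // prints [foo, bar, axt1234]
    }
}
```

Whether a partial term such as "axt12" then matches the indexed token "axt1234" also depends on the type of query used, which this sketch does not cover.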

Thanks and Regards,
Abhishek
