lucene-java-user mailing list archives

From Robert Streitberger <>
Subject Re: Hibernate Search with Regex based on Table
Date Wed, 12 Sep 2012 13:22:19 GMT

Thanks for the hint. It seems to be an interesting solution.
Unfortunately, I think it will run into problems with German names when
umlauts (ö, ä) and the sharp s (ß) are mapped, because we are required to
map these characters to their usual German transliterations (oe, ae, ss)
and to take this into account in search.
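The requirement above can be sketched in plain Java, independent of Lucene: apply the German transliterations first, then fold whatever accents remain to their base letters. This is only an illustration under stated assumptions. The class name GermanFolding and its mapping table are hypothetical, and java.text.Normalizer needs Java 6 or later (newer than the JDK 1.5 mentioned in the original question).

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Hibernate Search.
final class GermanFolding {
    // German-specific transliterations, applied before generic accent folding.
    private static final Map<String, String> GERMAN = new LinkedHashMap<>();
    static {
        GERMAN.put("\u00e4", "ae"); // ä
        GERMAN.put("\u00f6", "oe"); // ö
        GERMAN.put("\u00fc", "ue"); // ü
        GERMAN.put("\u00c4", "Ae"); // Ä
        GERMAN.put("\u00d6", "Oe"); // Ö
        GERMAN.put("\u00dc", "Ue"); // Ü
        GERMAN.put("\u00df", "ss"); // ß
    }

    static String fold(String input) {
        // Compose first so that "a" + combining diaeresis becomes a single "ä".
        String s = Normalizer.normalize(input, Normalizer.Form.NFC);
        for (Map.Entry<String, String> e : GERMAN.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        // Decompose the rest and drop the combining marks (ó -> o, ô -> o).
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        return s.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("M\u00fcller"));  // Mueller
        System.out.println(fold("Stra\u00dfe"));  // Strasse
        System.out.println(fold("G\u00f3mez"));   // Gomez
    }
}
```

The same function would have to be applied both to the indexed text and to the query term, so that "Mueller" and "Müller" end up as the same token.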


From:   "Jack Krupansky" <>
To:     <>
Date:   12.09.2012 15:02
Subject:        Re: Hibernate Search with Regex based on Table

It sounds as if MappingCharFilter would be sufficient, unless there is an
additional requirement?

In Solr we have:
<fieldType name="text_char_norm" class="solr.TextField"
positionIncrementGap="100" >
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

The mapping-ISOLatin1Accent.txt file maps, or "folds", accented
characters to their base ASCII letters.

-- Jack Krupansky

-----Original Message----- 
From: Robert Streitberger
Sent: Wednesday, September 12, 2012 8:45 AM
Subject: Hibernate Search with Regex based on Table


I am currently discussing the possibilities of introducing Hibernate
Search (Lucene) into an existing Java Web Project with existing Hibernate

The Hibernate queries are quite complex and are mostly built with Criteria.

For certain properties/columns we are looking for advanced search

Example: Assume we have a WHERE clause with a LIKE search that looks up
names from different languages (we are on a UTF-8 database), say Gomez,
which could also be written as Gómez or Gômez, and so on.

The idea for the search is to have a table that provides all alternatives
for a given letter, say o -> ô, ó, ò, ..., and to build a regex from it
that finds all possible spellings of Gomez, no matter whether o or one of
its UTF-8 variants is used. The problem is that the regex can become very
large, as there are alternatives for nearly all vowels and consonants, and
Oracle's REGEXP_LIKE search is quite restricted.
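To make the table-driven regex idea concrete, here is a small plain-Java sketch. The class VariantRegex and the one-entry table are hypothetical; a real table would be loaded from the database. It also shows why the expression grows quickly: every letter that has variants turns into its own alternation group.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the "alternatives table -> regex" idea.
final class VariantRegex {
    // variants maps a base letter to a string holding all of its alternatives.
    static Pattern build(String term, Map<Character, String> variants) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            String alts = variants.get(c);
            if (alts == null) {
                // No alternatives known: match the letter literally.
                sb.append(Pattern.quote(String.valueOf(c)));
            } else {
                // One non-capturing group: the base letter or any alternative.
                sb.append("(?:").append(Pattern.quote(String.valueOf(c)));
                for (char v : alts.toCharArray()) {
                    sb.append('|').append(Pattern.quote(String.valueOf(v)));
                }
                sb.append(')');
            }
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Map<Character, String> table = new HashMap<>();
        table.put('o', "\u00f4\u00f3\u00f2"); // o -> ô, ó, ò
        Pattern p = build("Gomez", table);
        System.out.println(p.matcher("G\u00f3mez").matches()); // true
        System.out.println(p.matcher("Gamez").matches());      // false
    }
}
```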

Thus the idea would be to use some kind of index search with Lucene.

In short:
- Would it be possible to introduce Hibernate Search into the project?
  (It runs on at least Hibernate 3.0 and JDK 1.5 on Tomcat 6, with
  hbm.xml mapping files rather than annotations.)
- Would it be possible to use an indexed Lucene search by adding
  Restrictions to Hibernate Criteria?
- Would it be possible to also use the matching table to create a
  complex regex?
- Is there a restriction on the length of a Lucene regex?
- Or is there another way that does not use regex at all, if regex is
  not feasible at this complexity?

Many thanks in advance!

