lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Hibernate Search with Regex based on Table
Date Wed, 12 Sep 2012 13:52:41 GMT
MappingCharFilter can do all of that. The file I referenced already has ae,
oe, and ss. That default file handles your umlauts differently, but you can
change the rules to suit your exact needs.
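
As a rough illustration of the kind of rule change meant here, the sketch
below applies a small table of German transliteration rules (ä -> ae,
ö -> oe, ü -> ue, ß -> ss) in plain Java. It is not Lucene's
MappingCharFilter and it does not read the Solr mapping file; the class
name GermanCharMapper and its rule table are hypothetical, and the ü and
uppercase rules are assumptions that go beyond the characters named in
this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: applies a fixed table of character-mapping rules. */
final class GermanCharMapper {
    // Illustrative rule table; Unicode escapes keep the source ASCII-only.
    private static final Map<Character, String> RULES = new HashMap<>();
    static {
        RULES.put('\u00E4', "ae"); // a-umlaut
        RULES.put('\u00F6', "oe"); // o-umlaut
        RULES.put('\u00FC', "ue"); // u-umlaut (assumed, not named in the thread)
        RULES.put('\u00DF', "ss"); // sharp s
        RULES.put('\u00C4', "Ae"); // uppercase variants (assumed)
        RULES.put('\u00D6', "Oe");
        RULES.put('\u00DC', "Ue");
    }

    private GermanCharMapper() {
    }

    /** Replaces every character that has a rule; all others pass through. */
    static String map(String input) {
        StringBuilder out = new StringBuilder(input.length() + 4);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = RULES.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With these rules, GermanCharMapper.map("Müller") returns "Mueller". For
search to behave consistently, the same mapping would have to be applied
to both the indexed text and the query text.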

-- Jack Krupansky

-----Original Message----- 
From: Robert Streitberger
Sent: Wednesday, September 12, 2012 9:22 AM
To: java-user@lucene.apache.org
Subject: Re: Hibernate Search with Regex based on Table

Hi,

Thanks for the hint. It looks like an interesting solution.
Unfortunately, I think it will cause problems with German names when
umlauts (ö, ä) and the sharp s (ß) are mapped, because we are required to
map these characters to their usual German representation (oe, ae, ss)
and to take this into account in search.

kr
Rob



From:   "Jack Krupansky" <jack@basetechnology.com>
To:     <java-user@lucene.apache.org>
Date:   12.09.2012 15:02
Subject:        Re: Hibernate Search with Regex based on Table



It sounds as if MappingCharFilter would be sufficient. Unless there is
some additional requirement?

In Solr we have:
<fieldType name="text_char_norm" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

That mapping-ISOLatin1Accent.txt file maps or "folds" all the accented
characters into the base ASCII letter.
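
To make "folding" concrete, here is a minimal sketch in plain Java. It
does not use the mapping file or MappingCharFilter; instead it assumes a
different technique with a similar effect for accented Latin letters:
decompose the text with java.text.Normalizer (NFD) and drop the combining
marks. The class name AccentFolder and both method names are hypothetical.

```java
import java.text.Normalizer;

/** Hypothetical helper: folds accented letters to their base letters. */
final class AccentFolder {
    private AccentFolder() {
    }

    /** Decomposes to NFD, then strips all combining marks (category M). */
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** True if both strings are equal once each has been folded. */
    static boolean sameAfterFolding(String indexed, String query) {
        return fold(indexed).equals(fold(query));
    }
}
```

Under this sketch, Gomez, Gómez and Gômez all fold to "Gomez", so a query
in any of the three spellings matches text indexed in any other. Note that
ß has no canonical decomposition, so this approach leaves it unchanged; an
explicit rule table is still needed for mappings such as ß -> ss.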

-- Jack Krupansky

-----Original Message----- 
From: Robert Streitberger
Sent: Wednesday, September 12, 2012 8:45 AM
To: java-user@lucene.apache.org
Subject: Hibernate Search with Regex based on Table

Hello,

I am currently discussing the possibilities of introducing Hibernate
Search (Lucene) into an existing Java Web Project with existing Hibernate
Layer.

The Hibernate queries are quite complex and mostly built with Criteria.

For certain properties/columns we are looking for advanced search
possibilities.

Example: Assume we have a WHERE clause with a LIKE search that looks up
names from different languages (we are on a UTF-8 database), say
Gomez, which could also be written as Gómez or Gômez, or whatever.

The idea for the search is to have a table that provides all alternatives
for a given letter (say o -> ô, ó, ò, ...) and to build a regex from it
that finds all possible spellings of Gomez, no matter whether we use o or
one of its variants from the UTF-8 character set. The problem is that the
regex can become very large, as there are alternatives for nearly all
vowels and consonants, and the regexp_like search of the Oracle database
is quite restricted.
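
For reference, the table-driven regex described above could be sketched
roughly like this in plain Java (java.util.regex, not Oracle's regexp_like
and not Lucene). The class name VariantRegexBuilder and the two table
entries are hypothetical; a real table would cover many more letters,
which is exactly where the pattern size becomes a concern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: expands a plain term into a regex of letter variants. */
final class VariantRegexBuilder {
    // Illustrative variant table: base letter -> all accepted spellings.
    private static final Map<Character, String> VARIANTS = new HashMap<>();
    static {
        VARIANTS.put('o', "o\u00F4\u00F3\u00F2"); // o, o-circumflex, o-acute, o-grave
        VARIANTS.put('e', "e\u00EA\u00E9\u00E8"); // e, e-circumflex, e-acute, e-grave
    }

    private VariantRegexBuilder() {
    }

    /** Builds a pattern with one character class per letter that has variants. */
    static Pattern build(String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : term.toCharArray()) {
            String variants = VARIANTS.get(c);
            if (variants != null) {
                regex.append('[').append(variants).append(']');
            } else {
                // Quote everything else so regex metacharacters stay literal.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }
}
```

The pattern built for "Gomez" matches Gomez, Gómez and Gômez as whole
strings. It only covers precomposed characters; text stored in decomposed
form (base letter plus combining mark) would not match without prior
normalization.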

So the idea would be to use some kind of index-based search with Lucene.

In short:
- Would it be possible to introduce Hibernate Search into the project?
  (We have at least Hibernate 3.0 and JDK 1.5 on Tomcat 6, with hbm.xml
  files but without annotations.)
- Would it be possible to use the indexed Lucene search by adding
  Restrictions to Hibernate Criteria?
- Would it be possible to also use the matching table to create a
  complex regex?
- Or is there a restriction on the length of Lucene regex expressions?
- Or is there maybe another way that does not use regex at all, if regex
  is not possible at this complexity?


Many thanks in advance!
kr


Rob


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


