lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Research problems on numeric values into text (with. or,)
Date Wed, 28 Sep 2016 10:06:19 GMT
Thank you for bringing closure.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Sep 28, 2016 at 11:56 AM, Jérémy GUYENOT <JGUYENOT@efalia.com>
wrote:

> Hi Michael,
>
>
>
> I just found my problem. Because Lucene indexes “abcd.” as a single
> word, we had added a regex to our code that inserts a space between “abcd”
> and “.” (or other punctuation characters).
>
>
>
> So I updated this regex so that it no longer splits decimal numbers, and it
> works fine.
>
>
>
> The code before:
>
> // Add a space between a word and punctuation characters
>
> String pattern = "(\\w)([\\.,;\\?!:])";
>
> contents = contents.replaceAll(pattern, "$1 $2");
>
>
>
> The code after:
>
> // Leave figures alone; otherwise amounts would be cut in two
>
>             // REGEX: a word character (\w, i.e. [a-zA-Z0-9_]) followed by one of . , ; ? ! :
>             // but not when that punctuation is followed by optional whitespace and a digit
>
>             String pattern = "(\\w)([\\.,;\\?!:])(?!(\\s*[0-9]))";
>
>             contents = contents.replaceAll(pattern, "$1 $2");
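
For anyone comparing the two patterns, here is a minimal standalone sketch using only `java.util.regex`. The class name `PunctuationSpacing`, the helper `space`, and the sample sentence are made up for illustration; only the two pattern strings come from the message above.

```java
import java.util.regex.Pattern;

class PunctuationSpacing {
    // Original pattern from the thread: a word character followed by punctuation.
    static final Pattern BEFORE = Pattern.compile("(\\w)([\\.,;\\?!:])");

    // Revised pattern from the thread: same match, unless the punctuation is
    // followed by optional whitespace and then a digit (negative lookahead).
    static final Pattern AFTER = Pattern.compile("(\\w)([\\.,;\\?!:])(?!(\\s*[0-9]))");

    // Hypothetical helper: insert a space between the word character and the punctuation.
    static String space(Pattern pattern, String contents) {
        return pattern.matcher(contents).replaceAll("$1 $2");
    }

    public static void main(String[] args) {
        String sample = "Total: 404.50 euros, paid.";
        System.out.println(space(BEFORE, sample)); // Total : 404 .50 euros , paid .
        System.out.println(space(AFTER, sample));  // Total: 404.50 euros , paid .
    }
}
```

With the original pattern the amount becomes `404 .50`, so it can no longer be indexed as one term. With the revised pattern `404.50` and `315,86` are left intact. One side effect worth knowing: because the lookahead allows whitespace (`\s*`), punctuation followed by a space and then a digit, as in `Total: 404`, is also left attached to the word.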
>
>
>
> Thanks a lot for your time.
>
>
>
> Good bye
>
>
>
> *Jérémy GUYENOT | *R&D Department Manager
> *jguyenot@efalia.com <jguyenot@efalia.com>*
>
> ———————————
>
> 49, av. de la République 69200 Vénissieux | Tel: 04 72 51 77 55 | Fax:
> 04 72 50 43 13
> *WWW.EFALIA.COM* <http://www.efalia.com/>
>
> *For technical follow-up of your requests, please use **Mantis
> <http://feqa.communauteged-multigest.fr/>**, our online tool.*
>
>
>
> *Eco-responsibility: print this email only if necessary*
>
>
>
> *From:* Michael McCandless [mailto:lucene@mikemccandless.com]
> *Sent:* Tuesday, 27 September 2016 16:19
> *To:* Lucene Users <java-user@lucene.apache.org>; Jérémy GUYENOT <
> JGUYENOT@efalia.com>
> *Cc:* Jan Høydahl <jan.asf@cominvent.com>
>
> *Subject:* Re: Research problems on numeric values into text (with. or,)
>
>
>
> Possibly you are using an analyzer that does not preserve decimal numbers
> as a single token?  Or, you are using a different analyzer at indexing time
> vs search time?
>
>
>
> Can you make a small test case showing the issue?
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
>
> On Tue, Sep 27, 2016 at 3:06 PM, Jérémy GUYENOT <JGUYENOT@efalia.com>
> wrote:
>
> Hello,
>
>
>
> Sorry for posting again, but my first post got no answers, so I am trying
> another way.
>
>
>
> *What are you indexing?*
>
> I want to index files such as the one in the "ZIP \ file" folder,
> which contains decimal data (with . or , as the decimal separator).
>
>
>
> *How are you searching, and what did you expect to find?*
>
> I want to be able to search for decimals because our tools store large
> quantities of such documents (e.g. invoices, quotes, orders).
>
>
>
> *What do you actually see and why is that a problem?*
>
> The search for the number 404 returns files containing 404.
> The search for the number 50 returns files containing 50.
> The search for the number 404.50 returns no results.
>
> The text content was stored in a TextField with Field.Store.NO.
>
> I tried several analyzers, but the result is the same. I also tried Lucene
> 4.3.1 and 6.2.0, with the same result.
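
These symptoms are what one would expect if the amount is split before it reaches the index. The sketch below reproduces them with plain Java and no Lucene at all. It assumes the pre-processing regex quoted in the follow-up message is applied to document contents only, not to the query, and it uses a crude whitespace tokenizer as a stand-in for a real analyzer. The names `DecimalSplitDemo`, `preprocess`, `tokenize`, and `matches` and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DecimalSplitDemo {
    // Pre-processing regex quoted in the follow-up message (original version).
    static String preprocess(String contents) {
        return contents.replaceAll("(\\w)([\\.,;\\?!:])", "$1 $2");
    }

    // Crude stand-in for an analyzer (an assumption, not Lucene code):
    // split on whitespace, then trim non-alphanumerics from both ends of each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // A query matches if every query token appears among the indexed tokens.
    static boolean matches(String document, String query) {
        return tokenize(preprocess(document)).containsAll(tokenize(query));
    }

    public static void main(String[] args) {
        String document = "Invoice total: 404.50 EUR";
        System.out.println(matches(document, "404"));    // true
        System.out.println(matches(document, "50"));     // true
        System.out.println(matches(document, "404.50")); // false
    }
}
```

The indexed side only ever sees `404` and `50` as separate tokens, while the query side still carries `404.50` as one token, so the two never meet.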
>
>
>
> I hope you can give me some guidance on searching for decimal values in
> text files.
>
>
>
> In the zip you can find:
>
> - File: the example file containing decimal values
> - Index: the Lucene index files
> - Indexationlucene: the code we use to index files from our app
> - RechercheLucene: the code we use to search from our app
>
>
>
> Cordially
>
>
>
>
>
>
> *From:* Jan Høydahl [mailto:jan.asf@cominvent.com]
> *Sent:* Tuesday, 27 September 2016 10:20
> *To:* java-user@lucene.apache.org
> *Cc:* Jérémy GUYENOT <JGUYENOT@efalia.com>
> *Subject:* Re: Research problems on numeric values into text (with. or,)
>
>
>
> Please do not cross-post to multiple mailing lists.
>
> This belongs to java-user only.
>
> It is also generally better to describe the problem in more detail in the
> mail, than attaching a zip.
>
> - What are you indexing
>
> - How are you searching, and what did you expect to find
>
> - What do you actually see and why is that a problem?
>
>
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>
>
> On 27 Sep 2016, at 10:15, Jérémy GUYENOT <JGUYENOT@efalia.com> wrote:
>
>
>
> Hello,
>
> We are having trouble searching for numeric values in text (with . or , as
> the decimal separator). We are unable to search for 315.86 or 315,86.
>
> We also tried custom analyzers, without success.
>
> I enclose the code we use to index and the code we use to search.
>
> I do not know whether this is a bug on your side or a problem with our
> analysis.
>
> The problem is the same in versions 4.3.1 and 6.2.0.
>
> Thank you in advance for your quick reply.
>
> Regards
>
> <LUCENE.zip>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
>
>
>
