lucenenet-user mailing list archives

From Anders Lybecker <and...@lybecker.com>
Subject Re: Problems with Wildcard searches.
Date Thu, 21 Dec 2017 11:39:37 GMT
Hi Jens,

You are right. Something is wrong here.

Can you share some code, as this seems odd.

Regards,
Anders Lybecker (a fellow Dane :-))

On Thu, Dec 21, 2017 at 11:16 AM, Jens Melgaard <
Jens.Melgaard@systematic.com> wrote:

> Hello
>
> This is a bit of a shot in the dark, but while I work out how to investigate
> further, I thought I would see if we are lucky enough to reach someone who
> has experienced an issue similar to the one we are facing right now.
>
>
>
> First, a little background.
> We use Lucene.Net 3.0.3 to index JSON documents. Each JSON field is
> translated into a field name matching how you would access that field on the
> document, so { obj: { fieldName: “42kittens” } } becomes
> “obj.fieldName” = “42kittens”, etc. Depending on the JSON data type, each
> field is indexed differently, but for now we can focus on “text fields”, as
> that is where our issue is at the moment.
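A minimal sketch of the flattening described above, in Python and assuming the JSON has already been parsed into dicts and lists. The `flatten` helper and the `tags` field are hypothetical, not part of the system being described. Object keys extend the dotted path, while array elements reuse the array's path, which is how one field name can occur several times in a single document.

```python
import json
from typing import Any, Iterator


def flatten(node: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (field_name, value) pairs using dotted paths (hypothetical helper)."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Objects extend the path: {"obj": {"fieldName": ...}} -> "obj.fieldName"
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(value, path)
    elif isinstance(node, list):
        for item in node:
            # Array elements keep the array's path, so the name repeats
            yield from flatten(item, prefix)
    else:
        yield prefix, node


# "tags" is a made-up field, used only to show the array case
doc = json.loads('{"obj": {"fieldName": "42kittens", "tags": ["a", "b"]}}')
print(list(flatten(doc)))
# [('obj.fieldName', '42kittens'), ('obj.tags', 'a'), ('obj.tags', 'b')]
```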
>
>
>
> We use a StandardAnalyzer with an empty stop set, and the query parser is a
> slightly modified version of the MultiFieldQueryParser that allows “*” in
> range queries and uses a dynamic set of fields depending on what has been
> indexed. (We automatically keep track of all possible fields in the
> system.)
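A rough sketch of the dynamic field set, assuming a hypothetical `FieldRegistry` that records every field name seen at index time. `expand` spells out roughly what a multi-field parser does with an unqualified term (one clause per known field, combined with OR), and the hypothetical `parse_range_bound` shows the idea of treating “*” as an open range end. None of these names come from Lucene.Net.

```python
from typing import Optional


class FieldRegistry:
    """Hypothetical tracker of every field name seen at index time."""

    def __init__(self) -> None:
        self._fields: set[str] = set()

    def record(self, field_name: str) -> None:
        # Called once per indexed field; duplicates collapse in the set.
        self._fields.add(field_name)

    def expand(self, term: str) -> str:
        # One "field:term" clause per known field, OR-ed together,
        # in sorted order so the output is deterministic.
        return " OR ".join(f"{field}:{term}" for field in sorted(self._fields))


def parse_range_bound(text: str) -> Optional[str]:
    # "*" stands for an open (unbounded) end of a range, e.g. [* TO 100].
    return None if text == "*" else text


registry = FieldRegistry()
for name in ("locode", "obj.fieldName", "locode"):
    registry.record(name)
print(registry.expand("MA*"))  # locode:MA* OR obj.fieldName:MA*
```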
>
>
>
> We currently have about 500,000 documents in our index. Each document has
> anywhere from ~10 to thousands of fields (a field may be repeated because
> of arrays), which results in an index of about 4 GB.
>
>
>
> All in all, everything seemed to work just fine; however, yesterday we
> discovered some issues with wildcards.
>
>
>
> We have some documents that represent ports all over the world. These
> have what is called a locode. A locode is always 5 characters, e.g. DKAAR,
> VIFRD, ITPVT, etc. The first 2 letters represent the country, so DKAAR is in
> Denmark, VI is the U.S. Virgin Islands, and IT is Italy. You can find more
> here: http://locode.info (it might not be an exhaustive list).
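As an illustration of the locode structure only, here is a hypothetical parser that splits a code into its 2-letter country part and the remaining 3-character location part. `Locode` and `parse_locode` are made-up names for this sketch.

```python
from typing import NamedTuple


class Locode(NamedTuple):
    country: str   # first 2 letters, e.g. "DK" for Denmark
    location: str  # remaining 3 characters, e.g. "AAR"


def parse_locode(code: str) -> Locode:
    # A locode is always exactly 5 characters long.
    if len(code) != 5:
        raise ValueError(f"locode must be 5 characters, got {code!r}")
    return Locode(country=code[:2], location=code[2:])


print(parse_locode("DKAAR"))  # Locode(country='DK', location='AAR')
```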
>
> Now if I search for “locode: MA*” I get:
>
>
>
> -      MA888
>
> -      MA6KN
>
>
>
> However if I search for “locode: MAAGA” I get:
>
>
>
> -      MAAGA
>
>
>
> But that should have been included in the search above, as MA* clearly
> should match MAAGA.
>
>
>
> If I search for “locode: (MA* OR MAAGA)” I get:
>
>
>
> -      MA888
>
> -      MA6KN
>
> -      MAAGA
>
>
> Now if I search for “locode: MAA*” I now get:
>
> -      MAAHU
>
> -      MAAZE
>
> -      MAANZ
>
> -      MAASI
>
> -      MAAGA
>
>
>
> All of these should be part of the first result, right?
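A toy model of what the prefix query is expected to return, not Lucene's actual implementation. It assumes each locode is indexed as a single term and lowercased (StandardAnalyzer applies a lowercase filter), and relies on the term dictionary being sorted, so every match for a prefix forms one contiguous run. `prefix_matches` is a hypothetical helper, and the locodes are only the ones mentioned in this thread.

```python
import bisect


def prefix_matches(sorted_terms: list[str], prefix: str) -> list[str]:
    """Return every term that starts with `prefix` (hypothetical helper).

    Because the terms are sorted, all matches sit in one contiguous run
    beginning at the first term >= prefix.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the run of matching terms
        matches.append(term)
    return matches


# Locodes from this thread, lowercased as they would be indexed.
codes = ["DKAAR", "VIFRD", "ITPVT", "MA888", "MA6KN",
         "MAAGA", "MAAHU", "MAAZE", "MAANZ", "MAASI"]
terms = sorted(code.lower() for code in codes)
print(prefix_matches(terms, "ma"))
# ['ma6kn', 'ma888', 'maaga', 'maahu', 'maanz', 'maasi', 'maaze']
```

With this input, "ma" yields all seven MA codes from the thread and "maa" yields the five codes from the MAA* search, so the MA* result above is indeed missing terms.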
>
>
>
> So I am thinking that there is something I am missing here…
>
> Kind regards
>
> *Jens Melgaard*
> System Architect
>
> Søren Frichs Vej 39, 8000 Aarhus C, Denmark
>
> Mobile: +45 4196 5119
> Jens.Melgaard@systematic.com
> www.systematic.com
>
>
