lucenenet-user mailing list archives

From Shad Storhaug <s...@shadstorhaug.com>
Subject RE: Parse query with special characters
Date Fri, 23 Jun 2017 18:02:50 GMT
Sergey,

Exactly what do you expect the string "!!22" to do? According to the documentation (https://lucene.apache.org/core/4_8_0/queryparser/index.html),
a single "!" is the logical NOT operator, but a doubled "!!" has no meaning to the classic query
parser, so it throws an exception.

I tested this in Java, and you get an exception there as well:

Cannot parse 'a AND !!b': Encountered " <NOT> "! "" at line 1, column 7.
Was expecting one of:
    <BAREOPER> ...
    "(" ...
    "*" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    <REGEXPTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    <TERM> ...
    "*" ...
    
So, this is part of the design, not a bug. Of course, if you change the string to "a AND \!!b",
it will work (and apply the NOT operator).
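
To illustrate what escaping means here, below is a minimal sketch that puts a backslash in
front of every character the classic query parser documentation lists as special
(+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /). The names QueryText and EscapeQueryText are
hypothetical and used only for illustration; they are not part of Lucene.NET.

```csharp
using System;
using System.Text;

static class QueryText
{
    // Characters treated as special by the classic query parser syntax.
    // '&' and '|' are escaped one character at a time (assumption: this is
    // sufficient to neutralize the "&&" and "||" operators).
    private const string Special = "\\+-!():^[]\"{}~*?|&/";

    // Hypothetical helper: returns the input with a backslash inserted
    // before each special character, so it is read as literal text.
    public static string EscapeQueryText(string input)
    {
        var sb = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            if (Special.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this helper, EscapeQueryText("!!22") returns \!\!22. Note that escaping every special
character means "!" is no longer applied as NOT, and "*" and "?" no longer act as wildcards.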

You basically have 3 options:

1. Catch the exception and use an alternate approach (possibly another query parser to give
it a second pass).
2. Clean the input by escaping unwanted special characters as per the documentation: https://lucene.apache.org/core/4_8_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html.

3. Use the SimpleQueryParser (https://lucene.apache.org/core/4_8_0/queryparser/org/apache/lucene/queryparser/simple/SimpleQueryParser.html),
which does not treat "!" as a special character and is specifically designed for user-entered
input. Quote from the documentation: "The main idea behind this parser is that a person should
be able to type whatever they want to represent a query, and this parser will do its best
to interpret what to search for no matter how poorly composed the request may be."
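
The three options above could look roughly like the following in Lucene.NET 4.8. This is an
untested sketch, not a definitive implementation: UserQueries, ParseLenient and ParseSimple
are hypothetical names, and it assumes the static QueryParserBase.Escape method and the
SimpleQueryParser(analyzer, field) constructor from the Lucene.Net.QueryParser package.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.QueryParsers.Simple;
using Lucene.Net.Search;
using Lucene.Net.Util;

static class UserQueries
{
    // Options 1 and 2 combined: try the classic syntax first; if the input
    // does not parse, escape it and search for it as literal text.
    public static Query ParseLenient(string field, Analyzer analyzer, string userInput)
    {
        var parser = new QueryParser(LuceneVersion.LUCENE_48, field, analyzer);
        try
        {
            return parser.Parse(userInput);
        }
        catch (ParseException)
        {
            // Escape also escapes '*' and '?', so the fallback query
            // loses any wildcard behavior the user may have intended.
            return parser.Parse(QueryParserBase.Escape(userInput));
        }
    }

    // Option 3: SimpleQueryParser does not treat '!' as an operator.
    public static Query ParseSimple(string field, Analyzer analyzer, string userInput)
    {
        return new SimpleQueryParser(analyzer, field).Parse(userInput);
    }
}
```

Either method can be called with the same field name and analyzer that were used at index
time. Which one fits depends on whether users are expected to type classic query syntax at all.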


Thanks,
Shad Storhaug (NightOwl888)



-----Original Message-----
From: Sergey Zaharov [mailto:sergozaharov@gmail.com] 
Sent: Friday, June 23, 2017 7:08 PM
To: Prescott Nasser; user@lucenenet.apache.org
Subject: Re: Parse query with special characters

Hi all,

Are there any updates on this problem?

Best regards,
Sergey

2017-06-16 12:32 GMT+02:00 Sergey Zaharov <sergozaharov@gmail.com>:

> Hi,
>
> here is the full code of the console application:
>
> using System;
> using System.Collections.Generic;
> using System.Linq;
> using Lucene.Net.Analysis.Core;
> using Lucene.Net.Documents;
> using Lucene.Net.Index;
> using Lucene.Net.QueryParsers.Classic;
> using Lucene.Net.QueryParsers.Flexible.Standard;
> using Lucene.Net.Search;
> using Lucene.Net.Store;
> using Lucene.Net.Util;
>
> namespace Lucene4TestWSA
> {
>     class Program
>     {
>         private const string FIELD_BODY = "postBody";
>         private const string FIELD_SECURITY = "Security";
>
>         private static IndexWriter _writer;
>         private static Directory _directory;
>         private static WhitespaceAnalyzer _analyzer;
>         private static IndexReader _indexReader;
>         private static IndexSearcher _searcher;
>         private static IndexWriterConfig _cfg;
>
>         private static void AddNewItem(FullTextIndexItem item)
>         {
>             if (_writer == null) return;
>             var doc = new Document();
>
>             var objectText = (item.ObjectText ?? "");
>             doc.Add(new TextField(FIELD_BODY, objectText, Field.Store.NO));
>
>             var securCodes = (item.Access == null || item.Access.All(x => x.Key == 0))
>                 ? "?"
>                 : string.Join(" ", item.Access.Where(x => x.Key != 0).Select(x => x.Key.ToString() + x.Info).ToList());
>             doc.Add(new TextField(FIELD_SECURITY, securCodes.ToLower(), Field.Store.YES));
>
>             _writer.AddDocument(doc);
>         }
>
>         static void Main(string[] args)
>         {
>
>             var dir = @"c:\TestLuceneDir";
>             if (System.IO.Directory.Exists(dir))
>             {
>                 System.IO.Directory.Delete(dir, true);
>             }
>
>             var di = System.IO.Directory.CreateDirectory(dir);
>             var _directory = FSDirectory.Open(di);
>             _analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);
>             _cfg = new IndexWriterConfig(LuceneVersion.LUCENE_48, _analyzer);
>             var writer = new IndexWriter(_directory, _cfg);
>             writer.Commit();
>             writer.Dispose();
>             _cfg = null;
>
>             _indexReader = DirectoryReader.Open(_directory);
>             _searcher = new IndexSearcher(_indexReader);
>
>             var analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);
>             _cfg = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
>             _writer = new IndexWriter(_directory, _cfg);
>
>             AddNewItem(new FullTextIndexItem
>             {
>                 ObjectText = "111 !!222 333 qqq",
>
>                 Access = new List<FullTextIndexItemAccessInfo>()
>                 {
>                     new FullTextIndexItemAccessInfo() { Key = 1037, Info = "PW???"},
>                     new FullTextIndexItemAccessInfo() { Key = 1041, Info = "P????"}
>                 }
>             });
>
>             AddNewItem(new FullTextIndexItem
>             {
>                 ObjectText = "aaa bbb ccc qqq",
>                 Access = new List<FullTextIndexItemAccessInfo>()
>                 {
>                     new FullTextIndexItemAccessInfo() { Key = 1037, Info = "PW???"},
>                     new FullTextIndexItemAccessInfo() { Key = 1042, Info = "PW??C"}
>                 }
>             });
>
>             _writer.Commit();
>             _writer.Dispose();
>             _writer = null;
>             _cfg = null;
>             _indexReader = DirectoryReader.Open(_directory);
>             _searcher = new IndexSearcher(_indexReader);
>
>             _analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);
>             var boolQry = new BooleanQuery();
>
>             var parser = new QueryParser(LuceneVersion.LUCENE_48, FIELD_BODY, _analyzer) { AllowLeadingWildcard = true };
>             var textQry = parser.Parse("*22/*");
>             boolQry.Add(textQry, Occur.MUST);
>             var an = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);
>             var localParser = new QueryParser(LuceneVersion.LUCENE_48, FIELD_SECURITY, an);
>
>             var localQry = localParser.Parse("1037p????");
>
>             boolQry.Add(localQry, Occur.MUST);
>
>             var qryRes = _searcher.Search(boolQry, 1000);
>
>             Console.WriteLine($"Result found {qryRes.TotalHits}");
>             Console.ReadLine();
>         }
>     }
>
>     public class FullTextIndexItemAccessInfo
>     {
>         public int Key { get; set; }
>         public string Info { get; set; }
>     }
>
>     public class FullTextIndexItem
>     {
>         public string ObjectText { get; set; }
>         public List<FullTextIndexItemAccessInfo> Access { get; set; }
>     }
> }
>
> Hope that helps.
>
> Many thanks,
> Sergey
>
> 2017-06-16 9:39 GMT+02:00 Prescott Nasser <geobmx540@hotmail.com>:
>
>> Adding Sergey, who isn't subscribed to the mailing list.
>>
>> ------
>>
>> Sergey,
>>
>> Please provide the actual code, not a screenshot. Apparently, the 
>> mailing list server strips out images from the email, so it is 
>> impossible to help you without knowing what you are doing.
>>
>> Thanks,
>> Shad Storhaug (NightOwl888)
>>
>> -----Original Message-----
>> From: Sergey Zaharov [mailto:sergozaharov@gmail.com]
>> Sent: Friday, June 16, 2017 2:16 PM
>> To: user@lucenenet.apache.org
>> Subject: Parse query with special characters
>>
>> Hi all,
>>
>> could you please help me with the following issue: I have a problem 
>> parsing strings that contain certain special characters. There is an 
>> example in the screenshot. An error also occurs when I parse a string 
>> like "!!22", while the string "!22" is parsed normally.
>> [image: Embedded image 1]
>>
>> So the question is: how can I parse ALL possible strings that come from a user?
>> Maybe I need another parser? Otherwise, should I strip certain 
>> characters/combinations from the query string, and if so, where can I find that list?
>> There are probably some utilities that could help normalize a query string.
>>
>> Thank you in advance.
>>
>> --
>> Best regards, Sergey.
>>
>
>
>
> --
> Best regards, Sergey.
>



--
Best regards, Sergey.