lucene-solr-user mailing list archives

From Emir Arnautovic <emir.arnauto...@sematext.com>
Subject Re: Search substring in field
Date Wed, 10 May 2017 14:58:55 GMT
Hi,

Solr works on top of a data structure called an inverted index 
<https://en.wikipedia.org/wiki/Inverted_index>. You can misuse it by 
not tokenizing your documents and relying on regex or wildcards to find 
matches, but that is not how it is meant to be used - it will be 
significantly slower.

Solr does support a subset of regex, and the syntax for that is field:/regex/

Solr also supports wildcards: * and ?

In any case you have to be aware that it matches tokens, and you have to 
set up your analysis properly to make it work (at the very least you need 
to lowercase if you want matching to be case insensitive).
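
To illustrate what "it matches tokens" means, here is a toy inverted index in Python. It is only a sketch and not how Solr is implemented; the analyze function (whitespace split plus lowercase) is an assumed analysis chain chosen for the example.

```python
from collections import defaultdict

def analyze(text):
    # Assumed analysis chain: split on whitespace, then lowercase.
    return [token.lower() for token in text.split()]

def build_index(docs):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def lookup(index, term):
    # A lookup is an exact match on a whole token; the query term is
    # lowercased the same way the documents were.
    return index.get(term.lower(), set())

docs = {1: "Hello World", 2: "hello there"}
index = build_index(docs)
```

Looking up "HELLO" finds both documents because both sides are lowercased, while "wor" finds nothing because it is only part of a token.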


On 09.05.2017 19:15, jnobre wrote:
> Hello,
>
> Thanks for your response.
>
> I realize the concept, but I do not know which one to use in my case. Not
> exactly the difference between the analyzes.
>
> 1- At this moment I search for
> "source": * "hello word" * or url =
> http://XXXX:8983/solr/AWP10/select?Indent=on&q=source:*%22hello%20world%22*&wt=json
If you index source as a string (single token) you can search with 
wildcards, but you have to escape spaces - source:*hello\ word*
or you can use a regex - source:/.*hello word.*/
If you index it as text, it will be tokenized into the tokens 
"hello" and "word", and then you can use a phrase query - 
source:"hello word" - this is the recommended way.
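
If you build these queries in client code, something like the following Python sketch could help. The helper names are made up for the example, and the list of escaped characters is my assumption of what the standard query parser treats as special, so check it against your Solr version.

```python
# Characters assumed to be special for the standard query parser.
SPECIAL = set('\\+-!():^[]"{}~*?|&;/')

def escape_term(text):
    # Put a backslash before special characters and whitespace.
    out = []
    for ch in text:
        if ch in SPECIAL or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)

def substring_query(field, text):
    # Wildcard search against a string (single token) field.
    return f"{field}:*{escape_term(text)}*"

def phrase_query(field, text):
    # Phrase search against a tokenized text field.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'
```

substring_query("source", "hello word") gives source:*hello\ word* and phrase_query("source", "hello word") gives source:"hello word".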
>
> For example, one line of the answer:
>     "source":
> ["http://www.gravatar.com/avatar/ad516503a11cd5ca435acc9bb6523536?s=32"]
>
> The expression does not appear and even then the line is returned.
You can use debugQuery=true to see how the query is parsed - the one you 
sent ends up as a match all on the default field.
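
A minimal sketch of adding debugQuery=true to the request from Python. The host and core come from your URL; the select_url helper is hypothetical and only builds the URL, it does not send anything.

```python
from urllib.parse import urlencode

def select_url(base, query, debug=False):
    # Build a /select URL; urlencode takes care of quotes and spaces.
    params = {"q": query, "wt": "json", "indent": "on"}
    if debug:
        # Asks Solr to include the parsed query in the response.
        params["debugQuery"] = "true"
    return f"{base}/select?{urlencode(params)}"

url = select_url("http://XXXX:8983/solr/AWP10", 'source:"hello word"', debug=True)
```

The debug section of the response then shows how the q parameter was actually parsed.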
>
> 2 - My idea was to identify a url in the middle of a string with regex, for
> example, as it does in Java:
> Eur-lex.europa.eu eur-lex.europa.eu eur-lex.europa.eu Eur-lex.europa.eu
> eur-lex.europa.eu
> I do not know what the syntax is for entering regex in the search.
The proper way is to use analysis to split the url into tokens and then 
search for an exact match. Analysis could include:
1. replacing / with a space
2. whitespace tokenizer
3. removing 'www.'
4. ignoring http
...
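
The steps above can be simulated in plain Python to see which tokens would end up in the index. This is only an illustration of the idea, not Solr configuration; in Solr you would express it as an analyzer on the field type. The lowercasing step is my addition to the list.

```python
def analyze_url(text):
    tokens = []
    # Steps 1 and 2: replace "/" with a space, then split on whitespace.
    for token in text.replace("/", " ").split():
        # Assumed extra step: lowercase so matching is case insensitive.
        token = token.lower()
        # Step 3: remove a leading "www."
        if token.startswith("www."):
            token = token[len("www."):]
        # Step 4: ignore the scheme token.
        if token in ("http:", "https:"):
            continue
        if token:
            tokens.append(token)
    return tokens

tokens = analyze_url(
    "http://www.gravatar.com/avatar/ad516503a11cd5ca435acc9bb6523536?s=32"
)
```

Here tokens contains "gravatar.com", so an exact match on source:gravatar.com would find the document without any wildcard or regex.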
>
> 3- I can use the multiplication function, but not the search syntax to
> evaluate its return.
Again, if you always query the product of the same fields, you might want 
to create a field containing that value (e.g. field prod) and then use a 
range query - prod:[10 TO 20]

If you have two numeric fields (e.g. a and b) you can filter docs 
using frange in a filter query:
   fq={!frange l=10 u=20}product(a,b)
If you need to return that value you need to add it to fl:
   fl=*,prod:product(a,b)
This will return all stored fields and the product as 'prod'.
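
Put together as request parameters, a sketch in Python could look like this. The helper name and the match-all q=*:* are assumptions for the example; a and b are the example field names from above.

```python
from urllib.parse import urlencode

def product_filter_params(field_a, field_b, lower, upper, alias="prod"):
    func = f"product({field_a},{field_b})"
    return {
        # Assumed main query: match everything, let fq do the filtering.
        "q": "*:*",
        # Keep only docs whose product falls within [lower, upper].
        "fq": f"{{!frange l={lower} u={upper}}}{func}",
        # Return all stored fields plus the computed value under the alias.
        "fl": f"*,{alias}:{func}",
    }

params = product_filter_params("a", "b", 10, 20)
query_string = urlencode(params)
```

The query_string can be appended to the /select URL of your core.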

HTH,
Emir
>
>
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Search-substring-in-field-tp4333553p4334316.html
> Sent from the Solr - User mailing list archive at Nabble.com.

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

