lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Questions on The Tagger Handler
Date Thu, 02 Aug 2018 04:19:44 GMT
Hi Alexandre,

I have found that there is a ConcatenateGraphFilterFactory at the end of the
indexing chain, and it merges the tokens back into a single token (even
though the Standard Tokenizer, which is what we use for normal phrase
search, has already split the text). I believe this is why, when I search
for "Hello New York City" or "Hello New York", it is not able to match "New
York City".

So is this the correct way for the Tagger Handler to work?

Regards,
Edwin
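
The matching behaviour discussed in this thread can be illustrated with a toy model. This is a sketch only, not Solr's implementation: the helpers `tokenize`, `build_dictionary` and `tag` are hypothetical names, the regex tokenizer is a stand-in for the Standard Tokenizer, and a plain dict replaces whatever structure Solr uses internally. The one idea it tries to show is that each indexed name is stored as a single concatenated key, so a run of tokens in the submitted text is tagged only when it equals a whole indexed name.

```python
import re

def tokenize(text):
    """Toy tokenizer: lowercase word tokens with their character offsets."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]

def build_dictionary(names):
    """Store every name as ONE concatenated key (assumed analogue of the
    concatenating filter at the end of the index-time chain)."""
    return {" ".join(tok for tok, _, _ in tokenize(name)): doc_id
            for doc_id, name in names.items()}

def tag(text, dictionary):
    """Return (startOffset, endOffset, id) for each run of tokens that
    equals a complete indexed name, preferring the longest run."""
    tokens = tokenize(text)
    tags = []
    i = 0
    while i < len(tokens):
        next_i = i + 1
        for j in range(len(tokens), i, -1):  # try the longest run first
            key = " ".join(tok for tok, _, _ in tokens[i:j])
            if key in dictionary:
                tags.append((tokens[i][1], tokens[j - 1][2], dictionary[key]))
                next_i = j  # continue after the matched run
                break
        i = next_i
    return tags

# Hypothetical one-document index, using the id and name from the tutorial output.
dictionary = build_dictionary({"5128581": "New York City"})

full_match = tag("Hello New York City", dictionary)
partial = tag("Hello New York", dictionary)
print(full_match)
print(partial)
```

Under these assumptions, text containing the whole name is tagged at the name's position, while text containing only part of the name ("New York") produces no tag, because no complete indexed name occurs in it.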

On Thu, 2 Aug 2018 at 11:41, Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> You have "Hello New York City" as both working and non-working
> example. I am not sure what specifically is an issue.
>
> In general, you have processing on both indexing and query and then
> the tokens must match in the right order. Just like a normal phrase
> search, but in reverse.
>
> Regards,
>    Alex.
>
> On 1 August 2018 at 22:13, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> wrote:
> > Hi Alexandre,
> >
> > Thanks for the information.
> >
> > I found that it is able to retrieve the record if I search for "Hello New
> > York City" or "New York City".
> > However, I am not able to retrieve it if I search for "Hello New York
> > City" or "Hello New York".
> > Is that the right behavior?
> >
> > Regards,
> > Edwin
> >
> > On Wed, 1 Aug 2018 at 22:13, Alexandre Rafalovitch <arafalov@gmail.com>
> > wrote:
> >
> >> You may find this interesting:
> >>
> >> https://slideshare.net/arafalov/searching-for-ai-leveraging-solr-for-classic-artificial-intelligence-tasks/
> >> Specifically, slides 15-18.
> >>
> >> Basically, it is a reverse from normal search. You are searching for
> >> occurrences of the already indexed terms (here, the place names) in
> >> the text you sent. And it returns information about what it found and
> >> where in your original text it is (the offsets). The text you send to
> >> the tagger does not end up in Solr.
> >>
> >> What is missing is a good visualization of what it found, which would
> >> be a bit like a highlighter: taking those offsets and applying them to
> >> the original text.
> >>
> >> Regards,
> >>    Alex.
> >>
> >> On 1 August 2018 at 05:59, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I am trying out the Tagger Handler in Solr 7.4.0 by following the
> >> > tutorial from
> >> > https://lucene.apache.org/solr/guide/7_4/the-tagger-handler.html#tutorial-with-geonames
> >> >
> >> > I have managed to set it up to work, but what I do not really
> >> > understand is how to analyse the output. From the example, it seems to
> >> > be trying to tag 'Hello New York City', and it returns one output. This
> >> > seems more like searching for the 'name' field (in the example, the
> >> > 'name' field is copied to the 'name_tag' field for tagging) and getting
> >> > the records with the name "New York City".
> >> >
> >> > What is the actual purpose of doing this?
> >> >
> >> > Also, what do "startOffset" and "endOffset" mean, and how are the
> >> > values calculated?
> >> >
> >> > {
> >> >   "responseHeader":{
> >> >     "status":0,
> >> >     "QTime":1},
> >> >   "tagsCount":1,
> >> >   "tags":[[
> >> >       "startOffset",6,
> >> >       "endOffset",19,
> >> >       "ids",["5128581"]]],
> >> >   "response":{"numFound":1,"start":0,"docs":[
> >> >       {
> >> >         "id":"5128581",
> >> >         "name":["New York City"],
> >> >         "countrycode":["US"]}]
> >> >   }}
> >> >
> >> >
> >> > Regards,
> >> > Edwin
> >>
>

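On the "startOffset"/"endOffset" question in the original post: in the quoted response the two values appear to be character positions in the submitted text, with the end exclusive, since "Hello " occupies positions 0 to 5 and "New York City" occupies 6 to 18. The sketch below applies the offsets back to the text, in the spirit of the highlighter-like visualization mentioned earlier in the thread. The helpers `tag_to_dict` and `annotate` are hypothetical, the bracket markup is arbitrary, and the code assumes non-overlapping tags and text where Python string indices line up with the offsets Solr reports (true for plain ASCII input like this).

```python
import json

# The text sent to the tagger and the response quoted in this thread.
text = "Hello New York City"
response = json.loads("""
{
  "tagsCount": 1,
  "tags": [["startOffset", 6, "endOffset", 19, "ids", ["5128581"]]],
  "response": {"numFound": 1, "start": 0, "docs": [
      {"id": "5128581", "name": ["New York City"], "countrycode": ["US"]}]}
}
""")

def tag_to_dict(flat):
    """Turn a flat [key, value, key, value, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))

def annotate(text, tags):
    """Wrap each tagged span as [span]{ids}; assumes tags do not overlap."""
    out, pos = [], 0
    for tag in sorted(tags, key=lambda t: t["startOffset"]):
        start, end = tag["startOffset"], tag["endOffset"]
        out.append(text[pos:start])  # untagged text before the span
        out.append("[" + text[start:end] + "]{" + ",".join(tag["ids"]) + "}")
        pos = end
    out.append(text[pos:])  # untagged text after the last span
    return "".join(out)

tags = [tag_to_dict(t) for t in response["tags"]]
annotated = annotate(text, tags)
print(annotated)  # Hello [New York City]{5128581}
```

The ids in each tag can then be looked up in the "docs" list of the same response to attach the name or country code to the span.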