lucene-solr-user mailing list archives

From eli chen <eli.c....@gmail.com>
Subject Re: get the position of matched word in the response
Date Sun, 04 Aug 2019 13:52:29 GMT
thx
of course they search for phrases.
and if they searched "hello monkey" and solr found "hello my monkey", i
want to get the positions of "hello" and "monkey" (the words they actually
typed in the search).
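
For illustration, a minimal sketch of that lookup in plain Python (not Solr). The `word_positions` helper and its simple `\w+` tokenizer are assumptions made for this example and do not reproduce the `text_general` analysis chain; positions are counted by word, starting at 0.

```python
import re
from collections import defaultdict


def word_positions(text, query):
    """Map each query word to the 0-based word positions where it occurs in text."""
    # Hypothetical tokenizer: lowercase, split on non-word characters.
    tokens = re.findall(r"\w+", text.lower())
    wanted = set(re.findall(r"\w+", query.lower()))

    positions = defaultdict(list)
    for pos, token in enumerate(tokens):
        if token in wanted:
            positions[token].append(pos)
    return dict(positions)


print(word_positions("hello my monkey", "hello monkey"))
# → {'hello': [0], 'monkey': [2]}
```

Matching here is exact, so "monkeys" would not count as a hit for "monkey"; a real setup would need the same stemming and tokenizing on both sides.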

and btw thank you all, but i found
https://github.com/dbmdz/solr-ocrhighlighting which i think can help me a
lot. and i'll check the payload thing (i'm new to solr)



On Sun, 4 Aug 2019 at 15:40, Alexandre Rafalovitch <arafalov@gmail.com> wrote:

> What happens if they search for "hello monkey" and match against
> "hello my monkeys"? What should it return? Why does your database not
> contain "hello" instead of 199?
>
> I am asking because if your clients are truly searching for just one
> word, then Solr may be overkill for you. Perhaps you are looking
> for just "indexOf" within a string with a parallel offset->OCR data
> structure. So, there is a hidden question in there of "why did you
> choose Solr".
>
> Then, there is the point that Solr searches words/numbers/geospatial
> data but returns documents. So, sometimes, you need to understand what a
> "document" is for your business case, and transform your content for
> that. E.g. if you are really just searching for one word, then maybe
> you index your whole book as a bunch of documents, each containing a
> word, its OCR offset information, and its book id. And if it is a couple
> of words, maybe you have a secondary field with the context of that
> sentence (in index-only form).
>
> Don't be afraid to abandon your first schema. Your business
> requirement is different enough.
>
> Regards,
>    Alex.
>
>
> On Sun, 4 Aug 2019 at 07:46, eli chen <eli.c.new@gmail.com> wrote:
> >
> > every content field is actually a book's content
> > so let's say someone searches for the word "hello" and i found this word in the
> > book "the story jungle" at position 199 (counting by word, not char)
> >
> > now i can look at my database and check the OCR of this word in this book
> > (and show a highlight on the picture etc.)
> >
> > my db is kind of like this (just for simplicity)
> >
> > book     word position     ocr
> > ------   -------------     ---------
> > th....   199               1,1,1,1
> >
> > that's the reason i need the offset of the word.
> >
> > and btw the content field is just a big text_general field
> >
> > thx again
> >
> > On Sun, 4 Aug 2019 at 14:30, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> > > Eli:
> > >
> > > What problem are you trying to solve? There’s no really convenient way to
> > > do this that I know of, although it could be done, probably with some
> > > lucene-level code.
> > >
> > > This may be an XY problem, where you're asking how to do X (find the
> > > position of the matched word) because you think it’ll help solve some
> > > problem Y. What’s “Y”? Perhaps there’s an easier way to solve that problem
> > > if we knew what it was….
> > >
> > > Best,
> > > Erick
> > >
> > > > On Aug 4, 2019, at 6:55 AM, eli chen <eli.c.new@gmail.com> wrote:
> > > >
> > > > hi i'm new to solr so please be patient.
> > > > how can i get the position of the matched word in the results?
> > > >
> > > > and no, i'm not talking about highlighting the words. i'm talking about
> > > > getting the position of the word in the content
> > > >
> > > > i have a field "content" which i query with q=content:"some_word"
> > > >
> > > > the content field is not stored but it's
> > > > Indexed + Tokenized + Multivalued + TermVector Stored + Store Offset With
> > > > TermVector + Store Position With TermVector
> > > >
> > > > thx for the help
> > >
> > >
>
