lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr Payload example
Date Wed, 23 Oct 2019 12:25:44 GMT
Bookmarked. Do you intend that this should be incorporated into Solr? If so, please raise a
JIRA and link your PR in….

Thanks!
Erick

> On Oct 22, 2019, at 6:56 PM, Vincenzo D'Amore <v.damore@gmail.com> wrote:
> 
> Hi all,
> 
> this evening I had a spare hour, so I put everything together in a
> repository:
> 
> https://github.com/freedev/solr-payload-string-function-query
> 
> 
> 
> On Tue, Oct 22, 2019 at 5:54 PM Vincenzo D'Amore <v.damore@gmail.com> wrote:
> 
>> Hi all,
>> 
>> thanks for your support. And many thanks to whoever implemented
>> the integration of the Solr GitHub repository with the IntelliJ IDE.
>> Configuring the environment and running the debugger took me less than one
>> hour (and most of that time was spent waiting for the compilation).
>> Solr and you guys really rock together.
>> 
>> What I've done:
>> 
>> I looked at how the original payload function is defined in
>> the ValueSourceParser: it uses a FloatPayloadValueSource to
>> return the value found.
>> 
>> As I said, I wrote a new version of the payload function that handles
>> strings. I named it spayload, and it basically extracts the string value
>> from the payload.
>> 
>> Take the earlier example, where I have a multivalued field payloadCurrency:
>> 
>> payloadCurrency: [
>> "store1|USD",
>> "store2|EUR",
>> "store3|GBP"
>> ]
>> 
>> Executing spayload(payloadCurrency,store2) returns "EUR", and so on for
>> the remaining key/value pairs in the field.
>> 
>> To implement the spayload function, I added a new ValueSourceParser
>> instance to the list of defined functions. It returns
>> a StringPayloadValueSource with the value inside (it does the same thing as
>> the existing FloatPayloadValueSource).
>> 
>> That's all. As they say, always be wary of code that works on the first run.
>> And indeed something was wrong: initially I messed up the
>> conversion of the payload into a String (bytes, offset, etc.).
>> Now it is fixed, or at least it seems so to me.
>> I see this function cannot be used for sorting; very likely my simple
>> implementation of StringPayloadValueSource is missing something.
>> 
>> As far as I understand, I'm only scratching the surface of this solution,
>> and there are a few things I'm worried about. I have a bunch of questions,
>> so please be patient.
>> This function returns an empty string "" when no key matches. Should it
>> return an empty value instead? I'm not sure; what is the correct way to
>> return an empty value?
>> I wasn't able to find a unit test for the payload function in the tests.
>> Could you give me a few suggestions on how to test the
>> implementation properly?
>> If spayload is used on a different field type (e.g. on a
>> float payload), the behaviour is not handled. Can this
>> function check the type of the payload content?
>> And lastly, what do you think: could this simple fix be of interest to the
>> Solr community? May I try to submit a pull request or add a feature request
>> to JIRA?
>> 
>> Best regards,
>> Vincenzo
>> 
>> 
>> On Mon, Oct 21, 2019 at 9:12 PM Erik Hatcher <erik.hatcher@gmail.com>
>> wrote:
>> 
>>> Yes.   The decoding of a payload based on its schema type is what the
>>> payload() function does.   Your Payloader won't currently work well/legibly
>>> for fields encoded numerically:
>>> 
>>> 
>>> https://github.com/o19s/payload-component/blob/master/src/main/java/com/o19s/payloads/Payloader.java#L130
>>> 
>>> I think that code could probably be slightly enhanced to leverage
>>> PayloadUtils.getPayloadDecoder(fieldType) and use bytes if the field type
>>> doesn't have a better decoder.
>>> 
>>>        Erik
>>> 
>>> 
>>>> On Oct 21, 2019, at 2:55 PM, Eric Pugh <epugh@opensourceconnections.com> wrote:
>>>> 
>>>> Have you checked out
>>>> https://github.com/o19s/payload-component
>>>> 
>>>> On Mon, Oct 21, 2019 at 2:47 PM Erik Hatcher <erik.hatcher@gmail.com> wrote:
>>>> 
>>>>> How about a single field, with terms like:
>>>>> 
>>>>>   store1_USD|125.0 store2_EUR|220.0 store3_GBP|225.0
>>>>> 
>>>>> Would that do the trick?
>>>>> 
>>>>> And yeah, payload decoding is currently limited to float and int with the
>>>>> built-in payload() function.   We'd need a new way to pull out
>>>>> textual/bytes payloads - like maybe a DocTransformer?
>>>>> 
>>>>>       Erik
>>>>> 
>>>>> 
>>>>>> On Oct 21, 2019, at 9:59 AM, Vincenzo D'Amore <v.damore@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Erick,
>>>>>> 
>>>>>> thanks for getting back to me. We started to use payloads because we have
>>>>>> the classic per-store pricing problem:
>>>>>> thousands of stores, each with different prices.
>>>>>> Then we found payloads very useful and started to use them for many other
>>>>>> reasons, like enabling/disabling a product for a given store, saving the
>>>>>> stock availability, or saving other info such as buy/sell price, discount
>>>>>> rates, and so on.
>>>>>> All of this information is numeric, but stores can also be in different
>>>>>> countries, so it would be useful to also have the currency and other
>>>>>> attributes related to the store.
>>>>>> 
>>>>>> Thinking about an alternative to payloads, maybe I could use
>>>>>> dynamic fields. Well, I know it is ugly.
>>>>>> 
>>>>>> Consider this hypothetical case where I have two payload fields:
>>>>>> 
>>>>>> payloadPrice: [
>>>>>> "store1|125.0",
>>>>>> "store2|220.0",
>>>>>> "store3|225.0"
>>>>>> ]
>>>>>> 
>>>>>> payloadCurrency: [
>>>>>> "store1|USD",
>>>>>> "store2|EUR",
>>>>>> "store3|GBP"
>>>>>> ]
>>>>>> 
>>>>>> with dynamic fields I could have different fields for each document.
>>>>>> 
>>>>>> currency_store1_s: "USD"
>>>>>> currency_store2_s: "EUR"
>>>>>> currency_store3_s: "GBP"
>>>>>> 
>>>>>> But how many dynamic fields like this can I have? More than a few thousand?
>>>>>> 
>>>>>> Again, I've just started to look at the solr-ocrhighlighting GitHub
>>>>>> project you suggested.
>>>>>> They seem to have written their own payload object type in which they
>>>>>> store OCR highlighting information.
>>>>>> It seems interesting, I'll take a look immediately.
>>>>>> 
>>>>>> Thanks again for your time.
>>>>>> 
>>>>>> Best regards,
>>>>>> Vincenzo
>>>>>> 
>>>>>> 
>>>>>> On Mon, Oct 21, 2019 at 2:55 PM Erick Erickson <erickerickson@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> This is one of those situations where I know a client did it, but didn’t
>>>>>>> see the code myself.
>>>>>>> 
>>>>>>> So I can’t help much.
>>>>>>> 
>>>>>>> Perhaps a good question at this point, though, is “why do you want to add
>>>>>>> string payloads anyway”?
>>>>>>> 
>>>>>>> This isn’t the client, but it might give you some pointers:
>>>>>>> 
>>>>>>> https://github.com/dbmdz/solr-ocrpayload-plugin/blob/master/src/main/java/de/digitalcollections/solr/plugin/components/ocrhighlighting/OcrHighlighting.java
>>>>>>> 
>>>>>>> Best,
>>>>>>> Erick
>>>>>>> 
>>>>>>>> On Oct 21, 2019, at 6:37 AM, Vincenzo D'Amore <v.damore@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Erick,
>>>>>>>> 
>>>>>>>> It seems I've reached a dead end, or at least, looking at the
>>>>>>>> code, it seems I can't easily add a custom decoder.
>>>>>>>> 
>>>>>>>> In the PayloadUtils class, the getPayloadDecoder method is invoked to
>>>>>>>> return the PayloadDecoder:
>>>>>>>> 
>>>>>>>> public static PayloadDecoder getPayloadDecoder(FieldType fieldType) {
>>>>>>>>   PayloadDecoder decoder = null;
>>>>>>>> 
>>>>>>>>   String encoder = getPayloadEncoder(fieldType);
>>>>>>>> 
>>>>>>>>   if ("integer".equals(encoder)) {
>>>>>>>>     decoder = (BytesRef payload) -> payload == null ? 1 :
>>>>>>>>         PayloadHelper.decodeInt(payload.bytes, payload.offset);
>>>>>>>>   }
>>>>>>>>   if ("float".equals(encoder)) {
>>>>>>>>     decoder = (BytesRef payload) -> payload == null ? 1 :
>>>>>>>>         PayloadHelper.decodeFloat(payload.bytes, payload.offset);
>>>>>>>>   }
>>>>>>>>   // encoder could be "identity" at this point, in the case of
>>>>>>>>   // DelimitedTokenFilterFactory encoder="identity"
>>>>>>>> 
>>>>>>>>   // TODO: support pluggable payload decoders?
>>>>>>>> 
>>>>>>>>   return decoder;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> Any advice to work around this situation?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Oct 21, 2019 at 1:51 AM Erick Erickson <erickerickson@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> You’d need to write one. Payloads are generally intended to hold numerics
>>>>>>>>> you can then use in a function query to factor into the score…
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Erick
>>>>>>>>> 
>>>>>>>>>> On Oct 20, 2019, at 4:57 PM, Vincenzo D'Amore <v.damore@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Sorry, I just realized that I was wrong in how I'm using the payload
>>>>>>>>>> function.
>>>>>>>>>> Given that the payload function only handles a numeric (integer or float)
>>>>>>>>>> payload, could you suggest an alternative function that handles
>>>>>>>>>> strings?
>>>>>>>>>> If not, should I write one?
>>>>>>>>>> 
>>>>>>>>>> On Sun, Oct 20, 2019 at 10:43 PM Vincenzo D'Amore <v.damore@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> I'm trying to understand what I did wrong with a payload query that
>>>>>>>>>>> returns
>>>>>>>>>>> 
>>>>>>>>>>> error: {
>>>>>>>>>>> metadata: [ "error-class", "org.apache.solr.common.SolrException",
>>>>>>>>>>> "root-error-class", "org.apache.solr.common.SolrException" ],
>>>>>>>>>>> msg: "No payload decoder found for field: colorCode",
>>>>>>>>>>> code: 400
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> I have reduced my problem to a small sample to show what happens.
>>>>>>>>>>> Basically I have a document with a couple of payload fields, one
>>>>>>>>>>> delimited_payloads_string and one delimited_payloads_integer:
>>>>>>>>>>> 
>>>>>>>>>>> {
>>>>>>>>>>> field_dps: "key|data",
>>>>>>>>>>> field_dpi: "key|1",
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> When I execute this query, Solr returns the payload for the key, as
>>>>>>>>>>> expected:
>>>>>>>>>>> 
>>>>>>>>>>> q=*:*&fl=payload(field_dpi,key)
>>>>>>>>>>> 
>>>>>>>>>>> {
>>>>>>>>>>> payload(field_dpi,key): 1
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> But for strings there must be something different to do, because
>>>>>>>>>>> I'm unable to get the payload value back. Executing this query, as in
>>>>>>>>>>> the short introduction of this post, I receive an error.
>>>>>>>>>>> 
>>>>>>>>>>> ?q=*:*&fl=payload(field_dps,key)
>>>>>>>>>>> 
>>>>>>>>>>> error: {
>>>>>>>>>>> metadata: [ "error-class", "org.apache.solr.common.SolrException",
>>>>>>>>>>> "root-error-class", "org.apache.solr.common.SolrException" ],
>>>>>>>>>>> msg: "No payload decoder found for field: colorCode",
>>>>>>>>>>> code: 400
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> Am I doing something wrong? How can I read string payload data?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks in advance for your time,
>>>>>>>>>>> Vincenzo
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Vincenzo D'Amore
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Vincenzo D'Amore
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Vincenzo D'Amore
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Vincenzo D'Amore
>>>>> 
>>>>> 
>>> 
>>> 
>> 
>> --
>> Vincenzo D'Amore
>> 
>> 
> 
> -- 
> Vincenzo D'Amore
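
The thread mentions two practical points about a string payload function: the conversion from payload bytes to a String has to respect the offset and length, and it is an open question what to return when no key matches. The following is a minimal, Lucene-free sketch of both, not the code from the linked repository. The class StringPayloadSketch, the Slice type (a stand-in for Lucene's BytesRef with its bytes/offset/length fields) and the plain-Java spayload method are all hypothetical names made up for illustration, and returning an Optional is just one possible answer to the "empty value" question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class StringPayloadSketch {

    /** Hypothetical stand-in for Lucene's BytesRef: a slice of a possibly shared array. */
    public static final class Slice {
        public final byte[] bytes;
        public final int offset;
        public final int length;

        public Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Decodes only the slice; decoding the whole array would pick up neighbouring bytes. */
    public static String decode(Slice payload) {
        return new String(payload.bytes, payload.offset, payload.length, StandardCharsets.UTF_8);
    }

    /**
     * Plain-Java imitation of spayload(field,key) over "key|value" entries.
     * Returns an empty Optional (rather than "") when no entry matches the key.
     */
    public static Optional<String> spayload(List<String> entries, String key) {
        String prefix = key + "|";
        for (String entry : entries) {
            if (entry.startsWith(prefix)) {
                byte[] all = entry.getBytes(StandardCharsets.UTF_8);
                // The value starts right after the UTF-8 bytes of "key|".
                int offset = prefix.getBytes(StandardCharsets.UTF_8).length;
                return Optional.of(decode(new Slice(all, offset, all.length - offset)));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> payloadCurrency = Arrays.asList("store1|USD", "store2|EUR", "store3|GBP");
        System.out.println(spayload(payloadCurrency, "store2")); // prints Optional[EUR]
        System.out.println(spayload(payloadCurrency, "store9")); // prints Optional.empty
    }
}
```

In a real ValueSource the lookup would of course go through the index rather than a list of strings; the sketch only illustrates the slicing and the missing-key behaviour.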
