lucene-dev mailing list archives

From "Greg Fodor (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (SOLR-2202) Money FieldType
Date Wed, 27 Oct 2010 17:21:20 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925466#action_12925466 ]

Greg Fodor edited comment on SOLR-2202 at 10/27/10 1:20 PM:
------------------------------------------------------------

A few questions, now that I've done a bit more research and thinking:

- First, currency parsing in Java appears to be locale-dependent (which makes sense). The
concern here is that the locale of the end user performing queries is likely not the same
as the locale of the search engine. Is there currently a standard mechanism in Solr to acquire
the user's locale? What do we do for other internationalized components?

- NumberFormat fails to parse "10.00USD" or "10.00 USD"; it relies on the currency symbol
instead ("$10.00"). This seems like a limitation, since using the currency code as a suffix
is generally a locale-independent way of specifying a monetary value. It may very well be a
good idea to simply standardize on this approach for the purposes of indexing, and avoid all
the locale-specific issues that come up regarding currency symbols.

- The NumberFormat parsing does not yield back the currency, just the value. It seems the
currency itself still needs to be extracted somehow. Is there a built in mechanism to do this?
Currently the patch iterates over all currencies attempting to extract the symbol or code
from the value.

- How important is it that users have control over the currencies table? It was quite useful
to be able to define fake currencies for testing (as is done in the example currency.xml
file). If I changed the implementation to use Java's currency table, this might become a
limitation if use cases beyond testing exist.

- I wanted to know in more detail what rounding-related errors, if any, I need to be concerned
with. You'll notice in the patch that the range query applies an EPSILON on the edges to avoid
floating-point equality issues, and the point query actually executes a range query. Are there
additional problems I need to address? It seems there will always be some margin of error
when exchange rates are being applied, since this requires floating-point multiplications of
values in the index at execution time.

- Looking further, I'm not really sure I understand how the TrieField can benefit me here.
It seems that an entire iteration through the ValueSource is necessary for range queries,
as conversion rates may dictate that the documents with the minimum and maximum absolute
values need to be visited.

- Right now the name, plural name, etc. are unused. It will make sense either to remove them
or to use Java's native APIs to get them if they end up being needed.
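
To make the code-suffix idea from the NumberFormat points above concrete, here is a minimal
sketch of a locale-independent parser for values such as "10.00USD" or "10.00 EUR". It is
not code from the patch: the class MoneyParseSketch, the Money holder, and the helper
numberFormatAccepts are hypothetical names used only for illustration, and the sketch assumes
that amounts use '.' as the decimal separator and that the last three characters are an
ISO 4217 code known to java.util.Currency.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Currency;
import java.util.Locale;

public class MoneyParseSketch {

    /** Hypothetical value holder for illustration; not a class from the patch. */
    static final class Money {
        final BigDecimal amount;
        final Currency currency;

        Money(BigDecimal amount, Currency currency) {
            this.amount = amount;
            this.currency = currency;
        }

        @Override
        public String toString() {
            return amount.toPlainString() + currency.getCurrencyCode();
        }
    }

    /** Parses "<amount><code>" or "<amount> <code>" without consulting any locale. */
    static Money parse(String raw) {
        String s = raw.trim();
        if (s.length() < 4) {
            throw new IllegalArgumentException("Expected <amount><3-letter code>: " + raw);
        }
        // Assumption: the currency code is always the trailing three characters.
        String code = s.substring(s.length() - 3).toUpperCase(Locale.ROOT);
        String number = s.substring(0, s.length() - 3).trim();
        // Currency.getInstance throws IllegalArgumentException for unsupported codes.
        Currency currency = Currency.getInstance(code);
        // BigDecimal(String) throws NumberFormatException for malformed amounts.
        return new Money(new BigDecimal(number), currency);
    }

    /** Reports whether the locale's currency NumberFormat can parse the text. */
    static boolean numberFormatAccepts(String text, Locale locale) {
        try {
            NumberFormat.getCurrencyInstance(locale).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("10.00USD"));
        System.out.println(parse("10.00 EUR"));
        // Compare with the symbol-based, locale-dependent parsing discussed above.
        System.out.println(numberFormatAccepts("$10.00", Locale.US));
        System.out.println(numberFormatAccepts("10.00USD", Locale.US));
    }
}
```

Because the code and the amount are split apart explicitly, the currency comes back alongside
the value, with no need to iterate over all currencies. The trade-off is that
Currency.getInstance only knows Java's own currency table, so fake currencies like those in
the example currency.xml would need a custom lookup instead.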

Thanks again for reviewing this patch!

> Money FieldType
> ---------------
>
>                 Key: SOLR-2202
>                 URL: https://issues.apache.org/jira/browse/SOLR-2202
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>    Affects Versions: 1.5
>            Reporter: Greg Fodor
>         Attachments: SOLR-2202-lucene-1.patch, SOLR-2202-solr-1.patch, SOLR-2202-solr-2.patch
>
>
> Attached please find patches to add support for monetary values to Solr/Lucene with query-time
currency conversion. The following features are supported:
> - Point queries (ex: "price:4.00USD")
> - Range queries (ex: "price:[$5.00 TO $10.00]")
> - Sorting.
> - Currency parsing by either currency code or symbol.
> - Symmetric & Asymmetric exchange rates. (Asymmetric exchange rates are useful if
there are fees associated with exchanging the currency.)
> At indexing time, money fields can be indexed in a native currency. For example, if a
product on an e-commerce site is listed in Euros, indexing the price field as "10.00EUR" will
index it appropriately. By altering the currency.xml file, the sorting and querying against
Solr can take into account fluctuations in currency exchange rates without having to re-index
the documents.
> The new "money" field type is a polyfield which indexes two fields, one which contains
the amount of the value and another which contains the currency code or symbol. The currency
metadata (names, symbols, codes, and exchange rates) are expected to be in an xml file which
is pointed to by the field type declaration in the schema.xml.
> The current patch is factored such that Money utility functions and configuration metadata
lie in Lucene (see MoneyUtil and CurrencyConfig), while the MoneyType and MoneyValueSource
lie in Solr. This was meant to mirror the work being done on the spatial field types.
> This patch has not yet been deployed to production but will be getting used to power
the international search capabilities of the search engine at Etsy.
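
As a rough illustration of the query-time conversion described above, the following sketch
converts an indexed amount into the query currency using per-direction rates (so asymmetric
rates are possible) and then applies an epsilon-padded range check, in the spirit of the
EPSILON mentioned in the comment. It is not the patch's implementation: the class
ExchangeRateSketch, its methods, the EPSILON value, and the sample rates are all assumptions
made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ExchangeRateSketch {

    /** Assumed tolerance; the value used in the patch is not stated in this message. */
    static final double EPSILON = 1e-6;

    /** Rates keyed by direction ("EUR>USD"), so each direction can differ. */
    private final Map<String, Double> rates = new HashMap<>();

    void setRate(String from, String to, double rate) {
        rates.put(from + ">" + to, rate);
    }

    /** Converts an amount at query time; the indexed value itself is never rewritten. */
    double convert(double amount, String from, String to) {
        if (from.equals(to)) {
            return amount;
        }
        Double rate = rates.get(from + ">" + to);
        if (rate == null) {
            throw new IllegalArgumentException("No rate for " + from + " -> " + to);
        }
        return amount * rate;
    }

    /** Range check in the query currency, padded by EPSILON on both edges. */
    boolean inRange(double docAmount, String docCurrency,
                    double lower, double upper, String queryCurrency) {
        double converted = convert(docAmount, docCurrency, queryCurrency);
        return converted >= lower - EPSILON && converted <= upper + EPSILON;
    }

    /** A point query expressed as a degenerate range, as the comment describes. */
    boolean matchesPoint(double docAmount, String docCurrency,
                         double value, String queryCurrency) {
        return inRange(docAmount, docCurrency, value, value, queryCurrency);
    }

    public static void main(String[] args) {
        ExchangeRateSketch table = new ExchangeRateSketch();
        // Made-up rates; the reverse rate is not the inverse, modelling exchange fees.
        table.setRate("EUR", "USD", 1.1);
        table.setRate("USD", "EUR", 0.85);

        // The floating-point product may not be exactly 1.21, hence the padded comparison.
        System.out.println(table.convert(1.1, "EUR", "USD"));
        System.out.println(table.matchesPoint(1.1, "EUR", 1.21, "USD"));
        System.out.println(table.inRange(10.0, "USD", 5.0, 9.0, "EUR"));
    }
}
```

Updating the rate table changes the results of later queries without touching the indexed
amounts, which is the property the description relies on to avoid re-indexing.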


