lucene-solr-user mailing list archives

From Rick Leir <rl...@leirtech.com>
Subject Re: How do I create a schema file for FIX data in Solr
Date Mon, 02 Apr 2018 13:15:16 GMT
Google "fix to json"; there are a few interesting leads.
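As a rough illustration of the "FIX to JSON" idea, here is a minimal Python sketch. It turns one FIX message into a Solr-style JSON document. It assumes the tag=value pairs are separated by SOH (\x01) or by spaces, as in the sample. The TAG_NAMES subset, the tag_<n> naming for unmapped tags, the raw_message field, and the SHA-1 based id are illustrative assumptions. Neither Solr nor the FIX standard prescribes them. Splitting on whitespace breaks values that contain spaces, and repeated tags (repeating groups) simply overwrite earlier values.

```python
import hashlib
import json
import re

# Small subset of FIX tag names; extend from the FIX dictionary as needed.
TAG_NAMES = {8: "BeginString", 9: "BodyLength", 35: "MsgType",
             37: "OrderID", 75: "TradeDate", 122: "OrigSendingTime"}


def fix_to_solr_doc(message: str) -> dict:
    """Convert one FIX message into a flat dict suitable for Solr JSON indexing."""
    doc = {
        # Hypothetical id scheme: a hash of the raw message, so re-indexing
        # the same message overwrites the old document instead of duplicating it.
        "id": hashlib.sha1(message.encode("utf-8")).hexdigest(),
        # Keep the whole message so it can be shown to users as proof.
        "raw_message": message,
    }
    for field in re.split(r"[\x01\s]+", message.strip()):
        tag, sep, value = field.partition("=")
        if not sep or not tag.isdigit():
            continue  # skip empty pieces or anything that is not tag=value
        name = TAG_NAMES.get(int(tag), "tag_" + tag)
        doc[name] = value  # a repeated tag overwrites the earlier value
    return doc


if __name__ == "__main__":
    print(json.dumps([fix_to_solr_doc("8=FIX.4.4 9=653 35=RIO")], indent=2))
```

Wrapping the documents in a JSON array is one shape that Solr's /update handler accepts when adding documents.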

On April 2, 2018 12:34:44 AM EDT, Raymond Xie <xie3208080@gmail.com> wrote:
>Thank you, Shawn, Rick and other readers,
>
>To Shawn:
>
>For *8=FIX.4.4 9=653 35=RIO* as an example: in the FIX standard, tag 8
>means BeginString, and in this example its value is FIX.4.4; tag 9 means
>BodyLength, which is 653 for this message; tag 35 is MsgType, and here the
>message type is RIO. Tag 122 stands for OrigSendingTime and has the
>UTCTimestamp format.
>
>You can refer to this page for details:
>https://www.onixs.biz/fix-dictionary/4.2/fields_by_tag.html
>
>All the values are represented as strings.
>
>All the tag numbers come from the FIX standard, so they don't change (in
>my case).
>
>I expect a Python program might be needed to parse the message and
>extract each tag's value; the index is to be built on those extracted
>values as well as their field (tag) names.
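Because the format is just tag=value, the Python parser can be small. The sketch below is one possible approach, not a FIX engine. It assumes the real feed uses the standard SOH (\x01) delimiter, and it falls back to the space-separated form quoted in this thread. It returns (tag, value) pairs in their original order, so repeated tags are not lost. The space fallback assumes no value contains text that looks like " <digits>=".

```python
from __future__ import annotations

import re

SOH = "\x01"  # standard FIX field delimiter

# Space-separated fallback: a value runs until the next " <digits>=" or the end.
_SPACED_FIELD = re.compile(r"(\d+)=(.*?)(?=\s+\d+=|\s*$)")


def parse_fix(message: str) -> list[tuple[int, str]]:
    """Split a FIX message into ordered (tag, value) pairs."""
    if SOH in message:
        pairs = []
        for field in message.split(SOH):
            if not field:
                continue  # a trailing delimiter produces an empty piece
            tag, sep, value = field.partition("=")
            if not sep or not tag.isdigit():
                raise ValueError(f"malformed FIX field: {field!r}")
            pairs.append((int(tag), value))
        return pairs
    return [(int(tag), value) for tag, value in _SPACED_FIELD.findall(message.strip())]


if __name__ == "__main__":
    for tag, value in parse_fix("8=FIX.4.4 9=653 35=RIO"):
        print(tag, value)
```

Keeping the pairs as a list rather than a dict leaves the choice of how to handle repeated tags to the indexing step.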
>
>With the index in place, users will ideally be able to search for any
>keyword. However, in this case most queries would be based on tag 37
>(OrderID) and tag 75 (TradeDate); there is also a customized tag (not in
>the standard), Order Version, to be queried on.
>
>I understand that creating the parser would be a manual process. As long
>as I have a small sample program, I will do it myself and adjust it as
>needed.
>
>To Rick:
>
>You mentioned creating a JSON document. My understanding is that a parser
>would be needed to generate that JSON document; do you have any existing
>example code?
>
>Thank you guys very much.
>
>Sincerely yours,
>Raymond
>
>On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey <apache@elyograg.org> wrote:
>
>> On 4/1/2018 10:12 AM, Raymond Xie wrote:
>>
>>> FIX is a standard format for financial data. It contains lots of numeric
>>> tags, each with a value, like 8=asdf, where 8 is the tag and asdf is
>>> the tag's value. Each tag has its own definition.
>>>
>>> The sample msg in FIX format was in the original question.
>>>
>>> All I need is to know how to parse the message and get every tag's
>>> value.
>>>
>>> So far I have found that a parser is what I need to start with, but I am
>>> more concerned about how to create an index in Solr on the extracted tag
>>> values. That is the first step; the next would be to customize the
>>> dashboard so users can search for a value, find out which message
>>> contains that value in which tag, and be shown the whole message as proof.
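One way to handle "which tag contains the value" is a client-side step. This is an assumption about the design, not something Solr does for you. Store the raw message in Solr, run the keyword search, then scan each returned message for the tags whose value matches. The sketch assumes SOH- or space-separated pairs with no spaces inside values, and it uses an exact, case-sensitive comparison.

```python
import re


def tags_containing(message: str, value: str) -> list:
    """Return the tag numbers in a FIX message whose value equals `value`.

    Assumes SOH- or space-separated tag=value pairs with no spaces inside
    values; the comparison is exact and case-sensitive.
    """
    hits = []
    for field in re.split(r"[\x01\s]+", message.strip()):
        tag, sep, val = field.partition("=")
        if sep and tag.isdigit() and val == value:
            hits.append(int(tag))
    return hits


if __name__ == "__main__":
    stored = "8=FIX.4.4 9=653 35=RIO 37=653"
    print(tags_containing(stored, "653"))  # prints [9, 37]
```

The dashboard can then display the matching tag numbers next to the full stored message.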
>>>
>>
>> Most of Solr's functionality is provided by Lucene.  Lucene is a Java API
>> that implements search functionality.  Solr bolts on some functionality on
>> top of Lucene, but doesn't really do anything to fundamentally change the
>> fact that you're dealing with a Lucene index.  So I'm going to mostly talk
>> about Lucene below.
>>
>> Lucene organizes data in a unit that we call a "document." An easy analogy
>> for this is that it is a lot like a row in a single database table.  It has
>> fields, and each field has a type. Unless custom software is used, there is
>> really no support for data other than basic primitive types -- numbers and
>> strings.  The only complex type that I can think of that Solr supports out
>> of the box is geospatial coordinates, and it might even support
>> multi-dimensional coordinates, but I'm not sure.  It's not all that complex
>> -- the field just stores and manipulates multiple numbers instead of one.
>> The Lucene API does support a FEW things that Solr doesn't implement.  I
>> don't think those are applicable to what you're trying to do.
>>
>> Let's look at the first part of the data that you included in the first
>> message:
>>
>> 8=FIX.4.4 9=653 35=RIO
>>
>> Is "8" always a mixture of letters and numbers and periods? Is "9" always
>> a number, and is it always a WHOLE number?  Is "35" always letters?
>> Looking deeper at data that I didn't quote ... is "122" always a date/time
>> value?  Are the tag numbers always picked from a well-defined set, or do
>> they change?
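Raymond's reply says tag 9 is a whole-number body length and tag 122 is a FIX UTCTimestamp. Given that, one option is to convert those values before indexing so they can use Solr's numeric and date field types. The sketch below does that for the two tags only. The CONVERTERS table is a hypothetical choice, and every other tag stays a string. It handles UTCTimestamp with or without milliseconds. Finer sub-second precision allowed by later FIX versions is not handled. The output uses the ISO-8601 UTC form with a trailing Z that Solr date fields expect.

```python
from datetime import datetime

# FIX UTCTimestamp, with or without milliseconds.
_FIX_TS_FORMATS = ("%Y%m%d-%H:%M:%S.%f", "%Y%m%d-%H:%M:%S")


def fix_utc_to_solr(ts: str) -> str:
    """Convert e.g. 20180402-13:15:16.250 into 2018-04-02T13:15:16.250Z."""
    for fmt in _FIX_TS_FORMATS:
        try:
            dt = datetime.strptime(ts, fmt)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"not a FIX UTCTimestamp: {ts!r}")
    base = dt.strftime("%Y-%m-%dT%H:%M:%S")
    millis = dt.microsecond // 1000
    return f"{base}.{millis:03d}Z" if millis else base + "Z"


# Hypothetical per-tag converters: BodyLength -> int, OrigSendingTime -> date.
CONVERTERS = {9: int, 122: fix_utc_to_solr}


def typed_value(tag: int, value: str):
    """Return the value converted for its tag, or unchanged if no converter."""
    converter = CONVERTERS.get(tag)
    return converter(value) if converter else value


if __name__ == "__main__":
    print(typed_value(9, "653"), typed_value(122, "20180402-13:15:16.250"))
```

Keeping TradeDate (75) as a plain string matches the exact-match lookups described in the thread. Convert it as well if date-range queries turn out to matter.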
>>
>> Assuming that the answers in the previous paragraph are found and a
>> configuration is created to deal with all of it ... how are you planning to
>> search it?  What kind of queries would you expect somebody to make?  That's
>> going to have a huge influence on how you configure things.
>>
>> Writing the schema is usually where people spend the most time when
>> they're setting up Solr.
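If the core uses a managed schema, fields can be added by POSTing a JSON body to the core's /schema endpoint through the Schema API, instead of hand-editing the schema file. The sketch below only builds that body. The field names and the tag_* dynamic field are hypothetical choices for this data. The types "string", "pint", "pdate", and "text_general" exist in Solr 7's _default configset, but check them against your own schema. Passing an array to "add-field" is intended to add several fields in one request.

```python
import json

# Hypothetical fields for the tags queried most often, plus the raw message.
FIELDS = [
    {"name": "OrderID", "type": "string", "stored": True},
    {"name": "TradeDate", "type": "string", "stored": True},
    {"name": "OrderVersion", "type": "string", "stored": True},
    {"name": "BodyLength", "type": "pint", "stored": True},
    {"name": "OrigSendingTime", "type": "pdate", "stored": True},
    # Tokenized text, so keyword searches can hit anywhere in the message.
    {"name": "raw_message", "type": "text_general", "stored": True},
]


def schema_payload() -> str:
    """Build a Schema API request body that adds the fields above, plus a
    catch-all dynamic field so any other tag (tag_<n>) is indexed as a string."""
    body = {
        "add-field": FIELDS,
        "add-dynamic-field": {"name": "tag_*", "type": "string", "stored": True},
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(schema_payload())
```

Choosing string for OrderID and TradeDate favours exact lookups. Any field that needs partial or case-insensitive matching would need a text type instead.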
>>
>> Thanks,
>> Shawn
>>
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 