lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: How do I create a schema file for FIX data in Solr
Date Sun, 01 Apr 2018 18:16:27 GMT
On 4/1/2018 10:12 AM, Raymond Xie wrote:
> FIX is a format standard of financial data. It contains lots of tags in
> number with value for the tag, like 8=asdf, where 8 is the tag and asdf is
> the tag's value. Each tag has its definition.
>
> The sample msg in FIX format was in the original question.
>
> All I need to do is to know how to paste the msg and get all tag's value.
>
> I found so far a parser is what I need to start with. But I am more
> concerned about how to create an index in Solr on the extracted tags' values,
> that is the first step, the next would be to customize the dashboard for
> users to search with a value to find out which msg contains that value in
> which tag and present users the whole msg as proof.

Most of Solr's functionality is provided by Lucene.  Lucene is a Java 
API that implements search functionality.  Solr adds features on top of 
Lucene, but none of them change the fact that you're dealing with a 
Lucene index.  So I'm going to mostly talk about Lucene below.

Lucene organizes data in a unit that we call a "document." An easy 
analogy for this is that it is a lot like a row in a single database 
table.  It has fields, and each field has a type. Unless custom software is 
used, there is really no support for data other than basic primitive 
types -- numbers and strings.  The only complex type that I can think of 
that Solr supports out of the box is geospatial coordinates, and it 
might even support multi-dimensional coordinates, but I'm not sure.  
It's not all that complex -- the field just stores and manipulates 
multiple numbers instead of one.  The Lucene API does support a FEW 
things that Solr doesn't implement.  I don't think those are applicable 
to what you're trying to do.
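
To make the "document is like a row" idea concrete, here is a minimal 
sketch of turning one FIX message into a flat set of fields.  The field 
names ("tag_8", "raw_msg") and the function name are made up for 
illustration; nothing in Solr requires them.  It also assumes that a 
message pasted with spaces between tags really used the standard FIX 
SOH (0x01) delimiter on the wire, so splitting on spaces would break any 
value that itself contains a space.

```python
SOH = "\x01"  # standard FIX field delimiter on the wire


def fix_to_document(message):
    """Turn one FIX message into a flat dict with one key per tag.

    Hypothetical helper: the "tag_<n>" and "raw_msg" names are
    illustrative, not anything Solr or Lucene defines.
    """
    # Assumption: pasted samples show spaces where the wire format uses SOH.
    parts = message.split(SOH) if SOH in message else message.split()

    # Keep the whole message so it can be shown to the user as proof.
    doc = {"raw_msg": message}
    for part in parts:
        tag, sep, value = part.partition("=")
        if not sep:
            continue  # skip fragments that are not tag=value pairs
        # A tag that repeats in one message overwrites the earlier value here.
        doc["tag_" + tag] = value
    return doc


doc = fix_to_document("8=FIX.4.4 9=653 35=RIO")
```

Each key in the resulting dict would become one field in the document, 
which is why the type of every tag matters for the schema.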

Let's look at the first part of the data that you included in the first 
message:

8=FIX.4.4 9=653 35=RIO

Is "8" always a mixture of letters and numbers and periods? Is "9" 
always a number, and is it always a WHOLE number?  Is "35" always 
letters?  Looking deeper, at data that I didn't quote ... is "122" always 
a date/time value?  Are the tag numbers always picked from a 
well-defined set, or do they change?
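
One way to answer those questions is to run a pile of real messages 
through a small survey script and see what each tag actually contains.  
This is only a sketch: the function names and the four categories are my 
own choices, the sample messages at the bottom are made up, and the 
timestamp pattern assumes the usual FIX form YYYYMMDD-HH:MM:SS with 
optional fractional seconds.

```python
import re
from collections import defaultdict

SOH = "\x01"  # standard FIX field delimiter on the wire


def classify(value):
    """Return a rough category for one tag value (categories are illustrative)."""
    if re.fullmatch(r"[+-]?\d+", value):
        return "integer"
    if re.fullmatch(r"[+-]?\d+\.\d+", value):
        return "decimal"
    if re.fullmatch(r"\d{8}-\d{2}:\d{2}:\d{2}(\.\d+)?", value):
        return "timestamp"
    return "string"


def survey(messages):
    """Map each tag number to the set of categories seen across all messages."""
    kinds = defaultdict(set)
    for message in messages:
        # Assumption: pasted samples show spaces where the wire format uses SOH.
        parts = message.split(SOH) if SOH in message else message.split()
        for part in parts:
            tag, sep, value = part.partition("=")
            if sep:
                kinds[tag].add(classify(value))
    return dict(kinds)


# Made-up sample messages, only to show the shape of the result.
samples = [
    "8=FIX.4.4 9=653 35=RIO 122=20180401-18:16:27",
    "8=FIX.4.4 9=70 35=D",
]
result = survey(samples)
```

A tag whose set contains exactly one category is a good candidate for a 
field of that type.  A tag that shows up with more than one category 
probably has to be indexed as a string.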

Assuming that the answers in the previous paragraph are found and a 
configuration is created to deal with all of it ... how are you planning 
to search it?  What kind of queries would you expect somebody to make?  
That's going to have a huge influence on how you configure things.

Writing the schema is usually where people spend the most time when 
they're setting up Solr.

Thanks,
Shawn

