incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Column Types
Date Mon, 21 Oct 2013 18:41:04 GMT
On Mon, Oct 21, 2013 at 2:25 PM, Colton McInroy <colton@dosarrest.com> wrote:

>
> Thanks,
> Colton McInroy
>
>  * Director of Security Engineering
>
>
> Phone
> (Toll Free)
> _US_    (888)-818-1344 Press 2
> _UK_    0-800-635-0551 Press 2
>
> My Extension    101
> 24/7 Support    support@dosarrest.com <mailto:support@dosarrest.com>
> Email   colton@dosarrest.com <mailto:colton@dosarrest.com>
> Website         http://www.dosarrest.com
>
> On 10/21/2013 11:15 AM, Aaron McCurry wrote:
>
>> On Mon, Oct 21, 2013 at 1:37 PM, Colton McInroy <colton@dosarrest.com>
>> wrote:
>>
>>> Hmm... What about column families?
>>>
>>> I tried typing "definecolumn Program_syslog-ng event Date date" in the
>>> blur shell, but it doesn't appear to change anything. When I run
>>> "schema <table>", I still see this...
>>>
>>> family : event
>>>          column   : Date
>>>                  fieldType : text
>>>
>> Once a type is defined it cannot be changed; if there was no error
>> message, then we should fix that. If you set a table to strictTypes = true,
>> then a column cannot be defined automatically via mutate, only by adding a
>> column definition.
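To make the strictTypes rule concrete, here is a minimal Java sketch of the behavior described above. ColumnTypeRegistry is a hypothetical class, not part of Blur; it assumes, as the schema output above suggests, that a column auto-defined by a mutate gets the text type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the rules described above; this is not
// Blur's implementation. Keys are "family.column", values are type names.
class ColumnTypeRegistry {
    private final Map<String, String> types = new HashMap<>();
    private final boolean strictTypes;

    ColumnTypeRegistry(boolean strictTypes) {
        this.strictTypes = strictTypes;
    }

    // Explicit definition (what definecolumn does). Returns false when the
    // column already has a type: once defined, a type cannot be changed.
    boolean defineColumn(String family, String column, String type) {
        return types.putIfAbsent(family + "." + column, type) == null;
    }

    // Called for each column seen in a mutate. In non-strict mode an unknown
    // column is auto-defined as "text" (an assumption); in strict mode it is
    // rejected until a column definition is added.
    String typeForMutate(String family, String column) {
        String key = family + "." + column;
        String type = types.get(key);
        if (type != null) {
            return type;
        }
        if (strictTypes) {
            throw new IllegalStateException("No column definition for " + key);
        }
        types.put(key, "text");
        return "text";
    }
}
```

In the non-strict case the first mutate fixes event.Date as text, so a later definecolumn for date has no effect, which matches the schema output above.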
>>
> OK, so that would have been my problem: I already have data in the table,
> so it appears the mutate calls created the schema. If I set strictTypes to
> true, will all fields be treated as text unless a column type is specified,
> or do I HAVE to specify column types for every field?


Yes. However, with the API changes that have been (or are being) discussed in
other threads (Document vs. Record, Document Collection vs. Row, etc.), I
want to change the value portion of the Column to a Value type that will be
a union in Thrift instead of a struct. This would allow each basic type to
be defined in a separate field: stringValue for string types, textValue for
text types, intValue for int types, etc. That way, when a table is not in
strict mode, it could better guess the correct type instead of blindly
choosing text.
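A rough Java sketch of the proposed union, using the field names listed above (longValue is added only for illustration). It is hypothetical, not the Blur Thrift API; it only shows how the field that is set could drive type guessing in non-strict mode.

```java
// Hypothetical sketch of the proposed Value union; not an existing Blur type.
// Exactly one field is expected to be set, as in a Thrift union.
class Value {
    String stringValue;
    String textValue;
    Integer intValue;
    Long longValue;

    // Factory methods each set exactly one field.
    static Value ofString(String v) { Value x = new Value(); x.stringValue = v; return x; }
    static Value ofText(String v) { Value x = new Value(); x.textValue = v; return x; }
    static Value ofInt(int v) { Value x = new Value(); x.intValue = v; return x; }
    static Value ofLong(long v) { Value x = new Value(); x.longValue = v; return x; }

    // In non-strict mode, the type of an undefined column could be guessed
    // from whichever field is set, instead of always choosing text.
    String guessType() {
        if (intValue != null) return "int";
        if (longValue != null) return "long";
        if (stringValue != null) return "string";
        return "text"; // textValue set (or nothing set): fall back to text
    }
}
```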


>
>
>> http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_TableDescriptor
>>
>> The date type also needs an extra argument specifying how to parse the string.
>>
>> definecolumn Program_syslog-ng event Date date -p dateFormat yyyyMMdd
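For illustration, a small Java sketch of what a dateFormat of yyyyMMdd means when a value such as 20131021 is parsed. It assumes the pattern follows java.text.SimpleDateFormat syntax and uses UTC only to make the result deterministic; DateColumnParser is a hypothetical helper, and how Blur's own date type handles time zones is not covered here.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical helper: parses a column value with a SimpleDateFormat-style
// pattern (an assumption about the dateFormat syntax) into epoch millis.
class DateColumnParser {
    static long parseToEpochMillis(String value, String dateFormat) {
        SimpleDateFormat format = new SimpleDateFormat(dateFormat);
        format.setLenient(false);                         // reject values like "20131341"
        format.setTimeZone(TimeZone.getTimeZone("UTC"));  // deterministic result
        try {
            return format.parse(value).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a " + dateFormat + " date: " + value, e);
        }
    }
}
```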
>>
> Ah, ok thanks.
>
>
>> http://incubator.apache.org/blur/docs/0.2.0/data-model.html#date_type
>>
>>
>>> Am I doing something wrong? Does the schema have to be set during
>>> initialization of the table, or can it be done at any time?
>>>
>> It can be done at any time, but once the field has been used, its type is
>> set for the lifetime of the table.
>>
>>
>>> Also, the code you posted for a single table doesn't reference column
>>> families at all. Are field types column-name specific only, so that if you
>>> have the same column name in two different families, both will be handled
>>> by that field type? That's no problem for me at all if this is the case,
>>> but for some people it may be. Say, for instance, they have
>>> doc.matches:true and fields.matches:3; it may cause some problems.
>>>
>> I might be misunderstanding your question, but when you define a column's
>> type with the definecolumn command, you have to supply a family and a
>> column name. In your example above, that would be "event" for the family
>> and "Date" for the column name. Type definitions act as a bridge from
>> families/columns to fields in Lucene: the getFieldsForColumn and
>> getFieldsForSubColumn methods on FieldTypeDefinition are passed the family
>> name and a Column, and they return an Iterable of Fields.
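A minimal sketch of that bridge, assuming (not confirmed in this thread) that the resulting Lucene field name is family.column. Under that assumption, doc.matches and fields.matches from the question above map to distinct fields. FieldNaming is hypothetical and far simpler than a real FieldTypeDefinition, which returns Lucene Field objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; assumes the Lucene field name is "family.column".
class FieldNaming {
    static String fieldName(String family, String columnName) {
        return family + "." + columnName;
    }

    // Same shape as the idea behind getFieldsForColumn: family + column in,
    // (fieldName, value) pairs out. A real type may emit several fields per
    // column; this sketch emits exactly one.
    static List<String[]> fieldsForColumn(String family, String columnName, String value) {
        List<String[]> fields = new ArrayList<>();
        fields.add(new String[] {fieldName(family, columnName), value});
        return fields;
    }
}
```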
>>
>> http://incubator.apache.org/blur/docs/0.2.0/site/blur-query/apidocs/org/apache/blur/analysis/FieldTypeDefinition.html
>>
> Yes, in the above definecolumn command I had to specify the column family,
> but the code you pasted for a single table has none, which confused me a
> bit, so I am trying to get clarification on that...
>
> tableDescriptor.putToTableProperties("blur.fieldtype.customfield1",
>      "org.apache.blur.analysis.type.ExampleType1");
> tableDescriptor.putToTableProperties("blur.fieldtype.customfield2",
>      "org.apache.blur.analysis.type.ExampleType2");
>
> Unless perhaps customfield1 would be family and column name combined, like
> "event.Date" or something?
>

OK, so this code merely makes the type available for use. After it runs,
your new type will act just like "text", "int", "date", or any other
built-in type. Once the type is registered in the system, either per table
or system wide, you will still need to call definecolumn to make use of the
new type.

So, in the example:

tableDescriptor.putToTableProperties("blur.fieldtype.customfield1",
"org.apache.blur.analysis.type.ExampleType1");

"blur.fieldtype." is the important part for the loader.  The prefix tells
the TableContext that this property is a new field type. It takes the
value, "org.apache.blur.analysis.type.ExampleType1" for example, and tries to
load the class via the Class.forName method. If that succeeds and the class
is a FieldTypeDefinition, it registers it in the BaseFieldManager. Then the
type is available for use.

So "customfield1" is not even used. It's only there to make the property
name unique.
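A simplified Java sketch of the loading steps just described, under stated assumptions: NamedFieldType is a hypothetical stand-in for FieldTypeDefinition (only getName() matters here), ExampleType1 is a toy class, and FieldTypeLoader is not the actual TableContext code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for FieldTypeDefinition: only getName() matters here.
interface NamedFieldType {
    String getName();
}

// Toy custom type, analogous to org.apache.blur.analysis.type.ExampleType1.
class ExampleType1 implements NamedFieldType {
    public String getName() {
        return "example1";
    }
}

// Sketch of the steps described above; not the actual TableContext code.
class FieldTypeLoader {
    static final String PREFIX = "blur.fieldtype.";

    static Map<String, NamedFieldType> load(Properties props) {
        Map<String, NamedFieldType> registry = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue; // not a field type property
            }
            // The suffix after the prefix (e.g. "customfield1") is never read;
            // it only keeps the property name unique.
            String className = props.getProperty(key);
            try {
                Class<?> clazz = Class.forName(className);
                if (!NamedFieldType.class.isAssignableFrom(clazz)) {
                    throw new IllegalArgumentException(className + " is not a field type");
                }
                NamedFieldType type = (NamedFieldType) clazz.getDeclaredConstructor().newInstance();
                // Registered under the type's own name, not the property name.
                registry.put(type.getName(), type);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot load " + className, e);
            }
        }
        return registry;
    }
}
```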

Hope this helps.

Aaron


>
>>
>>> I took a look at ExampleType.java as well as the other current types,
>>> and they really helped. I may write up an IP type definition for my own
>>> use and submit it to you for inclusion in Apache Blur if it is desired.
>>> I know I probably won't be the only one to want that column type.
>>>
>>
>> That would be great! Thanks.
>>
>> Aaron
>>
>>
>>
>>>
>>> Thanks,
>>> Colton McInroy
>>>
>>> On 10/21/2013 6:10 AM, Aaron McCurry wrote:
>>>
>>>> The feature is not in 0.2.0; it is in 0.2.1 and 0.3.0.
>>>>
>>>> Here's the issue.
>>>>
>>>> https://issues.apache.org/jira/browse/BLUR-258
>>>>
>>>>
>>>> I haven't pushed a 0.2.1 documentation website yet, but the basics are:
>>>> create your type by extending FieldTypeDefinition or one of the other FTD
>>>> classes.
>>>>
>>>> Then to use the custom type, you can either add your custom type to the
>>>> entire cluster or per table.
>>>>
>>>> For Cluster Wide
>>>>
>>>> For cluster wide configuration you will need to add the new field types
>>>> into the blur-site.properties file on each server.
>>>>
>>>> blur.fieldtype.customfield1=org.apache.blur.analysis.type.ExampleType1
>>>> blur.fieldtype.customfield2=org.apache.blur.analysis.type.ExampleType2
>>>> ...
>>>>
>>>> Please note that only the "blur.fieldtype." prefix of the property name
>>>> is used, because the type gets its name from its own getName method.
>>>> However, the property names need to be unique within the file.
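On the uniqueness requirement: assuming blur-site.properties is read with java.util.Properties (not confirmed in this thread), a repeated property name does not cause an error. The later entry silently replaces the earlier one, so one of the types would never be registered. PropertiesDemo is a hypothetical helper showing this:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Loads properties text the way java.util.Properties would read a file.
class PropertiesDemo {
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }
}
```

Loading two lines that both use blur.fieldtype.customfield1 leaves a single entry holding the second class name; with distinct names (customfield1, customfield2) both entries survive.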
>>>>
>>>> For Single Table
>>>>
>>>> For a single-table configuration, you will need to add the new field
>>>> types into the tableProperties map in the TableDescriptor as you define
>>>> the table.
>>>>
>>>> tableDescriptor.putToTableProperties("blur.fieldtype.customfield1",
>>>>       "org.apache.blur.analysis.type.ExampleType1");
>>>> tableDescriptor.putToTableProperties("blur.fieldtype.customfield2",
>>>>       "org.apache.blur.analysis.type.ExampleType2");
>>>>
>>>> ...
>>>>
>>>> Please note that only the "blur.fieldtype." prefix of the property name
>>>> is used, because the type gets its name from its own getName method.
>>>> However, the property names need to be unique within the map.
>>>>
>>>> Aaron
>>>>
>>>>
>>>>
>>>> On Sun, Oct 20, 2013 at 10:59 PM, Colton McInroy <colton@dosarrest.com>
>>>> wrote:
>>>>>
>>>>> I noticed that the following column types are documented in the source...
>>>>
>>>>>     /**
>>>>>      * The field type for the column.  The built in types are:
>>>>>      * <ul>
>>>>>      * <li>text - Full text indexing.</li>
>>>>>      * <li>string - Indexed string literal</li>
>>>>>      * <li>int - Converted to an integer and indexed numerically.</li>
>>>>>      * <li>long - Converted to a long and indexed numerically.</li>
>>>>>      * <li>float - Converted to a float and indexed numerically.</li>
>>>>>      * <li>double - Converted to a double and indexed numerically.</li>
>>>>>      * <li>stored - Not indexed, only stored.</li>
>>>>>      * </ul>
>>>>>      */
>>>>>
>>>>> When I was looking at
>>>>> blur-query/src/main/java/org/apache/blur/analysis/BaseFieldManager.java,
>>>>> I came across this, though...
>>>>>
>>>>> # grep addColumnDefinition blur-query/src/main/java/org/apache/blur/analysis/BaseFieldManager.java
>>>>>           addColumnDefinition(family, name, null, getDefaultMissingFieldLessIndexing(), getDefaultMissingFieldType(),
>>>>>     public boolean addColumnDefinition(String family, String columnName, String subColumnName, boolean fieldLessIndexed,
>>>>>     public void addColumnDefinitionGisPointVector(String family, String columnName) throws IOException {
>>>>>       addColumnDefinition(family, columnName, null, false, SpatialPointVectorStrategyFieldTypeDefinition.NAME, null);
>>>>>     public void addColumnDefinitionGisRecursivePrefixTree(String family, String columnName) throws IOException {
>>>>>       addColumnDefinition(family, columnName, null, false, SpatialRecursivePrefixTreeStrategyFieldTypeDefinition.NAME,
>>>>>     public void addColumnDefinitionDate(String family, String columnName, String format) throws IOException {
>>>>>       addColumnDefinition(family, columnName, null, false, DateFieldTypeDefinition.NAME, props);
>>>>>     public void addColumnDefinitionInt(String family, String columnName) throws IOException {
>>>>>       addColumnDefinition(family, columnName, null, false, IntFieldTypeDefinition.NAME, null);
>>>>>     public void addColumnDefinitionLong(String family, String columnName) throws IOException {
>>>>>       addColumnDefinition(family, columnName, null, false, LongFieldTypeDefinition.NAME, null);
>>>>>     public void addColumnDefinitionFloat(String family, String columnName) throws IOException {
>>>>>       addColumnDefinition(family, columnName, null, false, FloatFieldTypeDefinition.NAME, null);
>>>>>     public void addColumnDefinitionDouble(String family, String columnName) throws IOException {
>>>>>       addColumnDefinition(family, columnName, null, false, DoubleFieldTypeDefinition.NAME, null);
>>>>>     public void addColumnDefinitionString(String family, String columnName) throws IOException {
>>>>>       addColumnDefinition(family, columnName, null, false, StringFieldTypeDefinition.NAME, null);
>>>>>     public void addColumnDefinitionText(String family, String columnName) throws IOException {
>>>>>       addColumnDefinition(family, columnName, null, false, TextFieldTypeDefinition.NAME, null);
>>>>>     public void addColumnDefinitionTextFieldLess(String family, String columnName) throws IOException {
>>>>>       addColumnDefinition(family, columnName, null, true, TextFieldTypeDefinition.NAME, null);
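As a sketch of what such a programmatic definition carries, here is a hypothetical Java value object mirroring the addColumnDefinition parameters in the grep output above, with typed factory methods in the style of addColumnDefinitionInt and addColumnDefinitionDate. It is not Blur's client API; the type names "int" and "date" and the dateFormat property key are taken from this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical value object mirroring the addColumnDefinition parameters in
// the grep output above; this is not Blur's ColumnDefinition class.
class ColumnDefinitionSpec {
    final String family;
    final String columnName;
    final String subColumnName;      // null for a plain column
    final boolean fieldLessIndexed;
    final String fieldType;          // type name, e.g. "int" or "date"
    final Map<String, String> properties;

    ColumnDefinitionSpec(String family, String columnName, String subColumnName,
                         boolean fieldLessIndexed, String fieldType,
                         Map<String, String> properties) {
        this.family = family;
        this.columnName = columnName;
        this.subColumnName = subColumnName;
        this.fieldLessIndexed = fieldLessIndexed;
        this.fieldType = fieldType;
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }

    // Typed wrapper in the style of addColumnDefinitionInt.
    static ColumnDefinitionSpec intColumn(String family, String columnName) {
        return new ColumnDefinitionSpec(family, columnName, null, false, "int",
                Collections.<String, String>emptyMap());
    }

    // The date type also needs its parse pattern, passed as the "dateFormat"
    // property (the same key as "-p dateFormat yyyyMMdd" in the shell).
    static ColumnDefinitionSpec dateColumn(String family, String columnName, String format) {
        Map<String, String> props = new HashMap<>();
        props.put("dateFormat", format);
        return new ColumnDefinitionSpec(family, columnName, null, false, "date", props);
    }
}
```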
>>>>>
>>>>> I am wondering how to specify these. I would like to set column types
>>>>> programmatically in certain situations, and I would like to be able to
>>>>> use the Date column type, which I have been meaning to ask about...
>>>>>
>>>>> What is the best way to store a timestamp? What format, column type,
>>>>> etc.? I'm guessing the Date column type, but I do not know how to set it
>>>>> right now. I noticed that the client (Iface object) has an
>>>>> addColumnDefinition method, but it takes different parameters than the
>>>>> addColumnDefinition above, and it lacks all of the variants for the
>>>>> different column types.
>>>>>
>>>>> I have one additional field type I would like to see, which is one for
>>>>> IP addresses...
>>>>>
>>>>>      * <li>date - Converted to a date and indexed.</li>
>>>>>      * <li>text - Full text indexing.</li>
>>>>>      * <li>string - Indexed string literal</li>
>>>>>      * <li>int - Converted to an integer and indexed numerically.</li>
>>>>>      * <li>long - Converted to a long and indexed numerically.</li>
>>>>>      * <li>float - Converted to a float and indexed numerically.</li>
>>>>>      * <li>double - Converted to a double and indexed numerically.</li>
>>>>>      * <li>ip - Converted to an InetAddress and indexed numerically.</li>
>>>>>
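A minimal Java sketch of the numeric conversion a proposed ip type might perform: parse an IP literal with java.net.InetAddress and read its bytes as an unsigned BigInteger, so addresses sort numerically. IpToNumber is hypothetical; this is not an existing Blur type.

```java
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper sketching the numeric form of an "ip" column value.
class IpToNumber {
    static BigInteger toNumber(String ipLiteral) {
        try {
            // For an IP literal, getByName only checks the format (no DNS
            // lookup); a hostname would trigger a lookup, so pass literals.
            byte[] bytes = InetAddress.getByName(ipLiteral).getAddress();
            return new BigInteger(1, bytes); // signum 1: read bytes as unsigned
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Invalid IP address: " + ipLiteral, e);
        }
    }
}
```

One design caveat: IPv4 and IPv6 values share this numeric space (0.0.0.1 and ::1 both map to 1), so a real type would need to keep the two address families apart, for example by mapping IPv4 addresses into the IPv4-mapped IPv6 range.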
>>>>> --
>>>>> Thanks,
>>>>> Colton McInroy
>
