incubator-blur-dev mailing list archives

From Colton McInroy <col...@dosarrest.com>
Subject Re: Column Types
Date Mon, 21 Oct 2013 18:25:50 GMT

Thanks,
Colton McInroy

Director of Security Engineering

Phone (Toll Free)
  US: (888)-818-1344 Press 2
  UK: 0-800-635-0551 Press 2

My Extension: 101
24/7 Support: support@dosarrest.com
Email: colton@dosarrest.com
Website: http://www.dosarrest.com

On 10/21/2013 11:15 AM, Aaron McCurry wrote:
> On Mon, Oct 21, 2013 at 1:37 PM, Colton McInroy <colton@dosarrest.com> wrote:
>
>> Hmm... What about column families?
>>
>> I tried typing this in the blur shell "definecolumn Program_syslog-ng
>> event Date date" but it doesn't appear to change anything. When I do schema
>> <table> I still see this...
>>
>> family : event
>>          column   : Date
>>                  fieldType : text
>>
> Once a type is defined it cannot be changed. If there was no error message
> then we should fix that.  If you set a table to strictTypes = true then a
> column cannot be defined automatically via mutate, only by adding a
> column definition.
Ok, so that would have been my problem: I have data in the table
already, so it appears the mutate calls created the schema. If I set
strictTypes to true, will all fields be treated as text unless a column
type is specified, or do I HAVE to specify column types for every field?
>
> http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_TableDescriptor
>
> The date type also needs an extra argument specifying how to parse the string.
>
> definecolumn Program_syslog-ng event Date date -p dateFormat yyyyMMdd
Ah, ok thanks.
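
For my own notes: yyyyMMdd looks like a java.text.SimpleDateFormat pattern. Assuming that is what the dateFormat property ends up being handed to (I have not checked the Blur source for this), here is a quick standalone check of what such a pattern accepts. The class and method names are made up for the test, and the UTC time zone is my choice, not something from the docs.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Standalone sketch, not Blur code: shows what a SimpleDateFormat pattern
// such as "yyyyMMdd" accepts and what numeric value it yields.
final class DateFormatSketch {

  // Parses value with the given pattern and returns epoch milliseconds.
  // UTC is pinned so the result does not depend on the JVM default zone.
  static long toEpochMillis(String pattern, String value) {
    SimpleDateFormat format = new SimpleDateFormat(pattern);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    format.setLenient(false); // reject impossible dates instead of rolling them over
    try {
      return format.parse(value).getTime();
    } catch (ParseException e) {
      throw new IllegalArgumentException(
          "Value '" + value + "' does not match pattern '" + pattern + "'", e);
    }
  }

  public static void main(String[] args) {
    // Midnight UTC on 21 Oct 2013.
    System.out.println(toEpochMillis("yyyyMMdd", "20131021"));
    // A full syslog-style timestamp needs a longer pattern.
    System.out.println(toEpochMillis("yyyy-MM-dd HH:mm:ss", "2013-10-21 18:25:50"));
  }
}
```

So with yyyyMMdd the time of day is lost; for my syslog timestamps I would probably want a pattern that includes hours, minutes and seconds.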
>
> http://incubator.apache.org/blur/docs/0.2.0/data-model.html#date_type
>
>
>> Am I doing something wrong? Does the schema have to be set during
>> initialization of the table, or can it be done at any time?
>>
> It can be done at any time, but once the field has been used it is set for
> the lifetime of the table.
>
>
>> Also, the code you posted for a single table doesn't reference column
>> families at all. Are field types column name specific only, so if you have
>> the same column name in two different families both will be handled by that
>> field type? No problem for me at all if this is the case, but for some
>> people it may be a problem. Say for instance they have doc.matches:true and
>> fields.matches:3, it may cause some problems.
>>
> I might be misunderstanding your question, but when you define a column
> type with the definecolumn command you have to supply a family and a
> column name.  In your example above that would be "event" for the family and
> "Date" for the column name.  Type definitions act as a bridge between
> families/columns and fields in Lucene.  The getFieldsForColumn and
> getFieldsForSubColumn methods on FieldTypeDefinition are passed a Column
> together with the family name, and return an Iterable of Fields.
>
> http://incubator.apache.org/blur/docs/0.2.0/site/blur-query/apidocs/org/apache/blur/analysis/FieldTypeDefinition.html
Yes, in the definecolumn command above I had to specify the column
family, but in the code you pasted for a single table there was none,
which confused me, so I am trying to get clarification on that...

tableDescriptor.putToTableProperties("blur.fieldtype.customfield1",
      "org.apache.blur.analysis.type.ExampleType1");
tableDescriptor.putToTableProperties("blur.fieldtype.customfield2",
      "org.apache.blur.analysis.type.ExampleType2");

Unless perhaps customfield1 would be family and column name combined, 
like "event.Date" or something?
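
Or, going by your note that only the "blur.fieldtype." prefix of the property name is used and the type takes its name from getName, maybe the suffix is just an arbitrary unique label. My mental model of that is below. It is a sketch with made-up stand-ins (NamedType, ExampleType1, ExampleType2 and load are all hypothetical), not Blur's actual loading code, so please correct me if the real behaviour differs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of my understanding only: every name below is hypothetical.
final class FieldTypePropertiesSketch {

  static final String PREFIX = "blur.fieldtype.";

  // Stand-in for a field type; only the name lookup is modelled here.
  interface NamedType {
    String getName();
  }

  static final class ExampleType1 implements NamedType {
    public String getName() { return "example1"; }
  }

  static final class ExampleType2 implements NamedType {
    public String getName() { return "example2"; }
  }

  // Instantiates each class listed under a "blur.fieldtype." key and registers
  // it under the name the type itself reports. The rest of the key is ignored.
  static Map<String, NamedType> load(Map<String, String> properties) {
    Map<String, NamedType> byName = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      if (!entry.getKey().startsWith(PREFIX)) {
        continue; // unrelated property
      }
      try {
        NamedType type = (NamedType) Class.forName(entry.getValue())
            .getDeclaredConstructor()
            .newInstance();
        byName.put(type.getName(), type);
      } catch (ReflectiveOperationException e) {
        throw new IllegalArgumentException("Cannot load type " + entry.getValue(), e);
      }
    }
    return byName;
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put(PREFIX + "customfield1", ExampleType1.class.getName());
    props.put(PREFIX + "customfield2", ExampleType2.class.getName());
    props.put("some.other.setting", "ignored");
    // Prints the names reported by getName, not "customfield1"/"customfield2".
    System.out.println(load(props).keySet());
  }
}
```

If that is right, then the family and column only come into play later, when a column is defined with one of these type names.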

>
>
>> I took a look at the ExampleType.java as well as the other current types
>> and they really helped. I may write up an IP type definition for my own use
>> as well as submit it to you for inclusion in Apache Blur if that is
>> desired. I know I probably won't be the only one to want that column type.
>
> That would be great!  Thanks.
>
> Aaron
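
For the IP type, the part I have in mind is just turning the address into a number so it can be indexed and range-queried like a long. A rough standalone sketch follows. It is IPv4 only, the class and method names are mine, and it is not yet wired into FieldTypeDefinition.

```java
// Standalone sketch of the numeric conversion an "ip" column type could use.
// IPv4 only; names are hypothetical and nothing here touches the Blur API.
final class Ipv4ToLongSketch {

  // Converts a dotted-quad IPv4 string to its unsigned 32-bit value in a long.
  static long toLong(String address) {
    String[] parts = address.split("\\.", -1);
    if (parts.length != 4) {
      throw new IllegalArgumentException("Not an IPv4 address: " + address);
    }
    long result = 0;
    for (String part : parts) {
      if (!part.matches("\\d{1,3}")) {
        throw new IllegalArgumentException("Bad octet '" + part + "' in " + address);
      }
      int octet = Integer.parseInt(part);
      if (octet > 255) {
        throw new IllegalArgumentException("Octet out of range in " + address);
      }
      result = (result << 8) | octet; // shift previous octets up, append this one
    }
    return result;
  }

  // Inverse of toLong, useful when returning stored values to the caller.
  static String toDottedQuad(long value) {
    if (value < 0 || value > 0xFFFFFFFFL) {
      throw new IllegalArgumentException("Out of IPv4 range: " + value);
    }
    return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
        + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
  }

  public static void main(String[] args) {
    long n = toLong("192.168.1.1");
    System.out.println(n + " <-> " + toDottedQuad(n));
  }
}
```

Because numeric order matches address order, a subnet lookup becomes a simple range query between the first and last address of the block.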
>
>
>>
>>
>> On 10/21/2013 6:10 AM, Aaron McCurry wrote:
>>
>>> The feature is not in 0.2.0; it is in 0.2.1 and 0.3.0.
>>>
>>> Here's the issue.
>>>
>>> https://issues.apache.org/jira/browse/BLUR-258
>>>
>>> I haven't pushed a 0.2.1 website for documentation yet.  But the basics
>>> are: create your type by extending FieldTypeDefinition or one of the
>>> other FieldTypeDefinition (FTD) classes.
>>>
>>> Then to use the custom type, you can either add your custom type to the
>>> entire cluster or per table.
>>>
>>> For Cluster Wide
>>>
>>> For cluster wide configuration you will need to add the new field types
>>> into the blur-site.properties file on each server.
>>>
>>> blur.fieldtype.customfield1=org.apache.blur.analysis.type.ExampleType1
>>> blur.fieldtype.customfield2=org.apache.blur.analysis.type.ExampleType2
>>> ...
>>>
>>> Please note that the prefix of "blur.fieldtype." is all that is used from
>>> the property name because the type gets its name from the internal
>>> "getName" method. However the property names will need to be unique within
>>> the file.
>>>
>>> For Single Table
>>>
>>> For a single table configuration you will need to add the new field types
>>> into the tableProperties map in the TableDescriptor as you define the
>>> table.
>>>
>>> tableDescriptor.putToTableProperties("blur.fieldtype.customfield1",
>>>       "org.apache.blur.analysis.type.ExampleType1");
>>> tableDescriptor.putToTableProperties("blur.fieldtype.customfield2",
>>>       "org.apache.blur.analysis.type.ExampleType2");
>>> ...
>>>
>>> Please note that the prefix of "blur.fieldtype." is all that is used from
>>> the property name because the type gets its name from the internal
>>> "getName" method. However the property names will need to be unique within
>>> the map.
>>>
>>> Aaron
>>>
>>>
>>>
>>> On Sun, Oct 20, 2013 at 10:59 PM, Colton McInroy <colton@dosarrest.com> wrote:
>>>
>>>> I noticed in the source the following column types are documented...
>>>>     /**
>>>>      * The field type for the column.  The built in types are:
>>>>      * <ul>
>>>>      * <li>text - Full text indexing.</li>
>>>>      * <li>string - Indexed string literal</li>
>>>>      * <li>int - Converted to an integer and indexed numerically.</li>
>>>>      * <li>long - Converted to an long and indexed numerically.</li>
>>>>      * <li>float - Converted to an float and indexed numerically.</li>
>>>>      * <li>double - Converted to an double and indexed numerically.</li>
>>>>      * <li>stored - Not indexed, only stored.</li>
>>>>      * </ul>
>>>>      */
>>>>
>>>> When I was looking at
>>>> blur-query/src/main/java/org/apache/blur/analysis/BaseFieldManager.java
>>>> I came across this though...
>>>>
>>>> # grep addColumnDefinition blur-query/src/main/java/org/apache/blur/analysis/BaseFieldManager.java
>>>>           addColumnDefinition(family, name, null, getDefaultMissingFieldLessIndexing(), getDefaultMissingFieldType(),
>>>>     public boolean addColumnDefinition(String family, String columnName, String subColumnName, boolean fieldLessIndexed,
>>>>     public void addColumnDefinitionGisPointVector(String family, String columnName) throws IOException {
>>>>       addColumnDefinition(family, columnName, null, false, SpatialPointVectorStrategyFieldTypeDefinition.NAME, null);
>>>>     public void addColumnDefinitionGisRecursivePrefixTree(String family, String columnName) throws IOException {
>>>>       addColumnDefinition(family, columnName, null, false, SpatialRecursivePrefixTreeStrategyFieldTypeDefinition.NAME,
>>>>     public void addColumnDefinitionDate(String family, String columnName, String format) throws IOException {
>>>>       addColumnDefinition(family, columnName, null, false, DateFieldTypeDefinition.NAME, props);
>>>>     public void addColumnDefinitionInt(String family, String columnName) throws IOException {
>>>>       addColumnDefinition(family, columnName, null, false, IntFieldTypeDefinition.NAME, null);
>>>>     public void addColumnDefinitionLong(String family, String columnName) throws IOException {
>>>>       addColumnDefinition(family, columnName, null, false, LongFieldTypeDefinition.NAME, null);
>>>>     public void addColumnDefinitionFloat(String family, String columnName) throws IOException {
>>>>       addColumnDefinition(family, columnName, null, false, FloatFieldTypeDefinition.NAME, null);
>>>>     public void addColumnDefinitionDouble(String family, String columnName) throws IOException {
>>>>       addColumnDefinition(family, columnName, null, false, DoubleFieldTypeDefinition.NAME, null);
>>>>     public void addColumnDefinitionString(String family, String columnName) throws IOException {
>>>>       addColumnDefinition(family, columnName, null, false, StringFieldTypeDefinition.NAME, null);
>>>>     public void addColumnDefinitionText(String family, String columnName) throws IOException {
>>>>       addColumnDefinition(family, columnName, null, false, TextFieldTypeDefinition.NAME, null);
>>>>     public void addColumnDefinitionTextFieldLess(String family, String columnName) throws IOException {
>>>>       addColumnDefinition(family, columnName, null, true, TextFieldTypeDefinition.NAME, null);
>>>>
>>>> I am wondering how to specify these. I would like to programmatically set
>>>> column types in certain situations, and I would like to be able to use
>>>> the
>>>> Date column type. Which I have been meaning to ask about....
>>>>
>>>> What is the best way to store a timestamp? What format, column type,
>>>> etc... I'm guessing the Date column type, but I do not know how to set it
>>>> right now. I noticed that the client (Iface object) has an
>>>> addColumnDefinition, but it has different parameters than the above
>>>> addColumnDefinition, and it's missing all of the ones for the different
>>>> column types.
>>>>
>>>> I have one additional field type I would like to see, which is one for IP
>>>> addresses...
>>>>
>>>>      * <li>date - Converted to a date and indexed.</li>
>>>>      * <li>text - Full text indexing.</li>
>>>>      * <li>string - Indexed string literal</li>
>>>>      * <li>int - Converted to an integer and indexed numerically.</li>
>>>>      * <li>long - Converted to an long and indexed numerically.</li>
>>>>      * <li>float - Converted to an float and indexed numerically.</li>
>>>>      * <li>double - Converted to an double and indexed numerically.</li>
>>>>      * <li>ip - Converted to an InetAddress and indexed numerically.</li>
>>>>

