incubator-blur-dev mailing list archives

From Colton McInroy <col...@dosarrest.com>
Subject Re: [jira] [Closed] (BLUR-245) There is a deadlock condition that can occur during mutate batch calls.
Date Sun, 29 Sep 2013 15:33:55 GMT
What about the family attribute?

So, does that mean rowid and recordid have to be manually generated?

Ideally, I would just like to insert records into a table... I was 
thinking that I would create a table for each program that's getting 
its logs indexed. I just had a thought about this, though. Perhaps I 
could create a table for a time period, like a month, then use the 
program name as the rowid. That still leaves me with a recordid, which I 
would prefer to have generated automatically, and I am not sure if it is. 
If it isn't generated automatically, do you suggest I use something like 
UUID.randomUUID()?
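
A minimal sketch of that idea, assuming one table per month, the program name as the rowid, and a random UUID as the recordid. The class and method names (LogIds, tableFor, rowIdFor, newRecordId) and the "logs_YYYY_MM" naming scheme are made up for illustration and are not part of the Blur API; the resulting strings would still have to be handed to the mutate call.

```java
import java.time.YearMonth;
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper, not part of Blur: derives the three identifiers
// discussed above (table name, rowid, recordid) for one log entry.
class LogIds {
    // Assumed naming scheme: one table per calendar month, e.g. "logs_2013_09".
    static String tableFor(YearMonth month) {
        return String.format(Locale.ROOT, "logs_%04d_%02d",
                month.getYear(), month.getMonthValue());
    }

    // The program name doubles as the rowid, so every record logged by
    // one program in a given month lands in the same row.
    static String rowIdFor(String programName) {
        return programName;
    }

    // Random (version 4) UUID rendered as a 36-character string.
    static String newRecordId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println("table    = " + tableFor(YearMonth.of(2013, 9)));
        System.out.println("rowid    = " + rowIdFor("sshd"));
        System.out.println("recordid = " + newRecordId());
    }
}
```

With this layout a row grows with every log line a program emits during the month, so whether the program name alone is a sensible rowid depends on log volume.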

Thanks,
Colton McInroy

  * Director of Security Engineering

	
Phone
(Toll Free) 	
_US_ 	(888)-818-1344 Press 2
_UK_ 	0-800-635-0551 Press 2

My Extension 	101
24/7 Support 	support@dosarrest.com <mailto:support@dosarrest.com>
Email 	colton@dosarrest.com <mailto:colton@dosarrest.com>
Website 	http://www.dosarrest.com

On 9/29/2013 7:29 AM, Aaron McCurry wrote:
> On Sun, Sep 29, 2013 at 9:47 AM, Colton McInroy <colton@dosarrest.com> wrote:
>
>> Glad to see you resolved this Aaron.
>>
>> I am just in the process of building my parsing engine right now, so I
>> will make sure I update my build before I start doing the mutate calls.
>>
>> I have been reading the usage examples on mutate calls. I find it somewhat
>> odd there is only mutate and no insert as well. I guess they are probably
>> both treated the same. I am getting close to building the add record
>> component to my parsing engine, but reading the code has left me somewhat
>> puzzled. With lucene I treated each "Document" with various "Field" types,
>> with Fields also being referenced as "Categories" for the facet indexing.
>> Now with Blur it is much different. This mutate call seems to require three
>> components which I am unsure of...
>> How is the rowid different from a recordid?... and can I insert just rows
>> with automatically generated ids? The data coming in won't have any unique
>> ids associated with it, and with lucene in my previous experience you
>> never needed to specify a recordid or rowid; it would automatically create
>> a document id upon adding a new "Document" to the index.
>> I am totally clueless as to what the family attribute is for.
>> I notice there are no column types. In my experience with Lucene you had
>> to specify the "Field" types to integer, string, etc but I see no ability
>> to do that in Blur. Is that handled automatically or something?
>>
> Ok, well you bring up some good points.  We have had some discussions about
> renaming the objects in Blur to be closer to Lucene.
>
> Records == Documents
> Rows == Document Group
> Column == Field
>
> The rowid is present for 2 purposes.
>    1. The rowid uniquely identifies the group of records
>    2. The rowid is used to distribute the rows evenly across all the shards
> within the table.  It hashes the rowid, using the BlurPartitioner, to
> store/index the row.
>
> The recordid is used to locate the record within the row so that single
> records can be fetched without the entire row.
>
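
A rough sketch of how hashing a rowid can spread rows across shards, as described in the quoted reply. This is only an illustration: it assumes a plain String.hashCode() taken modulo the shard count, which is not necessarily what BlurPartitioner does internally, and the class and method names (RowIdSharding, shardFor) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of hash-based row placement; not Blur code.
class RowIdSharding {
    // Maps a rowid to a shard index in [0, numShards).
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int shardFor(String rowId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        return Math.floorMod(rowId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 4;
        int[] rowsPerShard = new int[numShards];
        // Count how many of 1000 made-up rowids land on each shard.
        for (int i = 0; i < 1000; i++) {
            rowsPerShard[shardFor("program-" + i, numShards)]++;
        }
        System.out.println(Arrays.toString(rowsPerShard));
    }
}
```

Because the shard is derived only from the rowid, the same rowid always maps to the same shard, which is why every record of a row can be stored and fetched together.
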
> If we go forward with the rename in 0.3.0 it will likely be something like:
>
> Column => Field
> Record => Document
> Row => DocumentGroup
>
> RecordId => DocId
> RowId => DocGroupId
>
> Another change will be that Documents and DocumentGroups will be allowed as
> indexable units (instead of just Rows, as now).  However, the DocId and
> DocGroupId will likely still be required.  You could make them UUIDs or
> something like that.
>
> As far as the types, you will need to use the addColumnDefinition call:
>
> http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Fn_Blur_addColumnDefinition
>
> And you can reference the types:
>
> http://incubator.apache.org/blur/docs/0.2.0/data-model.html#types
>
> Hope this helps, I know it's a bit clumsy but we have plans to improve.
>
> Thanks,
> Aaron
>
>
>> Thanks,
>> Colton McInroy
>>
>>   * Director of Security Engineering
>>
>>
>> Phone
>> (Toll Free)
>> _US_    (888)-818-1344 Press 2
>> _UK_    0-800-635-0551 Press 2
>>
>> My Extension    101
>> 24/7 Support    support@dosarrest.com <mailto:support@dosarrest.com>
>> Email   colton@dosarrest.com <mailto:colton@dosarrest.com>
>> Website         http://www.dosarrest.com
>>
>>
>> On 9/29/2013 6:20 AM, Aaron McCurry (JIRA) wrote:
>>
>>>        [ https://issues.apache.org/jira/browse/BLUR-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>
>>> Aaron McCurry closed BLUR-245.
>>> ------------------------------
>>>
>>>       Resolution: Fixed
>>>
>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=commit;h=6b000703457e64d5c9334426ed012c027a359eb3
>>>
>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=commit;h=ffc817c4401ce53b6ba1b0fed700260d34c8acac
>>>
>>>   There is a deadlock condition that can occur during mutate batch calls.
>>>> -----------------------------------------------------------------------
>>>>
>>>>                   Key: BLUR-245
>>>>                   URL: https://issues.apache.org/jira/browse/BLUR-245
>>>>               Project: Apache Blur
>>>>            Issue Type: Bug
>>>>            Components: Blur
>>>>      Affects Versions: 0.3.0, 0.2.1
>>>>              Reporter: Aaron McCurry
>>>>              Priority: Blocker
>>>>               Fix For: 0.3.0, 0.2.1
>>>>
>>>>
>>>> Basically there is a thread pool that the mutates use for performing the
>>>> mutate.  However the batch mutate call in the index manager submits a job
>>>> then in that submitted job it creates more jobs (one for each shard).  This
>>>> can cause a deadlock condition in the thread pool, because the thread pool
>>>> is a fixed size.
>>>>
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.1#6144)
>>>
>>
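
The deadlock described in the quoted JIRA issue can be reproduced in miniature. The sketch below is not Blur's code and does not claim to show what the linked commits changed: it assumes a fixed pool of one thread, uses a timeout so the demo terminates instead of hanging, and the names (NestedSubmitDemo, runBatch) are hypothetical. It shows a job that submits per-shard jobs to the same fixed pool it runs on, and then the same job using a separate pool for the per-shard work, which is one common way to avoid the problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo of nested submission into a fixed-size thread pool.
class NestedSubmitDemo {
    // Runs one "batch" job on `outer`; that job submits one job per shard
    // to `inner` and waits for each. Returns false if a per-shard job did
    // not finish within timeoutMs (a stand-in for the real deadlock).
    static boolean runBatch(ExecutorService outer, ExecutorService inner,
                            int shards, long timeoutMs) throws Exception {
        Future<Boolean> batch = outer.submit(() -> {
            List<Future<?>> perShard = new ArrayList<>();
            for (int i = 0; i < shards; i++) {
                perShard.add(inner.submit(() -> { /* per-shard work */ }));
            }
            for (Future<?> f : perShard) {
                try {
                    f.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    return false; // no free thread ever picked the job up
                }
            }
            return true;
        });
        return batch.get();
    }

    public static void main(String[] args) throws Exception {
        // Same pool for both levels: the batch job holds the only thread,
        // so the per-shard jobs sit in the queue and never start.
        ExecutorService shared = Executors.newFixedThreadPool(1);
        System.out.println("shared pool finished: " + runBatch(shared, shared, 3, 500));
        shared.shutdownNow();

        // Separate pools: the per-shard jobs have their own thread.
        ExecutorService outer = Executors.newFixedThreadPool(1);
        ExecutorService inner = Executors.newFixedThreadPool(1);
        System.out.println("separate pools finished: " + runBatch(outer, inner, 3, 5000));
        outer.shutdown();
        inner.shutdown();
    }
}
```

With a larger fixed pool the same starvation appears once every thread is occupied by a batch job waiting on per-shard jobs that are still queued behind it.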

