incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: [jira] [Closed] (BLUR-245) There is a deadlock condition that can occur during mutate batch calls.
Date Mon, 30 Sep 2013 21:46:08 GMT
On Sun, Sep 29, 2013 at 3:42 PM, Colton McInroy <colton@dosarrest.com> wrote:

> Comments inline...
>
>
> Thanks,
> Colton McInroy
>
>  * Director of Security Engineering
>
>
> Phone
> (Toll Free)
> _US_    (888)-818-1344 Press 2
> _UK_    0-800-635-0551 Press 2
>
> My Extension    101
> 24/7 Support    support@dosarrest.com <mailto:support@dosarrest.com>
> Email   colton@dosarrest.com <mailto:colton@dosarrest.com>
> Website         http://www.dosarrest.com
>
> On 9/29/2013 12:24 PM, Aaron McCurry wrote:
>
>> On Sun, Sep 29, 2013 at 11:33 AM, Colton McInroy <colton@dosarrest.com> wrote:
>>
>>> What about the family attribute?
>>
>> Sorry about that, I forgot to add the family info to my last response. The
>> family attribute is basically a prefix to all the column names in the
>> record. It was used in an early version of Blur to calculate Row Queries.
>> But now there is an explicit way to call a Row query:
>>
>> http://incubator.apache.org/blur/docs/0.2.0/data-model.html#row_query
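To make the Row query distinction concrete, here is a simplified in-memory model of our reading of it; it is not Blur's implementation. A record is assumed to be a map of fully qualified column names to values, a row is a list of records sharing one rowid, and `RowQueryDemo` and its methods are hypothetical. A record-level match needs every term inside one record, while a row-level match lets each term be satisfied by a different record in the same row.

```java
import java.util.List;
import java.util.Map;

// Simplified model (not Blur's implementation): a record maps
// "family.column" names to values, and a row is the list of records
// that share one rowid. Class and method names are hypothetical.
final class RowQueryDemo {

    // Record-level semantics: all terms must match inside ONE record.
    static boolean anyRecordMatchesAll(List<Map<String, String>> row,
                                       Map<String, String> terms) {
        for (Map<String, String> record : row) {
            if (recordMatches(record, terms)) {
                return true;
            }
        }
        return false;
    }

    // Row-level semantics: each term may be satisfied by a DIFFERENT
    // record, as long as every term is satisfied somewhere in the row.
    static boolean rowMatchesAll(List<Map<String, String>> row,
                                 Map<String, String> terms) {
        for (Map.Entry<String, String> term : terms.entrySet()) {
            boolean satisfied = false;
            for (Map<String, String> record : row) {
                if (term.getValue().equals(record.get(term.getKey()))) {
                    satisfied = true;
                    break;
                }
            }
            if (!satisfied) {
                return false;
            }
        }
        return true;
    }

    // True if this single record contains every term.
    private static boolean recordMatches(Map<String, String> record,
                                         Map<String, String> terms) {
        for (Map.Entry<String, String> term : terms.entrySet()) {
            if (!term.getValue().equals(record.get(term.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

This also shows why grouping unrelated log records into one row and then running a Row query can produce matches that no single record supports.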
>>
>> Assuming that the renaming of core Blur objects (Row and Record) occurs,
>> the family attribute will likely be removed. But for now it needs to be
>> populated.
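A minimal sketch of the family acting as a prefix on every column name of a record. The `.` separator (giving names like `log.host`) is an assumption about how Blur names fields, and `FamilyPrefix` is a hypothetical helper, not part of the Blur API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not Blur API): the family acts as a prefix on
// every column name of a record.
final class FamilyPrefix {

    // Assumption: "." joins family and column, e.g. "log" + "host" -> "log.host".
    static String qualify(String family, String column) {
        return family + "." + column;
    }

    // Prefix every column of one record with its family, keeping column order.
    static Map<String, String> flatten(String family, Map<String, String> columns) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> column : columns.entrySet()) {
            fields.put(qualify(family, column.getKey()), column.getValue());
        }
        return fields;
    }
}
```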
>>
> Hmm... will the change be backwards compatible? I am looking to implement
> this system as soon as I can, and I would prefer not to have to rebuild all
> of my code this close to implementation. Any idea when these changes
> will occur? I don't mind using the latest git code.


If we change the API in 0.3.0, it will likely not be backwards compatible.
But at this point it will mostly be a renaming of objects.


>
>
>>> Ideally, I would just like to insert records into a table... I was
>>> thinking that I would create a table for each program that's getting its
>>> logs indexed. I just had a thought about this, though. Perhaps I could
>>> create a table for a time period, like a month, then use the program name
>>> as the rowid. That still leaves me with a recordid, which I would prefer to
>>> have automatically generated, and I am not sure if it is. If it isn't, do
>>> you suggest I use something like UUID.randomUUID()?
>>>
>>
>> UUID is fine, but the recordid only has to be unique within the Row.
>> So it could be anything.
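A minimal sketch of what "unique within the Row" allows: a plain per-row counter is already enough, and UUIDs work but are not required. `RecordIds`, `sequentialIds` and `uniqueWithinRow` are hypothetical helpers, not Blur API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helpers (not Blur API) for recordids scoped to one row.
final class RecordIds {

    // Within one row a simple counter ("0", "1", ...) is already unique;
    // other rows may reuse the same values without conflict.
    static List<String> sequentialIds(int count) {
        List<String> ids = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ids.add(Integer.toString(i));
        }
        return ids;
    }

    // True if no recordid repeats inside this single row.
    static boolean uniqueWithinRow(List<String> recordIds) {
        Set<String> seen = new HashSet<>();
        for (String id : recordIds) {
            if (!seen.add(id)) {
                return false;
            }
        }
        return true;
    }
}
```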
>>
> Ok, so if a row is given a UUID, and that row contains 10000 records each
> with UUIDs as well, that should act as a bulk insert? Like this...
>
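> import java.util.ArrayList;
> import java.util.List;
> import java.util.UUID;
>
> import org.apache.blur.thrift.BlurClient;
> import org.apache.blur.thrift.generated.Blur.Iface;
> import org.apache.blur.thrift.generated.Record;
> import org.apache.blur.thrift.generated.RecordMutation;
> import org.apache.blur.thrift.generated.RecordMutationType;
> import org.apache.blur.thrift.generated.RowMutation;
> import org.apache.blur.thrift.generated.RowMutationType;
>
> Iface client = BlurClient.getClient("controller1:40010,controller2:40010");
>
> // bufferedRecords: the Records built while parsing log entries
> List<RecordMutation> recordMutations = new ArrayList<RecordMutation>();
> for (Record rec : bufferedRecords) {
>     recordMutations.add(new RecordMutation(
>         RecordMutationType.REPLACE_ENTIRE_RECORD, rec));
> }
>
> // rowId is a String, so convert the UUID. The list is already passed to the
> // constructor, so a separate setRecordMutations() call is not needed.
> RowMutation mutation = new RowMutation("PROGRAMNAME", UUID.randomUUID().toString(),
>     true, RowMutationType.REPLACE_ROW, recordMutations, false);
> client.mutate(mutation);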
>
> Or do you recommend something else?
>
> I am right at the point now where I can start inserting the records. I
> have the records being generated during the parsing of entries, then the
> records are just buffered for now. The next step is to take those buffered
> log entries and put them into Blur.


You could do that; I actually hadn't thought of that use for a Row. There is
an advantage to using mutate over mutateBatch: mutate is an atomic
operation and mutateBatch is not. Just make sure that you don't use
rowQuery when doing searches.

http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_Query
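A sketch of the batching idea discussed here: split the buffered records into groups, give each group a random rowid, and send each group as one `RowMutation` through `mutate` (atomic per call, per the reply above). `RowBatcher`, `toRows` and the `maxPerRow` cap are hypothetical; the record type is left generic so the sketch runs without Blur on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper (not Blur API): groups buffered records into rows.
final class RowBatcher {

    // Splits the buffer into consecutive groups of at most maxPerRow records
    // and keys each group by a random rowid. Each entry would then become one
    // RowMutation (REPLACE_ROW) sent with a single mutate() call.
    static <T> Map<String, List<T>> toRows(List<T> buffered, int maxPerRow) {
        if (maxPerRow <= 0) {
            throw new IllegalArgumentException("maxPerRow must be positive");
        }
        Map<String, List<T>> rows = new LinkedHashMap<>(); // keeps batch order
        for (int start = 0; start < buffered.size(); start += maxPerRow) {
            int end = Math.min(start + maxPerRow, buffered.size());
            rows.put(UUID.randomUUID().toString(),
                     new ArrayList<>(buffered.subList(start, end)));
        }
        return rows;
    }
}
```

The cap is a tuning knob of this sketch only; it keeps any single atomic row write bounded in size.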


>
>
>> Btw, I will likely be proposing the API changes soon. I would love for other
>> people to weigh in on them. I really don't want to change them, but I think
>> it's the right thing to do. Plus I don't want to have to rename them again
>> after we go through all the trouble. So getting feedback is critical.
>>
> How soon is soon? I am currently in the process of trying to build
> software and a network for this, so if there is anything I can do to help
> give input, please let me know. I can also test anything related to this
> project that I am working on.
>

I know you saw the thread on the API change.

Aaron


>
>> Thanks,
>>
>> Aaron
>>
>>
>>
>>> Thanks,
>>> Colton McInroy
>>>
>>> On 9/29/2013 7:29 AM, Aaron McCurry wrote:
>>>
>>>> On Sun, Sep 29, 2013 at 9:47 AM, Colton McInroy <colton@dosarrest.com> wrote:
>>>>
>>>>> Glad to see you resolved this Aaron.
>>>>>
>>>>> I am just in the process of building my parsing engine right now, so I
>>>>> will make sure I update my build before I start doing the mutate calls.
>>>>>
>>>>> I have been reading the usage examples on mutate calls. I find it
>>>>> somewhat odd that there is only mutate and no insert as well. I guess
>>>>> they are probably both treated the same. I am getting close to building
>>>>> the add-record component of my parsing engine, but reading the code has
>>>>> left me somewhat puzzled. With Lucene I treated each "Document" as a set
>>>>> of "Field"s of various types, with Fields also being referenced as
>>>>> "Categories" for the facet indexing. Blur is much different. The mutate
>>>>> call seems to require three components which I am unsure of...
>>>>> How is the rowid different from a recordid? And can I insert just rows
>>>>> with automatically generated ids? The data coming in won't have any
>>>>> unique IDs associated with it, and in my previous experience with Lucene
>>>>> you never needed to specify a recordid or rowid; it would automatically
>>>>> create a document id upon adding a new "Document" to the index.
>>>>> I am totally clueless as to what the family attribute is for.
>>>>> I notice there are no column types. In my experience with Lucene you had
>>>>> to set the "Field" types to integer, string, etc., but I see no ability
>>>>> to do that in Blur. Is that handled automatically or something?
>>>>>
>>>> Ok, well you bring up some good points. We have had some discussions
>>>> about renaming the objects in Blur to be closer to Lucene.
>>>>
>>>> Records == Documents
>>>> Rows == Document Group
>>>> Column == Field
>>>>
>>>> The rowid is present for 2 purposes:
>>>>     1. The rowid uniquely identifies the group of records.
>>>>     2. The rowid is used to distribute the rows evenly across all the
>>>>        shards within the table. It hashes the rowid and uses the
>>>>        BlurPartitioner to choose the shard that stores/indexes the row.
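A sketch of hash partitioning by rowid. It follows the idea described above (hash the rowid, pick a shard) but uses `String.hashCode()` masked to a non-negative value modulo the shard count; the exact hash `BlurPartitioner` applies is not shown in this thread, so treat the hypothetical `ShardPicker` as an illustration only.

```java
// Hypothetical stand-in for rowid-based shard selection (not BlurPartitioner).
final class ShardPicker {

    // Hash the rowid, clear the sign bit so the value is non-negative, then
    // take the remainder modulo the shard count. String.hashCode() is an
    // assumption of this sketch; Blur's real hash input may differ.
    static int shardFor(String rowId, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        return (rowId.hashCode() & Integer.MAX_VALUE) % shardCount;
    }
}
```

Because the mapping is deterministic, every record written under the same rowid lands on the same shard, which is what lets a row be indexed and fetched as a unit.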
>>>>
>>>> The recordid is used to locate the record within the row so that single
>>>> records can be fetched without the entire row.
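A hypothetical in-memory model of the two-level addressing described above: rows are keyed by rowid, and records within a row by recordid, so a single record can be looked up without returning the whole row. `RowStore` and its methods are illustrative, not Blur classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model: rows keyed by rowid, and within each row,
// records (column name -> value) keyed by recordid. Not a Blur class.
final class RowStore {
    private final Map<String, Map<String, Map<String, String>>> rows = new HashMap<>();

    // Add or replace one record inside a row.
    void put(String rowId, String recordId, Map<String, String> columns) {
        rows.computeIfAbsent(rowId, k -> new HashMap<>()).put(recordId, columns);
    }

    // Single-record fetch addressed by (rowid, recordid): only that record
    // is returned rather than every record in the row.
    Optional<Map<String, String>> fetchRecord(String rowId, String recordId) {
        Map<String, Map<String, String>> row = rows.get(rowId);
        return row == null ? Optional.empty() : Optional.ofNullable(row.get(recordId));
    }

    // Whole-row fetch: every record that shares the rowid.
    Map<String, Map<String, String>> fetchRow(String rowId) {
        return rows.getOrDefault(rowId, new HashMap<>());
    }
}
```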
>>>>
>>>> If we go forward with the rename in 0.3.0, it will likely be something
>>>> like:
>>>>
>>>> Column => Field
>>>> Record => Document
>>>> Row => DocumentGroup
>>>>
>>>> RecordId => DocId
>>>> RowId => DocGroupId
>>>>
>>>> Another change will be that Documents and DocumentGroups will be allowed
>>>> as indexable units (instead of just Rows now). However, the DocId and
>>>> DocGroupId will likely still be required. You could make them UUIDs or
>>>> something like that.
>>>>
>>>> As far as the types, you will need to use the addColumnDefinition call:
>>>>
>>>> http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Fn_Blur_addColumnDefinition
>>>>
>>>>
>>>> And you can reference the types:
>>>>
>>>> http://incubator.apache.org/blur/docs/0.2.0/data-model.html#types
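A hypothetical sketch of why column definitions matter: Blur `Column` values are strings, so a declared type has to decide how a value like `"404"` is interpreted. `ColumnTypes.define` only stands in for `addColumnDefinition`; the type names handled here are a small illustrative subset, and treating undeclared columns as text is an assumption of this sketch (the types page linked above has the real list).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Column values arrive as strings, and a per-column
// type declaration decides how each value is interpreted.
final class ColumnTypes {
    private final Map<String, String> declared = new HashMap<>();

    // Stand-in for addColumnDefinition; the type names below are an
    // illustrative subset, not Blur's full list.
    void define(String family, String column, String type) {
        declared.put(family + "." + column, type);
    }

    // Interpret a raw string value using the declared type. Columns with no
    // declaration are treated as text here (an assumption of this sketch).
    Object parse(String family, String column, String raw) {
        String type = declared.getOrDefault(family + "." + column, "text");
        switch (type) {
            case "int":
                return Integer.parseInt(raw);
            case "long":
                return Long.parseLong(raw);
            case "double":
                return Double.parseDouble(raw);
            default:
                return raw;
        }
    }
}
```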
>>>>
>>>>
>>>> Hope this helps, I know it's a bit clumsy but we have plans to improve.
>>>>
>>>> Thanks,
>>>> Aaron
>>>>
>>>>
>>>>> Thanks,
>>>>> Colton McInroy
>>>>>
>>>>>
>>>>> On 9/29/2013 6:20 AM, Aaron McCurry (JIRA) wrote:
>>>>>
>>>>>>     [ https://issues.apache.org/jira/browse/BLUR-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>>>>
>>>>>> Aaron McCurry closed BLUR-245.
>>>>>> ------------------------------
>>>>>>
>>>>>>        Resolution: Fixed
>>>>>>
>>>>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=commit;h=6b000703457e64d5c9334426ed012c027a359eb3
>>>>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=commit;h=ffc817c4401ce53b6ba1b0fed700260d34c8acac
>>>>>>
>>>>>>> There is a deadlock condition that can occur during mutate batch calls.
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>>                 Key: BLUR-245
>>>>>>>                 URL: https://issues.apache.org/jira/browse/BLUR-245
>>>>>>>
>>>>>>>                Project: Apache Blur
>>>>>>>             Issue Type: Bug
>>>>>>>             Components: Blur
>>>>>>>       Affects Versions: 0.3.0, 0.2.1
>>>>>>>               Reporter: Aaron McCurry
>>>>>>>               Priority: Blocker
>>>>>>>                Fix For: 0.3.0, 0.2.1
>>>>>>>
>>>>>>>
>>>>>>> Basically there is a thread pool that the mutates use for performing the
>>>>>>> mutate. However, the batch mutate call in the index manager submits a
>>>>>>> job, and that submitted job then creates more jobs (one for each shard).
>>>>>>> This can cause a deadlock in the thread pool, because the thread pool
>>>>>>> is a fixed size.
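A self-contained Java sketch of the deadlock pattern described in BLUR-245 and of one common way around it. It is not the Blur code and not necessarily what the linked commits do. The hypothetical `nestedSubmitStalls` fills a fixed-size pool with outer jobs that each submit per-shard jobs to the same pool and wait on them; with every thread blocked, the per-shard jobs stay queued forever (a timeout detects this instead of hanging). The hypothetical `fanOutFromCaller` submits the per-shard jobs from the calling thread, so no pool thread waits on work queued behind it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class NestedSubmitDemo {

    // Problem pattern: each outer job submits one sub-job per shard to the
    // SAME fixed-size pool and then blocks on them. Once every pool thread
    // holds an outer job, the sub-jobs sit in the queue with no thread free.
    static boolean nestedSubmitStalls(int poolSize, int shards) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch allRunning = new CountDownLatch(poolSize);
        try {
            List<Future<?>> outers = new ArrayList<>();
            for (int i = 0; i < poolSize; i++) {
                outers.add(pool.submit(() -> {
                    // Wait until every pool thread is busy with an outer job,
                    // so the stall is deterministic rather than timing-dependent.
                    allRunning.countDown();
                    allRunning.await();
                    List<Future<?>> perShard = new ArrayList<>();
                    for (int s = 0; s < shards; s++) {
                        perShard.add(pool.submit(() -> { })); // per-shard work
                    }
                    for (Future<?> f : perShard) {
                        f.get(); // blocks a pool thread on queued work
                    }
                    return null;
                }));
            }
            for (Future<?> outer : outers) {
                outer.get(1, TimeUnit.SECONDS);
            }
            return false; // everything finished: no stall
        } catch (TimeoutException e) {
            return true; // stalled: no thread left to run the per-shard jobs
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupt the blocked outer jobs
        }
    }

    // One common fix (illustrative only): the calling thread fans out the
    // per-shard jobs itself, so pool threads only ever run leaf work.
    static int fanOutFromCaller(int poolSize, int shards) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> perShard = new ArrayList<>();
            for (int s = 0; s < shards; s++) {
                final int shard = s;
                perShard.add(pool.submit(() -> shard)); // stand-in for per-shard work
            }
            int completed = 0;
            for (Future<Integer> f : perShard) {
                f.get(1, TimeUnit.SECONDS);
                completed++;
            }
            return completed;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Other ways out of the same trap include giving the per-shard work its own pool, or not blocking inside pool threads at all; which one fits depends on the index manager's design.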
>>>>>>>
>>>>>>>
>
