asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Searching for duplicates during feed ingestion.
Date Mon, 08 May 2017 18:05:35 GMT
+0.99 from me.


On 5/8/17 9:50 AM, Taewoo Kim wrote:
> +1 for auto-generated ID case
>
> Best,
> Taewoo
>
> On Mon, May 8, 2017 at 8:57 AM, Yingyi Bu <buyingyi@gmail.com> wrote:
>
>> Abdullah has a pending change that disables searches if there are no
>> secondary indexes [1].
>> Auto-generated ID could be another case for which we can disable searches
>> as well.
>>
>> Best,
>> Yingyi
>>
>> [1] https://asterix-gerrit.ics.uci.edu/#/c/1711/
>>
>>
>> On Mon, May 8, 2017 at 4:30 AM, Wail Alkowaileet <wael.y.k@gmail.com>
>> wrote:
>>
>>> Hi Devs,
>>>
>>> I'm noticing that ingestion gets slower over time. I know that is
>>> expected behavior for LSM indexes, but the drop in ingestion rate becomes
>>> noticeable after roughly 10 components (about 13 GB). I'm not sure
>>> whether that is expected.
>>>
>>> I tried multiple setups (increasing the memory component size and
>>> max-mergable-component-size). All of them delayed the problem but did not
>>> solve it. The only setting I've never changed is the bloom-filter
>>> false-positive rate (1%), which I want to investigate next.
>>>
>>> So..
>>> What I want to suggest is: when the primary key is auto-generated, why
>>> does AsterixDB look for duplicates? It seems a wasteful operation to me.
>>> Also, can we give the user the ability to tell the index that all keys
>>> are unique? I know I should not trust the user, but in certain cases the
>>> user is probably certain that the key is unique. Or a more elegant
>>> solution can shine in the end :-)
>>>
>>> --
>>>
>>> *Regards,*
>>> Wail Alkowaileet
>>>

