asterixdb-dev mailing list archives

From Wail Alkowaileet
Subject Searching for duplicates during feed ingestion.
Date Mon, 08 May 2017 11:30:01 GMT
Hi Devs,

I've noticed that ingestion gets slower over time. I know that is expected
behavior for LSM indexes, but the drop in ingestion rate becomes noticeable
after roughly 10 components (around 13 GB). I'm not sure whether that is
expected.

I tried multiple setups (increasing the memory component size and
max-mergable-component-size). All of them delayed the problem but did not
solve it. The only setting I have not changed is the bloom-filter
false-positive rate (1%), which I want to investigate next.
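
A back-of-the-envelope sketch of why the false-positive rate might matter
as components accumulate. It assumes each disk component has its own
independent bloom filter and that the key being inserted is new, so every
positive answer is a false one that costs a wasted component search. The
function names are made up for illustration; this is not AsterixDB code.

```python
def expected_false_probes(num_components: int, fpr: float) -> float:
    """Expected number of components searched needlessly for one new key."""
    return num_components * fpr


def prob_any_false_positive(num_components: int, fpr: float) -> float:
    """Chance that at least one filter wrongly reports the new key as present."""
    return 1.0 - (1.0 - fpr) ** num_components


if __name__ == "__main__":
    # 10 components at a 1% false-positive rate, as in the setup described above.
    print(f"{expected_false_probes(10, 0.01):.2f}")    # 0.10
    print(f"{prob_any_false_positive(10, 0.01):.4f}")  # 0.0956
```

Under these assumptions roughly one insert in ten pays for at least one
unnecessary component search, and the figure grows with the component count.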

What I want to suggest is this: when the primary key is auto-generated, why
does AsterixDB look for duplicates? It seems like a wasteful operation to
me. Also, can we give the user the ability to tell the index that all keys
are unique? I know I should not trust the user, but in certain cases the
user is probably certain that the key is unique. Or maybe a more elegant
solution will emerge in the end :-)
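
To make the suggestion concrete, here is a toy model, not AsterixDB's
implementation. ToyLsmIndex, check_duplicates and component_probes are
hypothetical names. The model has no bloom filters and no merge policy, so
every duplicate check searches every disk component, which overstates the
real cost but shows how it scales with the number of components.

```python
class ToyLsmIndex:
    """Toy LSM index: one in-memory dict plus a list of flushed dicts."""

    def __init__(self, memory_capacity: int, check_duplicates: bool = True):
        self.memory = {}
        self.disk_components = []  # newest first; never merged in this toy
        self.memory_capacity = memory_capacity
        self.check_duplicates = check_duplicates  # hypothetical "keys are unique" switch
        self.component_probes = 0  # disk components searched so far

    def _contains(self, key) -> bool:
        if key in self.memory:
            return True
        for component in self.disk_components:
            self.component_probes += 1
            if key in component:
                return True
        return False

    def insert(self, key, value) -> None:
        if self.check_duplicates and self._contains(key):
            raise KeyError(f"duplicate key: {key!r}")
        self.memory[key] = value
        if len(self.memory) >= self.memory_capacity:
            # Flush the full memory component to "disk".
            self.disk_components.insert(0, self.memory)
            self.memory = {}


def probes_for_unique_load(num_keys: int, check_duplicates: bool) -> int:
    """Load num_keys distinct keys and report how many components were searched."""
    index = ToyLsmIndex(memory_capacity=10, check_duplicates=check_duplicates)
    for key in range(num_keys):
        index.insert(key, "record")
    return index.component_probes


if __name__ == "__main__":
    print(probes_for_unique_load(100, check_duplicates=True))   # 450
    print(probes_for_unique_load(100, check_duplicates=False))  # 0
```

With the check enabled, the work per insert grows with the number of
components even though no key is ever a duplicate; with it disabled, the
insert path never touches the disk components.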


Wail Alkowaileet
