ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Igor Tanackovic <igor.tanacko...@gmail.com>
Subject Re: SQL Engine
Date Thu, 18 Oct 2018 09:38:50 GMT
Cool… We already have a workaround for our issue… I just wanted to make sure we are not
doing something terribly wrong.


Regards,
Igor

> On Oct 18, 2018, at 11:07 AM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> 
> Hi Igor,
> 
> What Ignite version do you use? Recent versions do not read real rows for
> the simplest "SELECT COUNT(*) FROM table" query. Instead it goes over index
> and simply count records, so no copying happens. But let me express again -
> this is not the case in general. If any expression appears, we have to read
> the row from offheap. In this case the whole key-value pair is read from
> offheap including non-SQL fields. This is so because we do not know in
> advance what parts of key or value will be needed. Even if specific field
> is not marked as @QuerySqlField, it may be needed if the whole key or value
> is requested ("SELECT _KEY FROM table"), or if it is used in some custom
> function, etc..
> 
> In future versions we may re-approach this, and try to read only fields
> required for query from offheap. But this will require significant changes
> to both storage layer and query engine, so this is not an easy task.
> 
> 
> On Thu, Oct 18, 2018 at 2:42 AM Igor Tanackovic <igor.tanackovic@gmail.com>
> wrote:
> 
>> Dmitriy,
>> 
>> Correct me if I’m wrong, but the concept is to store everything off heap -
>> which is perfectly fine :). So, the question is how SQL engine actually
>> works. Analyzing profiler object generation and sizes on heap, I’ve learned
>> that for each query entire rows in question are copied on heap which, I’d
>> say is a little bit inefficient (more or less depending on domain object).
>> Maybe an example will help to further elaborate on the issue.
>> 
>> Lets say we have an object ORGANIZATION which has only two query enabled
>> fields (annotated with @QuerySqlField) and 20 more value fields
>> (text/blobs, …) which are not used in queries (not annotated with
>> @QuerySqlField and thus not visible by SQL engine). So, how COUNT works?
>> Will SQL engine copy on heap just a part of ORGANIZATION rows (2 fields
>> that might be part of COUNT query) or entire rows (sql enabled fields +
>> text/blob fields…)?
>> 
>> From what I saw during profiling, I’d say entire rows will be copied on
>> heap. Just to compare, this is our, real world example:
>> 
>> Object A (~300k rows, each row has 5 columns, all sql enabled), COUNT(*)
>> takes ~2MB on heap
>> Object B (~25k rows, each row has 16 columns, 5 sql enabled fields and 11
>> text value fields), COUNT(*) takes ~60MB on heap - nearly 30 times more
>> 
>> 
>> Regards,
>> Igor
>> 
>> 
>> 
>>> On Oct 18, 2018, at 12:24 AM, Dmitriy Setrakyan <dsetrakyan@apache.org>
>> wrote:
>>> 
>>> I do not understand - why are we copying values on-heap? If we must copy
>>> something, why not copy rows into some off-heap space and read what we
>> have
>>> from there? At least this way we will not burden the JVM with extra GC
>>> pressure.
>>> 
>>> D.
>>> 
>>> On Wed, Oct 17, 2018 at 10:33 AM Igor Tanackovic <
>> igor.tanackovic@gmail.com>
>>> wrote:
>>> 
>>>> Vladimir,
>>>> 
>>>> Thanks for explanation... that’s true - ignite never deserialize values
>>>> during query execution. The point here is why copying fields to heap
>> that
>>>> sql engine could not use (not annotated as QuerySqlField)?
>>>> SQL count is a perfect example where you can benefit (heap space and gc)
>>>> copying only field sql engine could use.
>>>> 
>>>> Regards,
>>>> Igor
>>>> 
>>>> On Wed, Oct 17, 2018 at 18:47 Vladimir Ozerov <vozerov@gridgain.com>
>>>> wrote:
>>>> 
>>>>> Hi Igor,
>>>>> 
>>>>> We never deserialize values during query execution. Instead, we copy
>> the
>>>>> row to heap and extract fields as needed. In general case it is
>>>> impossible
>>>>> to avoid reading the whole row because we do not know whether this is
>>>>> COUNT(*) or COUNT(*) WHERE <something> or COUNT(<something>)
WHERE
>>>>> <something>.
>>>>> We do handle plain COUNT(*) as speical case and iterate over index
>> only,
>>>>> but event this simple query cannot avoid row reads in general case when
>>>> new
>>>>> snapshot mode is enabled, because an entry in the index may be not
>>>> visible
>>>>> to current transaction.
>>>>> 
>>>>> On Wed, Oct 17, 2018 at 7:19 PM igor.tanackovic <
>>>> igor.tanackovic@gmail.com
>>>>>> 
>>>>> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> Seems that SQL engine always deserialize whole objects instead of
>> using
>>>>>> just
>>>>>> SQL enabled fields (annotated with @QuerySqlField). This may have
a
>>>> huge
>>>>>> impact on Ignite heap usage and GC overhead as well.
>>>>>> 
>>>>>> For example, we have a cache holding big objects but with only two
sql
>>>>>> query
>>>>>> fields which for each query execution (SELECT COUNT(*) FROM 'cache')
>>>>>> consumes large amount on heap memory (~300MB). As a proof of concept,
>>>> we
>>>>>> divided the same cache to *index* cache with only sql query field
and
>> a
>>>>>> *data* holding whole object for materialization. The same query
>> (SELECT
>>>>>> COUNT(*) FROM 'index-cache') consumes ~25 time less memory! The same
>> is
>>>>>> true
>>>>>> for all other queries.
>>>>>> 
>>>>>> The obvious workaround would be to always have separated regions
for
>>>>>> indexes
>>>>>> (sql query enabled region) and a data/value region for
>> materialization,
>>>>> but
>>>>>> it might be a good idea to fix this in a systematic way during off
>> heap
>>>>>> deserialization.
>>>>>> 
>>>>>> Regards,
>>>>>> Igor
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>>>>>> 
>>>>> 
>>>> 
>> 
>> 


Mime
View raw message