carbondata-issues mailing list archives

From bhavya411 <...@git.apache.org>
Subject [GitHub] carbondata issue #1538: [CARBONDATA-1779] GenericVectorizedReader
Date Wed, 22 Nov 2017 08:55:11 GMT
Github user bhavya411 commented on the issue:

    https://github.com/apache/carbondata/pull/1538
  
    This PR removes the Spark dependency that the Presto integration module needed in order
to use CarbonVectorizedRecordReader. It also consolidates CarbonVectorizedRecordReader into
a single implementation that all integration modules can share.
    
    The earlier version of the Presto integration used Spark's ColumnarBatch, which is not
good practice. Here we provide our own implementations of ColumnVector and VectorBatch to
eliminate the Spark dependency altogether. This generic ColumnVector can now be used by any
integration module that needs a VectorizedReader to speed up processing.
    
    Some core module classes were changed so that they use Java data types instead of Spark
data types, Decimal being one of them.
    
    This PR tries to limit the changes to the core module.
    
    Newly added classes
    1. CarbonColumnVectorImpl: This class implements the CarbonColumnVector interface and
provides methods to store data in a vector and to retrieve data from it.
    
    2. CarbonVectorBatch: This class creates a vectorized row batch, which is a set of rows
organized with each column as a CarbonVector. It is the unit of query execution, organized to
minimize the cost per row and achieve high instructions-per-cycle. The major fields are public
by design to allow fast and convenient access by the vectorized query execution code.
    
    3. StructField: This class is used to pass the schema information to the Carbon columnar
batch.
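    To illustrate how the three pieces fit together, here is a minimal sketch in plain Java.
It is not the code from this PR: the names SketchField, SketchColumnVector, SketchVectorBatch
and VectorBatchDemo are hypothetical stand-ins, and the real classes may differ in their
methods and storage layout. It only shows the idea of a schema entry, a per-column vector with
a null mask, and a batch with public fields, using java.math.BigDecimal rather than a Spark
decimal type.

```java
import java.math.BigDecimal;

// Hypothetical schema entry: a column name plus the Java type stored in it.
final class SketchField {
    final String name;
    final Class<?> javaType;

    SketchField(String name, Class<?> javaType) {
        this.name = name;
        this.javaType = javaType;
    }
}

// Hypothetical column vector: the values of one column plus a null mask.
final class SketchColumnVector {
    private final Object[] values;
    private final boolean[] nulls;

    SketchColumnVector(int capacity) {
        this.values = new Object[capacity];
        this.nulls = new boolean[capacity];
    }

    void put(int rowId, Object value) {
        values[rowId] = value;
        nulls[rowId] = false;
    }

    void putNull(int rowId) {
        values[rowId] = null;
        nulls[rowId] = true;
    }

    boolean isNullAt(int rowId) {
        return nulls[rowId];
    }

    Object get(int rowId) {
        return values[rowId];
    }
}

// Hypothetical batch: one vector per schema field. Fields are public so that
// a reader loop can reach the columns without going through accessors.
final class SketchVectorBatch {
    public final SketchField[] schema;
    public final SketchColumnVector[] columns;
    public int numRows;

    SketchVectorBatch(SketchField[] schema, int capacity) {
        this.schema = schema;
        this.columns = new SketchColumnVector[schema.length];
        for (int i = 0; i < schema.length; i++) {
            columns[i] = new SketchColumnVector(capacity);
        }
    }
}

final class VectorBatchDemo {
    // Builds a two-row batch with an int column and a decimal column.
    static SketchVectorBatch buildSample() {
        SketchField[] schema = {
            new SketchField("id", Integer.class),
            new SketchField("price", BigDecimal.class)
        };
        SketchVectorBatch batch = new SketchVectorBatch(schema, 4);
        batch.columns[0].put(0, 1);
        batch.columns[1].put(0, new BigDecimal("12.50"));
        batch.columns[0].put(1, 2);
        batch.columns[1].putNull(1);
        batch.numRows = 2;
        return batch;
    }

    public static void main(String[] args) {
        SketchVectorBatch batch = buildSample();
        for (int row = 0; row < batch.numRows; row++) {
            Object price = batch.columns[1].isNullAt(row) ? "null" : batch.columns[1].get(row);
            System.out.println(batch.columns[0].get(row) + " " + price);
        }
    }
}
```

    A reader built this way fills each column vector for a whole batch and hands the batch to
the engine, so the per-row overhead stays small and no Spark class is involved.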


