ignite-user mailing list archives

From Ferry Syafei Sapei <ferry.sa...@googlemail.com>
Subject Re: Grouping cache when loading data using CacheStore
Date Wed, 20 Jan 2016 08:54:47 GMT
Importing the CSV into an H2 database would require a huge amount of memory, since the file is
big and contains a lot of redundant data. Some rows should be aggregated, since they belong
to an object with the same key (e.g. accountNumber). Moreover, the rows are not sorted by
accountNumber.

Could you propose another solution instead of using the H2 database?

I have tried storing the CSV in IGFS and running a map-reduce task over it. I instantiate an
object for each row in the job, but in the task's reduce method I get all the instantiated
objects in one flat list. They are not grouped by accountNumber. Is there a way to get
grouped objects in the reduce method?
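
For illustration, the grouping described above can be sketched in plain Java, without any
Ignite API. This is only a minimal sketch under assumptions: the class name GroupByAccount
and the method group are hypothetical, the CSV is assumed to have a header line and no quoted
or escaped commas, and the aggregated accounts (not the raw rows) are assumed to fit in
memory. The rows do not need to be sorted by accountNumber.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Ignite: groups CSV rows by accountNumber.
public class GroupByAccount {
    static class Bill {
        final int billNumber;
        final String billProperty1;
        final String billProperty2;

        Bill(int billNumber, String billProperty1, String billProperty2) {
            this.billNumber = billNumber;
            this.billProperty1 = billProperty1;
            this.billProperty2 = billProperty2;
        }
    }

    static class AccountInformation {
        final int accountNumber;
        final String accountProperty1;
        final String accountProperty2;
        final List<Bill> bills = new ArrayList<>();

        AccountInformation(int accountNumber, String accountProperty1, String accountProperty2) {
            this.accountNumber = accountNumber;
            this.accountProperty1 = accountProperty1;
            this.accountProperty2 = accountProperty2;
        }
    }

    // Reads the CSV line by line and keeps one AccountInformation per accountNumber.
    // Only the aggregated objects are held in memory, not the redundant rows.
    static Map<Integer, AccountInformation> group(BufferedReader reader) throws IOException {
        Map<Integer, AccountInformation> accounts = new LinkedHashMap<>();
        String line = reader.readLine(); // assumption: the first line is the header, skip it
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            // assumption: no quoted fields, so a plain split on ',' is enough
            String[] f = line.split(",", -1);
            int accountNumber = Integer.parseInt(f[0].trim());
            // Create the account on first sight, reuse it afterwards.
            AccountInformation account = accounts.computeIfAbsent(
                accountNumber, k -> new AccountInformation(k, f[1], f[2]));
            account.bills.add(new Bill(Integer.parseInt(f[3].trim()), f[4], f[5]));
        }
        return accounts;
    }

    public static void main(String[] args) throws IOException {
        String csv =
            "accountNumber,accountProperty1,accountProperty2,billNumber,billProperty1,billProperty2\n"
            + "100,property11,property12,100700,billProperty11,billProperty12\n"
            + "100,property11,property12,100700,billProperty21,billProperty22\n";
        Map<Integer, AccountInformation> accounts = group(new BufferedReader(new StringReader(csv)));
        accounts.forEach((number, account) ->
            System.out.println(number + " has " + account.bills.size() + " bill(s)"));
    }
}
```

The same per-key merge could be applied to the flat list of objects received in the reduce
method, or the resulting map entries could be passed one by one to the closure that
CacheStore.loadCache receives. Whether that fits depends on how the file is split across
jobs.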

 
> On 20.01.2016 at 07:21, Alexey Kuznetsov <akuznetsov@gridgain.com> wrote:
> 
> Ferry,
> 
> I would like to propose the following workaround:
> 1) Import your CSV into an H2 database, see: http://www.h2database.com/html/tutorial.html#csv
> 2) Use the Apache Ignite Schema Import Utility to generate POJO classes and xml/java configuration,
> see https://apacheignite.readme.io/docs/automatic-persistence
> 3) Use CacheJdbcPojoStoreFactory / CacheJdbcPojoStore to load your data into the cache.
> 
> Will this work for you?
> 
> 
> On Tue, Jan 19, 2016 at 10:02 PM, Ferry Syafei Sapei <ferry.sapei@googlemail.com> wrote:
> I have a CSV file with the following structure:
> 
> accountNumber,accountProperty1,accountProperty2,billNumber,billProperty1,billProperty2
> 100,property11,property12,100700,billProperty11,billProperty12
> 100,property11,property12,100700,billProperty21,billProperty22
> 
> I would like to import the file and fill in the cache with the following object structure:
> class AccountInformation
>         int accountNumber
>         String accountProperty1
>         String accountProperty2
>         List<Bill> bills
> 
> class Bill
>         int billNumber
>         String billProperty1
>         String billProperty2
> 
> I have tried using IgniteDataStreamer and StreamVisitor. The file is read line by line and
> each line is added to the data streamer. In the data streamer, I can check whether the
> account information already exists. If it does, I just add the new bill to the existing
> account and replace the cache content for that account.
> 
> How can I achieve the same result using CacheStore?     
> 
> 
> 
> -- 
> Alexey Kuznetsov
> GridGain Systems
> www.gridgain.com

