camel-users mailing list archives

From Claus Ibsen <claus.ib...@gmail.com>
Subject Re: Performance - Camel JPA
Date Thu, 04 Feb 2010 05:39:29 GMT
Hi


On Thu, Feb 4, 2010 at 3:26 AM, vcheruvu <vid.cheruvu@macquarie.com> wrote:
>
> We need to extract data from an old table in near real time and persist it
> to a new table in a different database for downstream processing. So we are
> using Camel and Java as the solution to get something going for now.
>
> Yes, using direct JDBC is ideal. However, I would still have to map rows to
> objects, and I thought I would write poor mapping code and repeat mistakes
> that the JPA contributors have already worked through. So why reinvent the
> wheel? I am using Hibernate JPA (ORM), which follows best practice for
> mapping each row's fields to object fields. Loading 40000 entities is not
> the issue. I found that the issue is that inserting the transformed
> entities took too long. This is because of the Camel route config shown
> below:

:) ORM is not ideal for ETL work. 40000 rows is hardly anything.
Try millions or even more. Then you may have to use a different
strategy than Java and an ORM.

Hibernate et al. are optimized for applications built on top of the
database, not for bulk loading millions of rows into tables.
But you have done your due diligence, and if 5 min to load 40000 rows
meets your demand, that is fine. And if other engineers on your team
can understand and maintain the code you wrote, that is great.




>
> <route>
>                <from uri="jpa:com.OldEntity?consumer.query=select x from OldEntity x where x.processed=0&amp;maximumResults=1000&amp;consumeDelete=false&amp;delay=3000&amp;consumeLockEntity=false&amp;consumer.fixedDelay=true"/>
>                <to uri="bean:transformerBean?method=transformOrder"/>
>                <convertBodyTo type="com.NewEventEntity"/>
>                <to uri="jpa:com.NewEventEntity"/>
>        </route>
>
> Each entity loaded by the JPA consumer is passed to the transformation, and
> the transformed entity is persisted by the JPA producer. This runs on a
> single thread and waits for all 1000 to complete; then a batch update is
> committed on the old table to mark the flag as committed. This makes the
> JPA consumer wait until all 1000 entities have been processed before it
> polls for the next 1000. Another issue is that I only used 1 database
> connection. I thought I could increase the speed of inserting new
> entities.
>
>
> So I have split the original route; see the modified version below.
>
> <route>
> <!-- this route is about getting 1000 entities and transforming them to
> newEntity -->
>                <from uri="jpa:com.OldEntity?consumer.query=select x from com.oldEntity x where x.processed=0&amp;maximumResults=1&amp;consumeDelete=false&amp;delay=3000&amp;consumeLockEntity=false&amp;consumer.fixedDelay=true"/>
>                <to uri="bean:transformerBean?method=transformOrder"/>
> <!--  call another route, essentially send it to a queue for further
> processing -->
>                <to
> uri="vm:storeNewEntity?size=10000&amp;timeout=1000000&amp;concurrentConsumers=100"></to>
> </route>
>
>
>  <route>
> <!-- queue size is 10000 and there are 100 threads that work off the queue
> to insert new entities. -->
>        <from
> uri="vm:storeNewEntity?size=10000&amp;timeout=1000000&amp;concurrentConsumers=100"/>
>        <to uri="jpa:com.mbl.entity.NewEventEntity"/>
>    </route>
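
For readers who want to see the shape of the split route outside Camel, here is a rough plain-Java analogy: a bounded queue drained by a fixed pool of worker threads. It is only a sketch of the pattern, not how the vm: component is implemented. The class name, the POISON sentinel, and the counter standing in for the database insert are all made up for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class QueueFanOutSketch {
    // Hypothetical sentinel telling a worker to stop; one is enqueued per worker.
    private static final String POISON = "__STOP__";

    /** Pushes `items` messages through a bounded queue drained by `workers` threads
     *  and returns how many messages the workers handled. */
    static int run(int items, int queueSize, int workers) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueSize);
        AtomicInteger stored = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String entity = queue.take();   // blocks while the queue is empty
                        if (POISON.equals(entity)) {
                            return;
                        }
                        stored.incrementAndGet();       // placeholder for the real insert
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            for (int i = 0; i < items; i++) {
                queue.put("entity-" + i);               // blocks while the queue is full
            }
            for (int w = 0; w < workers; w++) {
                queue.put(POISON);
            }
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding the queue", e);
        }
        return stored.get();
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 100, 8));
    }
}
```

The point of the pattern is that the producer no longer waits for each insert to finish; it only blocks when the queue is full, while several workers (each of which would hold its own pooled connection) insert in parallel.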
>
> I have also added c3p0 for database connection pool in persistence.xml
>
>      <property name="hibernate.c3p0.min_size" value="10"/>
>      <property name="hibernate.c3p0.max_size" value="100"/>
>      <property name="hibernate.c3p0.timeout" value="60"/>
>      <property name="hibernate.c3p0.max_statements" value="50"/>
>      <property name="hibernate.c3p0.idle_test_period" value="10000"/>
>
> I only had to make config changes, and they significantly improved
> performance. I could complete 40,000 entities in less than 5 mins, so it
> processes about 133 records per second. I believe there is still room for
> improvement. Instead of using the JPA producer, I should call a stored
> procedure to insert into the new table.
>
>
> My conclusion: the JPA consumer and the transformation are fine; the
> problem was with the JPA insert.
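
The throughput figures in this thread can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the thread (40,000 rows, 30 minutes before the change, 5 minutes after); the class and method names are made up for illustration.

```java
class ThroughputEstimate {
    /** Average rows per second for a load of `rows` rows taking `seconds` seconds. */
    static double rowsPerSecond(long rows, long seconds) {
        return (double) rows / seconds;
    }

    public static void main(String[] args) {
        double before = rowsPerSecond(40_000, 30 * 60); // single route, 30 minutes
        double after = rowsPerSecond(40_000, 5 * 60);   // split route, 5 minutes
        System.out.printf("before: %.1f rows/s%n", before);
        System.out.printf("after:  %.1f rows/s%n", after);
        System.out.printf("speedup: %.1fx%n", after / before);
    }
}
```

Since the load finished in less than 5 minutes, about 133 rows per second is a lower bound on the new rate, roughly six times the original rate of about 22 rows per second.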
>

Good that you got a solution you like.



>
> Claus Ibsen-2 wrote:
>>
>> On Tue, Feb 2, 2010 at 6:30 AM, Kevin Jackson <foamdino@gmail.com> wrote:
>>> Hi,
>>> [snip]
>>>
>>>> I have ensured that indexes are in place for the old table and the new
>>>> table. There is no need for a second-level cache in this scenario. I
>>>> have used a UUID to generate a unique key when inserting a new record.
>>>> Yet this app takes 30 mins for 40,000.
>>>
>>> Indexes on the new table are going to hurt your insert performance.
>>> For large data loads, have you tried:
>>> 1 - push data into a table with no ref integrity (a load table) and no
>>> indexes
>>> 2 - asynchronously (after all the data has been loaded into the load
>>> table), call a stored procedure that copies the data from load to the
>>> real table
>>> 3 - after the stored procedure has run, truncate the load table
>>>
>>> Kev
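
The three steps above could look roughly like this with plain JDBC. This is a sketch under stated assumptions, not a tested loader: the table event_load, its two columns, and the stored procedure copy_load_to_real are hypothetical names, and the procedure is called right after the load rather than asynchronously to keep the example short.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

class StagingTableLoad {
    // Hypothetical load table (no indexes, no referential integrity) and procedure.
    static final String INSERT_SQL = "INSERT INTO event_load (id, payload) VALUES (?, ?)";
    static final String COPY_CALL = "{call copy_load_to_real()}";
    static final String TRUNCATE_SQL = "TRUNCATE TABLE event_load";

    /** Number of executeBatch() round trips needed for `rows` rows. */
    static int batchCount(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    /** Each row is {id, payload}; the caller supplies an open connection. */
    static void load(Connection conn, List<String[]> rows, int batchSize) throws SQLException {
        conn.setAutoCommit(false);

        // Step 1: push the data into the load table in batches.
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();   // flush the final partial batch
            }
        }
        conn.commit();

        // Step 2: let the database copy from the load table to the real table.
        try (CallableStatement cs = conn.prepareCall(COPY_CALL)) {
            cs.execute();
        }
        conn.commit();

        // Step 3: empty the load table for the next run.
        try (Statement st = conn.createStatement()) {
            st.execute(TRUNCATE_SQL);
        }
        conn.commit();
    }
}
```

With a batch size of 1000, the 40,000 rows from this thread go to the database in 40 batch round trips instead of 40,000 single inserts.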
>>>
>>
>> Yeah I do not think JPA fits well with ETL kinda work.
>> http://en.wikipedia.org/wiki/Extract,_transform,_load
>>
>> There are a zillion other ways to load a lot of data into a database,
>> and using an ORM will never be very fast.
>>
>> Try googling a bit with your database name and ETL etc. And/or talk to
>> DB specialists in your organization.
>>
>> If you need to do hand-crafted SQL queries, you may want to use Spring
>> JDBC or iBatis etc. Sometimes it's just easier to use Spring JDBC as
>> it's a handy little library.
>>
>> --
>> Claus Ibsen
>> Apache Camel Committer
>>
>> Author of Camel in Action: http://www.manning.com/ibsen/
>> Open Source Integration: http://fusesource.com
>> Blog: http://davsclaus.blogspot.com/
>> Twitter: http://twitter.com/davsclaus
>>
>>
>
> --
> View this message in context: http://old.nabble.com/Performance---Camel-JPA-tp27412920p27446740.html
> Sent from the Camel - Users mailing list archive at Nabble.com.
>
>



-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus
