camel-users mailing list archives

From vcheruvu <vid.cher...@macquarie.com>
Subject Re: Performance - Camel JPA
Date Thu, 04 Feb 2010 02:26:07 GMT

We needed near-real-time extraction of data from an old table, persisted into a new
table in a different database for downstream processing. So we are using Camel
and Java as a solution to get something going for now.

Yes, using direct JDBC would be ideal. However, I would still have to map rows to
objects, and I suspected my hand-written mapping would be sloppy and repeat
mistakes that the JPA contributors have already solved. So, why re-invent the
wheel? I am using Hibernate JPA (ORM), which applies well-tested practices to map
each row's fields to object fields. Loading 40,000 entities is not the issue; I
found that inserting the transformed entities is what took too long.
This is because of the Camel route config shown below:

<route>
		<from uri="jpa:com.OldEntity?consumer.query=select x from OldEntity x where x.processed=0&amp;maximumResults=1000&amp;consumeDelete=false&amp;delay=3000&amp;consumeLockEntity=false&amp;consumer.fixedDelay=true"/>
		<to uri="bean:transformerBean?method=transformOrder"/>
		<convertBodyTo type="com.NewEventEntity"/>
		<to uri="jpa:com.NewEventEntity"/>
	</route>

Each entity loaded by the JpaConsumer is channeled through the transformation and
then persisted by the JpaProducer. This runs on a single thread, which waits for
all 1000 entities to complete before the batch update is committed on the old
table to mark the rows as processed. In effect the JpaConsumer waits until all
1000 entities have been processed before it polls for the next 1000. Another
issue is that I was using only one database connection. I thought I could
increase the speed of inserting the new entities.
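The batch-then-commit behaviour described above can be sketched in plain Java: a single thread drains up to batchSize rows, processes each one serially, and only then issues one commit for the whole batch. The names (processAll, batchSize) are illustrative, not Camel API:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchLoop {
    // Returns the number of commits issued for 'total' rows when the
    // consumer works through them in fixed-size batches on one thread.
    static int processAll(int total, int batchSize) {
        int commits = 0;
        int done = 0;
        while (done < total) {
            int n = Math.min(batchSize, total - done);
            List<String> batch = new ArrayList<>();
            for (int i = 0; i < n; i++) batch.add("row-" + (done + i));
            for (String row : batch) {
                // transform + insert happen here, serially, one row at a time
            }
            commits++;          // one "mark processed" commit per batch
            done += n;
        }
        return commits;
    }

    public static void main(String[] args) {
        System.out.println(processAll(40000, 1000)); // 40 batches, 40 commits
    }
}
```

The point of the sketch is that nothing overlaps: the next poll cannot start until the last row of the current batch has been inserted.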


So I have split the original route; the modified version is below.

<route>
<!-- this route is about getting the entities and transforming them into newEntity -->
		<from uri="jpa:com.OldEntity?consumer.query=select x from com.oldEntity x where x.processed=0&amp;maximumResults=1&amp;consumeDelete=false&amp;delay=3000&amp;consumeLockEntity=false&amp;consumer.fixedDelay=true"/>
		<to uri="bean:transformerBean?method=transformOrder"/>
<!-- call another route; essentially sending it to a queue for further processing -->
		<to uri="vm:storeNewEntity?size=10000&amp;timeout=1000000&amp;concurrentConsumers=100"/>
</route>


 <route>
<!-- queue size is 10000 and there are 100 threads that work off the queue to insert new entities. -->
    	<from uri="vm:storeNewEntity?size=10000&amp;timeout=1000000&amp;concurrentConsumers=100"/>
    	<to uri="jpa:com.mbl.entity.NewEventEntity"/>
    </route>
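The vm: endpoint in the split route behaves like a bounded in-memory queue drained by a pool of consumer threads. A stdlib-only sketch of that pattern (sizes scaled down, names like drainWithPool illustrative; the incrementAndGet stands in for the JPA insert):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class QueuePool {
    // Producer puts 'items' entities on a bounded queue; 'consumers'
    // threads take from it concurrently. Returns how many were handled.
    static int drainWithPool(int items, int queueSize, int consumers) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueSize);
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        AtomicInteger inserted = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(items);
        for (int c = 0; c < consumers; c++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        queue.take();                // blocks, like a vm: consumer
                        inserted.incrementAndGet();  // stands in for the JPA insert
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    // pool shutdown: consumer exits
                }
            });
        }
        for (int i = 0; i < items; i++) {
            queue.put("newEntity-" + i);             // the transforming route's "to vm:"
        }
        done.await();
        pool.shutdownNow();
        return inserted.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(drainWithPool(1000, 100, 10)); // 1000
    }
}
```

Because the producer only blocks when the queue is full, the first route can keep polling while the consumer pool does the slow inserts in parallel, which is where the speed-up comes from.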

I have also added c3p0 for database connection pooling in persistence.xml:

	  <property name="hibernate.c3p0.min_size" value="10"/>
      <property name="hibernate.c3p0.max_size" value="100"/>
      <property name="hibernate.c3p0.timeout" value="60"/>
      <property name="hibernate.c3p0.max_statements" value="50"/>
      <property name="hibernate.c3p0.idle_test_period" value="10000"/>

I only had to make configuration changes, and performance improved significantly:
40,000 entities now complete in under 5 minutes, i.e. roughly 133 records per
second. I believe there is still room for improvement. Instead of using the
JpaProducer, I could call a stored procedure to insert into the new table.


My conclusion: the JpaConsumer and the transformation are fine; the problem was
with the JPA inserts.
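If the JpaProducer stays in place, one more lever worth trying is Hibernate's JDBC batching, which groups the per-entity INSERT statements into fewer round trips. These are standard Hibernate properties in the same persistence.xml style as the c3p0 settings above; the values are illustrative:

```xml
	  <property name="hibernate.jdbc.batch_size" value="50"/>
      <property name="hibernate.order_inserts" value="true"/>
```

Assigned UUID keys, as used here, are batch-friendly; identity-style generators would disable Hibernate's insert batching.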


Claus Ibsen-2 wrote:
> 
> On Tue, Feb 2, 2010 at 6:30 AM, Kevin Jackson <foamdino@gmail.com> wrote:
>> Hi,
>> [snip]
>>
>>> I have ensured that index are put in place for old table and new table.
>>> There is no need of second level cache in this scenario. I have used
>>> UUID to
>>> generate unique key when inserting new record. Yet this apps take 30
>>> mins
>>> for 40,000.
>>
>> Indexes on the new table are going to hurt your insert performance.
>> For large data loads, have you tried:
>> 1 - push data into a table with no ref integrity (a load table) and no
>> indexes
>> 2 - asynchronously (after all the data has been loaded into the load
>> table), call a stored procedure that copies the data from load to the
>> real table
>> 3 - after store proc has run, truncate the load table
>>
>> Kev
>>
> 
> Yeah I do not think JPA fits well with ETL kinda work.
> http://en.wikipedia.org/wiki/Extract,_transform,_load
> 
> There are a zillion other ways to load a lot of data into a database,
> and using an ORM will never be very fast.
> 
> Try googling a bit with your database name and ETL etc. And/or talk to
> DB specialists in your organization.
> 
> If you need to do hand crafted SQL queries you may want to use Spring
> JDBC or iBatis etc. Sometimes its just easier to use Spring JDBC as
> its a little handy library.
> 
> -- 
> Claus Ibsen
> Apache Camel Committer
> 
> Author of Camel in Action: http://www.manning.com/ibsen/
> Open Source Integration: http://fusesource.com
> Blog: http://davsclaus.blogspot.com/
> Twitter: http://twitter.com/davsclaus
> 
> 

-- 
View this message in context: http://old.nabble.com/Performance---Camel-JPA-tp27412920p27446740.html
Sent from the Camel - Users mailing list archive at Nabble.com.

