db-derby-user mailing list archives

From hcadavid <hectorcada...@yahoo.com>
Subject Re: Derby problem: 13GB of space with 200000 records!
Date Fri, 12 Sep 2008 23:21:45 GMT

Dear friends,

Thanks for your suggestions.

The problem is fixed! I changed the frequency-update logic from
try{insert}catch{update} to count=update; if (count==0) insert, keeping the
multithreaded strategy (and still reusing connections), and it works like a charm:
700MB of text in a 13MB database...
I still don't know why the old approach blows up the database (apparently the
problem was doing the update inside the exception handler), and the trim()
change didn't affect the overall result.
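
In case it is useful to anyone else, this is roughly what the new logic looks
like (a simplified, untested sketch: it assumes the same WORDS(word, frequency)
table as in my original message, and the real code still gets its connections
from EmbeddedDBMSConnectionBroker and shares them between threads):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class WordFrequencyUpdater {

    // Update-first logic: bump the counter, and only INSERT when no row was
    // updated. Under heavy concurrency the INSERT can still lose a race and
    // hit the unique constraint; in that case the word already exists, so the
    // UPDATE is simply retried.
    public static void countWord(Connection con, String rawWord) throws SQLException {
        String word = rawWord.trim().toLowerCase();

        PreparedStatement upd = con.prepareStatement(
                "UPDATE WORDS SET FREQUENCY = FREQUENCY + 1 WHERE WORD = ?");
        upd.setString(1, word);
        int count = upd.executeUpdate();

        if (count == 0) {
            PreparedStatement ins = con.prepareStatement("INSERT INTO WORDS VALUES (?, 1)");
            ins.setString(1, word);
            try {
                ins.executeUpdate();
            } catch (SQLIntegrityConstraintViolationException e) {
                // Another thread inserted the word first: count it with the update instead.
                upd.executeUpdate();
            } finally {
                ins.close();
            }
        }
        upd.close();

        con.commit();
    }
}

The key difference is that the duplicate-key exception is now the rare case
instead of the normal code path.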

Best regards,
Héctor


> try{
>         st.execute();
> }
> catch(SQLIntegrityConstraintViolationException e){
>         PreparedStatement ps=con.prepareStatement(
>             "update words set frequency=((select frequency from words where word=?)+1) where word=?");
>         ...
> }



Sai Pullabhotla wrote:
> 
> Definitely the word.trim().toLowerCase() should appear before setting
> it on the statement. Otherwise, the statement would actually insert
> the original word with white space and mixed case, which could be
> why there are so many words.
> 
> Also, when you do an update in the catch block, I don't think you have
> to do the inner select. You could simply use a statement like this:
> 
> update words set frequency=frequency+1 where word=?
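
For anyone reading the archive later, this is how I understand that
single-statement update would look with a prepared statement (a minimal,
untested sketch; it also does the trim()/toLowerCase() before binding, as
suggested above, and its returned row count is the kind of value the
count==0 check in my new logic is based on):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class SimpleFrequencyUpdate {

    // Bumps the frequency of an existing word with a single statement
    // (no inner select). Returns the number of rows updated: 0 means the
    // word is not in the table yet and still needs an insert.
    public static int bump(Connection con, String rawWord) throws SQLException {
        String word = rawWord.trim().toLowerCase();   // normalize before binding
        PreparedStatement ps = con.prepareStatement(
                "update words set frequency=frequency+1 where word=?");
        ps.setString(1, word);
        int rows = ps.executeUpdate();
        ps.close();
        return rows;
    }
}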
> 
> Sai Pullabhotla
> Phone: (402) 408-5753
> Fax: (402) 408-6861
> www.jMethods.com
> 
> 
> 
> On Thu, Sep 11, 2008 at 7:39 AM, hcadavid <hectorcadavid@yahoo.com> wrote:
>>
>> Dear friends,
>>
>> I'm using Derby to record word frequencies from a large text corpus
>> with a Java program. It works fine with standard statements like
>> "INSERT INTO WORDS VALUES('"+word+"',1)" (it takes 50MB to store 400000
>> words), but when I switched to prepared statements and inner statements
>> (in order to improve performance) and repeated the process, after a few
>> hours of processing (200MB of plain text) the database's disk consumption
>> reached an absurd size: 13GB! I mean, 13GB of disk space to store 400000
>> words (of standard length) and their frequencies!! What could the problem
>> be? The biggest file is seg0\c3c0.dat (13GB); there is no problem with
>> the log files.
>>
>> Here is how I'm making insertions and updates:
>>
>>     Connection con=EmbeddedDBMSConnectionBroker.getConnection();
>>     PreparedStatement st=con.prepareStatement("INSERT INTO WORDS VALUES(?,1)");
>>     st.setString(1, word);
>>
>>     word=word.trim().toLowerCase();
>>
>>     try{
>>             st.execute();
>>     }
>>     catch(SQLIntegrityConstraintViolationException e){
>>             PreparedStatement ps=con.prepareStatement(
>>                 "update words set frequency=((select frequency from words where word=?)+1) where word=?");
>>             ps.setString(1, word);
>>             ps.setString(2, word);
>>             ps.execute();
>>     }
>>
>>     con.commit();
>>     con.close();
>>
>> This method is used concurrently by 100 threads. Does anyone know the
>> cause of this strange Derby behavior? (Using GBs of disk space just to
>> store a few words isn't reasonable!)
>>
>> Thanks in advance
>>
>> Héctor
>> --
>> View this message in context:
>> http://www.nabble.com/Derby-problem%3A-13GB-of-space-with-200000-records%21-tp19433858p19433858.html
>> Sent from the Apache Derby Users mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Derby-problem%3A-13GB-of-space-with-200000-records%21-tp19433858p19465767.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.

