db-derby-user mailing list archives

From "Sai Pullabhotla" <sai.pullabho...@jmethods.com>
Subject Re: Derby problem: 13GB of space with 200000 records!
Date Thu, 11 Sep 2008 14:04:08 GMT
Definitely, word.trim().toLowerCase() should be called before the word
is set on the statement. Otherwise the statement actually inserts the
original word, with its surrounding white space and mixed case, which
could be why there are so many words.
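A minimal sketch of the corrected order, assuming the two-column WORDS table implied by the original `VALUES(?,1)` statement. The names `WordRecorder`, `normalize` and `insertWord` are hypothetical. `Locale.ROOT` is an added choice so that lower-casing does not depend on the JVM's default locale:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Locale;

public class WordRecorder {

    // Hypothetical helper: normalize the word BEFORE it is bound to the statement.
    static String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    // Inserts the first occurrence of a word.
    // Assumes WORDS has exactly two columns (WORD, FREQUENCY), as in the original post.
    static void insertWord(Connection con, String raw) throws SQLException {
        String word = normalize(raw); // normalize first...
        try (PreparedStatement st = con.prepareStatement("INSERT INTO WORDS VALUES(?, 1)")) {
            st.setString(1, word);    // ...then bind the normalized value
            st.executeUpdate();
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Derby\t")); // prints "derby"
    }
}
```

With this order, " The", "the" and "THE " all reach the database as the same key. Derby's default collation compares strings case-sensitively, so without normalization these could be stored as separate rows.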

Also, when you do an update in the catch block, I don't think you have
to do the inner select. You could simply use a statement like this:

update words set frequency=frequency+1 where word=?
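A minimal sketch of the insert-then-catch approach from the original post, with the inner SELECT replaced by the simpler update. It assumes the same WORDS table, with WORD as a primary or unique key, and a caller that has already trimmed and lower-cased the word. `WordCounter` and `countWord` are hypothetical names. Try-with-resources closes each PreparedStatement, which the original code never does:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class WordCounter {

    static final String INSERT_SQL = "INSERT INTO WORDS VALUES(?, 1)";
    // No inner SELECT: the row's current FREQUENCY is read and incremented in place.
    static final String UPDATE_SQL = "UPDATE WORDS SET FREQUENCY = FREQUENCY + 1 WHERE WORD = ?";

    // Assumes `word` is already normalized (trimmed, lower-cased) by the caller.
    static void countWord(Connection con, String word) throws SQLException {
        try (PreparedStatement insert = con.prepareStatement(INSERT_SQL)) {
            insert.setString(1, word);
            insert.executeUpdate();
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            // The word is already stored: bump its counter.
            try (PreparedStatement update = con.prepareStatement(UPDATE_SQL)) {
                update.setString(1, word);
                update.executeUpdate();
            }
        }
    }
}
```

The update reads and increments the column in one statement, so it needs only one parameter instead of two.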

Sai Pullabhotla
Phone: (402) 408-5753
Fax: (402) 408-6861

On Thu, Sep 11, 2008 at 7:39 AM, hcadavid <hectorcadavid@yahoo.com> wrote:
> Dear friends,
> I'm using Derby to record word frequencies from a large text corpus
> with a Java program. It works well with standard statements, like
> "INSERT INTO WORDS VALUES('"+word+"',1)" (it takes 50 MB to store 400000
> words). But when I switched to prepared statements and inner statements
> (in order to improve performance) and repeated the process, after a few
> hours of processing (200 MB of plain text) the database's disk consumption
> grew to an absurd size: 13 GB! That is 13 GB of disk space to store 400000
> words (of standard length) and their frequencies! What could be the problem?
> The biggest file is seg0\c3c0.dat (13 GB); the log files are not the problem.
> Here is how I'm doing the inserts and updates:
>     Connection con = EmbeddedDBMSConnectionBroker.getConnection();
>     PreparedStatement st = con.prepareStatement("INSERT INTO WORDS VALUES(?,1)");
>     st.setString(1, word);
>     word = word.trim().toLowerCase();
>     try {
>         st.execute();
>     }
>     catch (SQLIntegrityConstraintViolationException e) {
>         PreparedStatement ps = con.prepareStatement(
>             "update words set frequency=((select frequency from words where word=?)+1) where word=?");
>         ps.setString(1, word);
>         ps.setString(2, word);
>         ps.execute();
>     }
>     con.commit();
>     con.close();
> This method is called concurrently by 100 threads. Does anyone know the
> cause of this strange Derby behavior? (Using GBs of disk space just
> to store a few words isn't reasonable!)
> Thanks in advance
> Héctor
> --
> View this message in context: http://www.nabble.com/Derby-problem%3A-13GB-of-space-with-200000-records%21-tp19433858p19433858.html
> Sent from the Apache Derby Users mailing list archive at Nabble.com.
