db-derby-user mailing list archives

From Suavi Ali Demir <dem...@yahoo.com>
Subject Re: Derby problem: 13GB of space with 200000 records!
Date Thu, 11 Sep 2008 13:43:52 GMT
Does the same happen if you use a single thread? Does disk usage go down if you compress
the table?
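
For reference, a minimal sketch of how the table could be compressed from Java, using Derby's built-in SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure. The class and method names are made up for illustration, and the schema name "APP" (Derby's default schema) is an assumption about this particular database:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

final class CompressWords {
    // Derby system procedure that rebuilds a table and releases unused space.
    static final String COMPRESS_SQL = "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)";

    /** Compresses one table on an already open connection. */
    static void compress(Connection con, String schema, String table) throws SQLException {
        try (CallableStatement cs = con.prepareCall(COMPRESS_SQL)) {
            cs.setString(1, schema);   // e.g. "APP" (assumed default schema)
            cs.setString(2, table);    // e.g. "WORDS"
            cs.setShort(3, (short) 1); // non-zero: sequential mode, slower but uses less memory
            cs.execute();
        }
    }
}
```

If seg0\c3c0.dat shrinks noticeably after something like CompressWords.compress(con, "APP", "WORDS"), most of the 13GB was reclaimable free space rather than live rows.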

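Should the line

word=word.trim().toLowerCase(); 

appear before setString? As posted, the INSERT is bound to the raw word, while the UPDATE
in the catch block uses the trimmed, lower-cased one.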
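
A small sketch of the reordering, with the normalization done before the value is bound. WordBinding and its methods are hypothetical helper names, and Locale.ROOT is my addition (the original calls toLowerCase() with the default locale):

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Locale;

final class WordBinding {
    /** Trims and lower-cases a raw token so that "Home " and "home" map to the same key. */
    static String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    /** Normalizes first, then binds, so the statement sees the cleaned value. */
    static String bindWord(PreparedStatement st, String raw) throws SQLException {
        String word = normalize(raw);
        st.setString(1, word); // bound after normalization, unlike the posted code
        return word;           // returned so a later UPDATE can reuse the same key
    }
}
```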

When you store word frequencies, you would have one record per distinct word, like:
"home", 1217

The number of rows would then not exceed a couple of thousand. Does this text really have
200k unique words? Do you have a unique index on the word column?
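
To illustrate the point about row counts, here is a sketch that counts frequencies in memory, producing one entry per distinct normalized word. It splits on whitespace only, which is an assumption; the real tokenizer is not shown in the original post, and WordCounts is a hypothetical name:

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class WordCounts {
    /** Returns one entry per distinct word, e.g. "home" -> 1217. */
    static Map<String, Integer> count(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String token : text.split("\\s+")) {
            String word = token.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                freq.merge(word, 1, Integer::sum); // insert 1 or add 1 to the existing count
            }
        }
        return freq;
    }
}
```

The size of the returned map is the number of rows the WORDS table should end up with. If the table holds far more rows than that, the same word is being stored under several spellings (for example with trailing spaces or different case).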

A suggestion: a single thread that reuses the connection and the prepared statements might
be faster than multiple threads.
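
A rough sketch of what that could look like. It prepares both statements once and tries the UPDATE first, inserting only when no row was updated; that update-then-insert order is my substitution for the exception-driven flow in the original post, and it is only safe because a single thread is writing. The column names WORD and FREQUENCY are taken from the posted UPDATE statement; WordRecorder is a hypothetical name:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class WordRecorder {
    static final String UPDATE_SQL = "UPDATE WORDS SET FREQUENCY = FREQUENCY + 1 WHERE WORD = ?";
    static final String INSERT_SQL = "INSERT INTO WORDS VALUES (?, 1)";

    /** Bumps the count for one already normalized word, inserting it if it is new. */
    static void record(PreparedStatement update, PreparedStatement insert, String word)
            throws SQLException {
        update.setString(1, word);
        if (update.executeUpdate() == 0) { // no existing row for this word
            insert.setString(1, word);
            insert.executeUpdate();
        }
    }

    /** Processes all words on one connection with statements prepared once. */
    static void recordAll(Connection con, Iterable<String> words) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement update = con.prepareStatement(UPDATE_SQL);
             PreparedStatement insert = con.prepareStatement(INSERT_SQL)) {
            for (String word : words) {
                record(update, insert, word);
            }
            con.commit(); // one commit for the whole batch
        }
    }
}
```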
Regards,
Ali

--- On Thu, 9/11/08, hcadavid <hectorcadavid@yahoo.com> wrote:

> From: hcadavid <hectorcadavid@yahoo.com>
> Subject: Derby problem: 13GB of space with 200000 records!
> To: derby-user@db.apache.org
> Date: Thursday, September 11, 2008, 5:39 AM
> Dear friends,
> 
> I'm using Derby to record word frequencies from a large text corpus
> with a Java program. It works fine with standard statements, like:
> "INSERT INTO WORDS VALUES('"+word+"',1)" (it takes 50MB to store
> 400000 words), but when I switched to prepared statements and inner
> statements (in order to improve performance) and repeated the
> process, after a few hours of processing (200MB of plain text) the
> database's disk consumption reached an absurd size: 13GB! I mean,
> 13GB of disk space to store 400000 words (of standard length) and
> their frequencies! What could be the problem?
> The biggest file is seg0\c3c0.dat (13GB); there is no problem with
> the log files.
> 
> Here is how I'm making insertions and updates:
> 
>     Connection con=EmbeddedDBMSConnectionBroker.getConnection();
>     PreparedStatement st=con.prepareStatement("INSERT INTO WORDS VALUES(?,1)");
>     st.setString(1, word);
>
>     word=word.trim().toLowerCase();
>
>     try{
>         st.execute();
>     }
>     catch(SQLIntegrityConstraintViolationException e){
>         PreparedStatement ps=con.prepareStatement("update words set frequency=((select frequency from words where word=?)+1) where word=?");
>         ps.setString(1, word);
>         ps.setString(2, word);
>         ps.execute();
>     }
>
>     con.commit();
>     con.close();
> 
> This method is used concurrently by 100 threads. Does anyone know
> the cause of this strange Derby behavior? (Using GBs of disk space
> just to store a few words isn't reasonable!)
> 
> Thanks in advance
> 
> Héctor
> -- 
> View this message in context:
> http://www.nabble.com/Derby-problem%3A-13GB-of-space-with-200000-records%21-tp19433858p19433858.html
> Sent from the Apache Derby Users mailing list archive at
> Nabble.com.


      
