From: Grant Ingersoll
To: java-user@lucene.apache.org
Subject: Re: Problem using Lucene on Ubuntu
Date: Mon, 18 Feb 2008 07:56:49 -0500

How are you loading the document into the
content variable below? My guess is still that you have different locales on Windows and Ubuntu.

(Btw, sorry about the java-user comment. I should wake up before sending responses. For some reason I thought the email was sent to java-dev.)

-Grant

On Feb 18, 2008, at 7:44 AM, kratoras wrote:

> Actually what i figured out just now is that the problem is on the indexing
> part. A document with a 15MB size is transformed in a 23MB index which is
> not normal since on windows for the same document the index is 3MB. For the
> indexing i use:
> writer = new IndexWriter(index, new GreekAnalyzer(), !index.exists());
> and to add documents:
> doc.add(new Field("contents",content,Field.Store.YES,Field.Index.TOKENIZED));
>
> where "content" is a string with the content of the document. Should i
> convert this string to UTF-8 using getBytes before i write it to the
> index??
>
> --
> View this message in context: http://www.nabble.com/Problem-using-Lucene-on-Ubuntu-tp15543843p15544612.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
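
To make the locale guess concrete, here is a minimal sketch, assuming the document is read from a file on disk before it is handed to the Field. The thread does not say how "content" is actually loaded or which encoding the file uses, so both are assumptions here, and the names ExplicitCharsetRead and readContent are hypothetical. The point is only that the bytes are decoded with a charset named by the caller rather than with the platform default, which can differ between a Windows and an Ubuntu machine. A Java String is already Unicode in memory, so calling getBytes on it before indexing would not repair text that was decoded with the wrong charset on the way in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene or of the original poster's code.
final class ExplicitCharsetRead {

    // Decodes the file with the charset the caller names, instead of
    // relying on whatever the platform default happens to be.
    static String readContent(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            char[] buf = new char[8192];
            int n;
            // Copy decoded characters until end of stream.
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

If the file were known to be, say, ISO-8859-7 (an assumption), the call would look like readContent(path, Charset.forName("ISO-8859-7")), and the returned String would then be passed as "content" to the Field exactly as in the quoted snippet.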