From: Ross
To: solr-user@lucene.apache.org
Date: Thu, 1 Apr 2010 13:39:31 -0400
Subject: Re: Solr crashing while extracting from very simple text file

Hi Chris, thanks for looking at this.

I'm using Solr 1.4.0 including the Tika that's in the tgz file, which means Tika 0.4.

I've now discovered that only two letters are required. A single line with XE will crash it.

This fails:

root@gamma:/home/ross# hexdump -C test.txt
00000000  58 45 0a                                          |XE.|
00000003
root@gamma:/home/ross#

This works:

root@gamma:/home/ross# hexdump -C test.txt
00000000  58 46 0a                                          |XF.|
00000003
root@gamma:/home/ross#

XA, XB, XC, XD, XF all work okay. There's just something special about XE.

The command I use is:

curl "http://localhost:8080/solr-example/update/extract?literal.id=doc1&fmap.content=body&commit=true" -F "myfile=@test.txt"

I filed a bug at https://issues.apache.org/jira/browse/TIKA-397 but I guess 0.4 is an old version so I wouldn't expect it to get much attention. It looks like I should upgrade Tika to 0.6. I don't really know how to do that or if Solr 1.4 works with Tika 0.6. The Tika pages talk about using Maven to build it. Sorry, I'm no Linux expert.

Ross

On Thu, Apr 1, 2010 at 1:07 PM, Chris Hostetter wrote:
>
> : Yes, please report this to the Tika project.
>
> except that when i run "tika-app-0.6.jar" on a text file like the one Ross
> describes, i don't get the error he describes, which means it may be
> something off in how Solr is using Tika.
>
> Ross: I can't reproduce this error on the trunk using the example solr
> configs and the text file below.  can you verify exactly which version of
> Solr you are using (and which version of tika you are using inside solr)
> and the exact byte contents of your simplest problematic text file?
>
> hossman@brunner:~/tmp$ cat tmp.txt
> x
> x
> XXBLE
> hossman@brunner:~/tmp$ hexdump -C tmp.txt
> 00000000  78 0a 78 0a 58 58 42 4c  45 0a                    |x.x.XXBLE.|
> 0000000a
> hossman@brunner:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@tmp.txt"
>
>
> 0 name="QTime">66
>
>
>
> -Hoss
>
>
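
For anyone trying to reproduce the report, here is a minimal sketch that creates the two test files byte for byte as shown in the hexdump output in this message (58 45 0a and 58 46 0a). The file names test-xe.txt and test-xf.txt are made up for illustration; the message itself uses test.txt for both cases.

```shell
# Failing case from the report: "XE" followed by a newline (bytes 58 45 0a).
printf 'XE\n' > test-xe.txt

# Control case that reportedly extracts fine: "XF" plus newline (bytes 58 46 0a).
printf 'XF\n' > test-xf.txt

# Print the bytes of each file in hex to confirm there is no BOM or CRLF.
od -An -tx1 test-xe.txt
od -An -tx1 test-xf.txt
```

Each file can then be posted with the curl command given in the message, substituting the file name after "myfile=@". Whether the failure shows up may depend on the Solr and Tika versions in use.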