lucene-solr-user mailing list archives

From Ross <tetr...@gmail.com>
Subject Re: Solr crashing while extracting from very simple text file
Date Thu, 01 Apr 2010 17:39:31 GMT
Hi Chris, thanks for looking at this.

I'm using Solr 1.4.0 including the Tika that's in the tgz file which
means Tika 0.4.

I've now discovered that only two letters are needed to trigger it: a
single line containing XE will crash it.

This fails:

root@gamma:/home/ross# hexdump -C test.txt
00000000  58 45 0a                                          |XE.|
00000003
root@gamma:/home/ross#

This works:

root@gamma:/home/ross# hexdump -C test.txt
00000000  58 46 0a                                          |XF.|
00000003
root@gamma:/home/ross#

XA, XB, XC, XD, XF all work okay. There's just something special about XE.
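For anyone who wants to recreate these files byte for byte without an editor, here is a minimal sketch. It assumes a POSIX shell with printf and od (od stands in for hexdump -C, which is not installed everywhere), and the file names test_XA.txt through test_XF.txt are made up for illustration.

```shell
#!/bin/sh
# Write each two-letter variant (X followed by A..F, then a newline)
# to its own file so the variants can be posted one at a time.
for c in A B C D E F; do
  printf 'X%s\n' "$c" > "test_X$c.txt"
  # Show the exact bytes; od -An -tx1 prints them as hex, like hexdump -C.
  printf '%s:' "test_X$c.txt"
  od -An -tx1 "test_X$c.txt"
done
```

Each file should come out as three bytes, for example 58 45 0a for the XE case, matching the hexdumps above.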

The command I use is:

curl "http://localhost:8080/solr-example/update/extract?literal.id=doc1&fmap.content=body&commit=true" -F "myfile=@test.txt"
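To re-run that command quickly against different files, the sketch below wraps it in a small helper. It is only a sketch: post_extract and the SOLR_URL variable are hypothetical names, the default URL and parameters are copied from the command above, and the helper prints nothing but the HTTP status code that curl reports via -w '%{http_code}'.

```shell
#!/bin/sh
# Hypothetical base URL; defaults to the host, port and webapp path used above.
SOLR_URL="${SOLR_URL:-http://localhost:8080/solr-example}"

# Post one file to the extracting request handler with the same parameters
# as the original command, discard the response body, and print only the
# HTTP status code (000 means curl got no response at all).
post_extract() {
  curl -s -o /dev/null -w '%{http_code}' \
    "$SOLR_URL/update/extract?literal.id=doc1&fmap.content=body&commit=true" \
    -F "myfile=@$1"
  echo
}
```

For example, running post_extract test.txt should print 200 when extraction succeeds; a 5xx code suggests Solr hit an error while handling the file.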

I filed a bug at https://issues.apache.org/jira/browse/TIKA-397, but I
guess 0.4 is an old version, so I wouldn't expect it to get much
attention.

It looks like I should upgrade Tika to 0.6. I don't really know how to
do that, or whether Solr 1.4 works with Tika 0.6. The Tika pages talk
about using Maven to build it. Sorry, I'm no Linux expert.

Ross


On Thu, Apr 1, 2010 at 1:07 PM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:
>
> : Yes, please report this to the Tika project.
>
> except that when I run "tika-app-0.6.jar" on a text file like the one Ross
> describes, I don't get the error he describes, which means it may be
> something off in how Solr is using Tika.
>
> Ross: I can't reproduce this error on the trunk using the example Solr
> configs and the text file below.  Can you verify exactly which version of
> Solr you are using (and which version of Tika you are using inside Solr)
> and the exact byte contents of your simplest problematic text file?
>
> hossman@brunner:~/tmp$ cat tmp.txt
> x
> x
> XXBLE
> hossman@brunner:~/tmp$ hexdump -C tmp.txt
> 00000000  78 0a 78 0a 58 58 42 4c  45 0a                    |x.x.XXBLE.|
> 0000000a
> hossman@brunner:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@tmp.txt"
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int name="QTime">66</int></lst>
> </response>
>
>
> -Hoss
>
>
