manifoldcf-dev mailing list archives

From Shinichiro Abe <shinichiro.ab...@gmail.com>
Subject Re: [VOTE] Release Apache ManifoldCF 1.7 RC0
Date Tue, 12 Aug 2014 13:43:34 GMT
Hi Karl,

The content field was garbled when documents were indexed through /update
with the Tika connector.
Sample docs: http://www.rondhuit.com/download.html#whitepaper
My MCF job crawled Japanese PDF and XLS files from the file system and sent
them to Solr.

I was surprised that Solr threw an exception when the en_US
end-user-documentation.pdf was posted via the Tika connector. Files posted
via /update/extract were not garbled and did not throw exceptions.
Can you reproduce this?

2268394 [qtp1224864813-14] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 character 0xffff at char #112515, byte #184319)
at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:395)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
...
Caused by: java.io.CharConversionException: Invalid UTF-8 character 0xffff at char #112515, byte #184319)
at com.ctc.wstx.io.UTF8Reader.reportInvalid(UTF8Reader.java:335)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:249)
at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
... 36 more
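
The trace shows the XML parser behind /update rejecting U+FFFF, which falls outside the character range that XML 1.0 allows. As a rough illustration only (this is not ManifoldCF or Solr code, and the class and method names are hypothetical), extracted text could be filtered along these lines before it is placed into an XML update request:

```java
// Hypothetical helper, not part of ManifoldCF or Solr.
final class XmlCharFilter {

    // True if the code point matches the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of text with every code point that XML 1.0 forbids removed.
    // Unpaired surrogates are dropped too, since they fall outside the ranges above.
    static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample input: U+FFFF embedded in otherwise valid text.
        String extracted = "abc\uFFFFdef";
        System.out.println(stripInvalidXmlChars(extracted));
    }
}
```

Whether dropping such characters or replacing them (for example with a space) is preferable depends on the content. This sketch only addresses the exception, not the garbled Japanese text.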

Thanks,
Shinichiro Abe




2014-08-12 22:24 GMT+09:00 Karl Wright <daddywri@gmail.com>:

> I ran "ant rat-sources", and inspected the packages.  All looks good.  The
> only comment is that the connector-lib area has grown by about 18MB this
> cycle, and of course all the images for the Chinese documentation add
> another 5MB, so our binary packages are now just about 200MB.  I don't
> think this is something we can do a lot about, though, except maybe by
> repackaging so we release connectors independently of the framework.
>
> I'll give a final vote after I hear more back from Erlend and Abe-san.
>
> Thanks,
> Karl
>
>
> On Tue, Aug 12, 2014 at 2:23 AM, Karl Wright <daddywri@gmail.com> wrote:
>
> > I request that the vote be left open at least until 8/21/2014, since 1.7
> > is a major release and we want as many people to try it out as possible
> > before declaring it complete.  Thanks!
> >
> > Karl
> >
> >
> >
> > On Tue, Aug 12, 2014 at 12:44 AM, Shinichiro Abe <
> > shinichiro.abe.1@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> +1 from me.
> >>
> >> -Checked SIGS, checksum by running check_signatures.sh.
> >> -Checked that the code signing Key of Mingchun is available online.
> >>
> >> Shinichiro Abe
> >>
> >> On 2014/08/12, at 12:13, Mingchun Zhao <mingchun.zhao.2@gmail.com> wrote:
> >>
> >> > Hi all,
> >> >
> >> > Please vote on whether to release ManifoldCF version 1.7, RC0.
> >> >
> >> > You can find the artifact at:
> >> >
> >> > http://people.apache.org/~mingchun/apache-manifoldcf-1.7-RC0
> >> >
> >> > There is also a tag at:
> >> >
> >> > https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.7-RC0
> >> >
> >> > Vote will remain open at least 72 hours.
> >> >
> >> > Thanks!
> >> > Mingchun Zhao
> >>
> >>
> >
>



-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Shinichiro Abe
阿部 慎一朗
