Return-Path:
Delivered-To: apmail-lucene-dev-archive@www.apache.org
Received: (qmail 54523 invoked from network); 3 Feb 2011 05:57:58 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
    by minotaur.apache.org with SMTP; 3 Feb 2011 05:57:58 -0000
Received: (qmail 71758 invoked by uid 500); 3 Feb 2011 05:57:56 -0000
Delivered-To: apmail-lucene-dev-archive@lucene.apache.org
Received: (qmail 70523 invoked by uid 500); 3 Feb 2011 05:57:54 -0000
Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@lucene.apache.org
Delivered-To: mailing list dev@lucene.apache.org
Received: (qmail 70514 invoked by uid 99); 3 Feb 2011 05:57:53 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 03 Feb 2011 05:57:53 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 03 Feb 2011 05:57:50 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116])
    by hel.zones.apache.org (Postfix) with ESMTP id 15CBF18B7E5
    for ; Thu, 3 Feb 2011 05:57:29 +0000 (UTC)
Date: Thu, 3 Feb 2011 05:57:29 +0000 (UTC)
From: "Hsiu Wang (JIRA)"
To: dev@lucene.apache.org
Message-ID: <662479409.6741.1296712649086.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] Issue Comment Edited: (LUCENE-2208) Token div exceeds length of provided text sized 4114
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989965#comment-12989965 ]

Hsiu Wang edited comment on
LUCENE-2208 at 2/3/11 5:56 AM:
-----------------------------------------------------------

Added a patch (LUCENE-2208.patch) to fix org.apache.lucene.search.highlight.InvalidTokenOffsetsException.

The exception is caused by HTML escape characters (e.g., &#38;, &amp;), which are counted as 1 character in text.length() in Highlighter.getBestTextFragments but as N characters in HTMLStripCharFilter (&amp; is counted as 5).

In the patch, I commented out an incorrect test case in HTMLStripCharFilterTest.testOffset() ("X & X ( X < > X"). The commented-out test case is covered by Robert's test patch.

      was (Author: hwang):
    added patch(LUCENE-2208.patch) to fix org.apache.lucene.search.highlight.InvalidTokenOffsetsException. The exception is caused by HTML escape characters (e.g., &#38;, &amp;) which are counted as 1 character in text.length() in Highlighter.getBestTextFragments, but in HTMLStripCharfilter, they are counted as N characters(& counted as 5). In the patch, I commented out an incorrect test case in HTMLStripCharFilterTest.testOffset()("X & X ( X < > X"). The commented out test case is covered by Robert's test patch.

> Token div exceeds length of provided text sized 4114
> ----------------------------------------------------
>
>                 Key: LUCENE-2208
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2208
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 3.0
>         Environment: diagnostics = {os.version=5.1, os=Windows XP, lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=x86, java.version=1.6.0_12, java.vendor=Sun Microsystems Inc.}
>
>            Reporter: Ramazan VARLIKLI
>         Attachments: LUCENE-2208.patch, LUCENE-2208_test.patch
>
>
> I have a document that contains HTML markup. I want to strip the HTML tags to get clean text and then apply the highlighter to that clean text. But the highlighter throws an exception if I strip out the HTML characters; if I don't strip them out, it works fine.
> It just confuses me at the moment.
> I have copied and pasted three things here from the console, as they may contain special characters that might cause the problem.
> 1-) Here is the HTML text:
>

Starter

>
>
>
>
Learning path: History
>

Key question

>

Did transport fuel the industrial revolution?

>

Learning Objective

>
    >
  • To categorise points as for or against an argument
  • >
>

>

What to do?

>
    >
  • Watch the clip: Transport fuelled the industrial revolution.
  • >
>

The clips claims that transport fuelled the industrial revolution. Some historians argue that the industrial revolution only happened because of developments in transport.

>
    >
  • Read the statements below and decide which points are for and which points are against the argument that industry expanded in the 18th and 19th centuries because of developments in transport.
  • >
> >
    >
  1. Industry expanded because of inventions and the discovery of steam power.
  2. >
  3. Improvements in transport allowed goods to be sold all over the country and all over the world so there were more customers to develop industry for.
  4. >
  5. Developments in transport allowed resources, such as coal from mines and cotton from America to come together to manufacture products.
  6. >
  7. Transport only developed because industry needed it. It was slow to develop as money was spent on improving roads, then building canals and the replacing them with railways in order to keep up with industry.
  8. >
> >

Now try to think of 2 more statements of your own.

> >
>
>
>

Main activity

>
>
>
Learning path: History
>

Learning Objective

>
    >
  • To select evidence to support points
  • >
>

What to do?

> >
  • Choose the 4 points that you think are most important - try to be balanced by having two for and two against.
  • >
  • Write one in each of the point boxes of the paragraphs on the sheet Constructing a balanced argument.

You might like to re write the points in your own words and use connectives to link the paragraphs.

> >

In history and in any argument, you need evidence to support your points.

>
  • Find evidence from these sources and from your own knowledge to support each of your points:
>
    >
  1. At a toll gate
  2. >
  3. Canals
  4. >
  5. Growing cities: traffic
  6. >
  7. Impact of the railway
  8. >
  9. Sailing ships
  10. >
  11. Liverpool: Capital of Culture
  12. >
>

Try to be specific in your evidence - use named examples of places or people. Use dates if you can.

>
>
>
>

Plenary

>
>
>
Learning path: History
>

Learning Objective

>
    >
  • To judge which of the arguments is most valid
  • >
>

What to do?

> >

In order to be a good historian, and get good marks in exams, you need to show your evaluation skills and make a judgement. Having been through the evidence which point do you think is most important? Why? Is there more evidence? Is the evidence more convincing?

>
  • In the final box on your worksheet write a conclusion explaining whether on balance the evidence is enough to convince you that transport fuelled the industrial revolution.
>
>
>
>

Extension

>
>
>
Learning path: History
>

What to do?

>

Watch the clip Stress in a ski resort

>

New industries, such as tourism, can now be said to be fuelled by transport improvements.

>
  • Search Clipbank, using the Related clip lists as well as the search function, to find examples from around the world of how transport has helped industry.
>
>
>
>
>
> 2-) here is the text after stripped html tags out
> Starter
>
>
>
> Learning path: History
> Key question
> Did transport fuel the industrial revolution?
> Learning Objective
>
> To categorise points as for or against an argument
>
>
> What to do?
>
> Watch the clip: Transport fuelled the industrial revolution.
>
> The clips claims that transport fuelled the industrial revolution. Some historians argue that the industrial revolution only happened because of developments in transport.
>
> Read the statements below and decide which points are for and which points are against the argument that industry expanded in the 18th and 19th centuries because of developments in transport.
>
>
>
> Industry expanded because of inventions and the discovery of steam power.
> Improvements in transport allowed goods to be sold all over the country and all over the world so there were more customers to develop industry for.
> Developments in transport allowed resources, such as coal from mines and cotton from America to come together to manufacture products.
> Transport only developed because industry needed it. It was slow to develop as money was spent on improving roads, then building canals and the replacing them with railways in order to keep up with industry.
>
>
> Now try to think of 2 more statements of your own.
>
>
>
>
> Main activity
>
>
> Learning path: History
> Learning Objective
>
> To select evidence to support points
>
> What to do?
>
> Choose the 4 points that you think are most important - try to be balanced by having two for and two against .
> Write one in each of the point boxes of the paragraphs on the sheet Constructing a balanced argument . You might like to re write the points in your own words and use connectives to link the paragraphs.
>
> In history and in any argument, you need evidence to support your points.
> Find evidence from these sources and from your own knowledge to support each of your points:
>
> At a toll gate
> Canals
> Growing cities: traffic
> Impact of the railway
> Sailing ships
> Liverpool: Capital of Culture
>
> Try to be specific in your evidence - use named examples of places or people. Use dates if you can.
>
>
>
> Plenary
>
>
> Learning path: History
> Learning Objective
>
> To judge which of the arguments is most valid
>
> What to do?
>
> In order to be a good historian, and get good marks in exams, you need to show your evaluation skills and make a judgement. Having been through the evidence which point do you think is most important? Why? Is there more evidence? Is the evidence more convincing?
> In the final box on your worksheet write a conclusion explaining whether on balance the evidence is enough to convince you that transport fuelled the industrial revolution.
>
>
>
> Extension
>
>
> Learning path: History
> What to do?
> Watch the clip Stress in a ski resort
> New industries, such as tourism, can now be said to be fuelled by transport improvements.
> Search Clipbank, using the Related clip lists as well as the search function, to find examples from around the world of how transport has helped industry.
>
>
>
>
> 3-) here is the exception I get
> org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token div exceeds length of provided text sized 4114
>     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:228)
>     at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:158)
>     at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:462)

--
This message is automatically generated by JIRA.
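
The offset mismatch described in the comment can be illustrated with a small standalone sketch. This is not Lucene code and does not reproduce the patch: the class OffsetMismatchSketch, its entity table, and the helpers stripEntities and endOffset are hypothetical names made up for the illustration. It only shows how a token end offset measured against the raw text (where &amp; occupies 5 characters) can exceed the length of the stripped text (where the same entity occupies 1 character), which is the kind of condition the exception message reports.

```java
import java.util.Map;

// Illustration only: hypothetical helper names, not part of Lucene.
class OffsetMismatchSketch {

    // Tiny, assumed entity table; a real stripper handles far more cases.
    private static final Map<String, String> ENTITIES = Map.of(
            "&amp;", "&",
            "&#38;", "&",
            "&lt;", "<",
            "&gt;", ">");

    /** Replaces each known entity with the single character it stands for. */
    public static String stripEntities(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            boolean matched = false;
            if (raw.charAt(i) == '&') {
                for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
                    if (raw.startsWith(e.getKey(), i)) {
                        out.append(e.getValue());
                        i += e.getKey().length(); // entity is N chars in the raw text
                        matched = true;
                        break;
                    }
                }
            }
            if (!matched) {
                out.append(raw.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** End offset of the last occurrence of token, measured in the given text. */
    public static int endOffset(String text, String token) {
        return text.lastIndexOf(token) + token.length();
    }

    public static void main(String[] args) {
        String raw = "fish &amp; chips";
        String stripped = stripEntities(raw);     // "fish & chips"
        int end = endOffset(raw, "chips");        // offset relative to the raw text

        System.out.println(stripped.length());        // 12
        System.out.println(end);                      // 16
        System.out.println(end > stripped.length());  // true: offset past the end
    }
}
```

If offsets and text length are both measured against the same string (both raw or both stripped), the comparison in the last line is false for this input.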