From: "Evert R."
Date: Mon, 15 Feb 2016 08:06:05 -0200
Subject: Re: Highlight brings the content from the first pages of pdf
To: solr-user

Binoy,

Thank you very much for your reply and explanation.

Best regards,

*--Evert*

2016-02-14 23:28 GMT-02:00 Binoy Dalal :

> What you've done so far will highlight every instance of "nietava" found in
> the field, and return it, i.e., your entire field will return with all the
> "nietava"s in tags.
> If you do not want the entire field, only the portions of your field
> containing the matched terms, then set the hl.snippets parameter to the
> number of snippets you want (in this particular case 3), along with the
> hl.fragsize parameter set to the same number as your hl.maxAnalyzedChars
> (or a really large number).
>
> I suggest you go through the wiki documentation for highlighting once (
> https://wiki.apache.org/solr/HighlightingParameters). It should answer all
> of your questions regarding the use of the standard highlighter that you
> might have.
>
> As an additional note, I also suggest that you look into the
> PostingsHighlighter (
> https://cwiki.apache.org/confluence/display/solr/Postings+Highlighter),
> since you seem to be running highlighting on pretty big fields and postings
> is much more efficient at highlighting huge fields as compared to the
> standard highlighter.
>
> On Mon, Feb 15, 2016 at 4:15 AM Evert R. wrote:
>
> > Binoy,
> >
> > You are the man! =)
> >
> > Thank you very much!
> >
> > Would you by chance know how I could get the second highlight of the same
> > word in the same file?
> >
> > Like: file_1.pdf (has three words "nietava") so..., how can I bring the
> > highlights for the three occurrences?
> >
> > I am pretty new around, should I send (open) another subject?
> >
> > Thanks again!
> >
> >
> > *--Evert*
> >
> > 2016-02-14 16:40 GMT-02:00 Binoy Dalal :
> >
> > > Are you sure you've typed in the parameters correctly?
> > > In your response it says flagsize instead of fragsize and
> > > maxanalzyedchars instead of maxanalyzedchars.
> > >
> > > Ohh wait, I see that I made the analyzed typo. Awfully sorry for that,
> > > I'm using my phone to send the mail out.
> > >
> > > On Sun, 14 Feb 2016, 23:53 Evert R. wrote:
> > >
> > > > Hi Binoy,
> > > >
> > > > thanks!
> > > >
> > > > Still not working, check the output:
> > > >
> > > > {
> > > >   "responseHeader":{
> > > >     "status":0,
> > > >     "QTime":58,
> > > >     "params":{
> > > >       "q":"nietava",
> > > >       "hl":"true",
> > > >       "hl.simple.post":"",
> > > >       "indent":"true",
> > > >       "fl":"id",
> > > >       "hl.flagsize":"0",
> > > >       "hl.fl":"content",
> > > >       "hl.maxAnalzyedChars":"208400",
> > > >       "wt":"json",
> > > >       "hl.simple.pre":""}},
> > > >   "response":{"numFound":1,"start":0,"docs":[
> > > >       {
> > > >         "id":"/home/solr/dados/teste/Emmanuel.pdf"}]
> > > >   },
> > > >   "highlighting":{
> > > >     "/home/solr/dados/teste/Emmanuel.pdf":{}}}
> > > >
> > > > *--Evert*
> > > >
> > > > 2016-02-14 14:31 GMT-02:00 Binoy Dalal :
> > > >
> > > > > Don't add this parameter to the searchComponent definition, because
> > > > > the components where you've added it, GapFragmenter and
> > > > > RegexFragmenter, simply don't use it.
> > > > > Instead, add it to your request handler (/select etc.) if you've
> > > > > configured highlighting in the handler or append it to your query:
> > > > > *&hl.maxAnalzyedChars=*.
> > > > > Additionally also set the *hl.fragsize parameter to 0*, if your text
> > > > > is larger than 51200 chars which it mostly is, in a similar fashion.
> > > > >
> > > > > On Sun, Feb 14, 2016 at 9:02 PM Evert R. wrote:
> > > > >
> > > > > > Hi Binoy,
> > > > > >
> > > > > > I could not find this option in my solrconfig.xml file.
> > > > > >
> > > > > > I tried to add this setting and nothing changed...
> > > > > >
> > > > > > Here is the code, I might have misplaced it:
> > > > > >
> > > > > > name="highlight">
> > > > > > default="true"
> > > > > > class="solr.highlight.GapFragmenter">
> > > > > > 400
> > > > > > 409600
> > > > > > class="solr.highlight.RegexFragmenter">
> > > > > > 200
> > > > > > 409600
> > > > > > 0.5
> > > > > > [-\w ,/\n\"']{20,200}
> > > > > >
> > > > > > thanks!
> > > > > >
> > > > > > *--Evert*
> > > > > >
> > > > > > 2016-02-14 12:14 GMT-02:00 Binoy Dalal :
> > > > > >
> > > > > > > From the solr wiki:
> > > > > > > hl.maxAnalyzedChars
> > > > > > >
> > > > > > > How many characters into a document to look for suitable
> > > > > > > snippets (Solr1.3). This parameter makes sense for the original
> > > > > > > Highlighter only.
> > > > > > >
> > > > > > > The default value is "51200".
> > > > > > >
> > > > > > > You can assign a large value to this parameter and use
> > > > > > > hl.fragsize=0 to return highlighting in large fields that have
> > > > > > > size greater than 51200 characters.
> > > > > > >
> > > > > > > I think this might be your hiccup.
> > > > > > >
> > > > > > > On Sun, 14 Feb 2016, 17:11 Evert R. wrote:
> > > > > > >
> > > > > > > > Hi Paul,
> > > > > > > >
> > > > > > > > Sorry for my late reply.
> > > > > > > >
> > > > > > > > All the content is inside the docs. It brings the docs and the
> > > > > > > > pdf file that has the search word in it. But the highlight is
> > > > > > > > not showing if the search word is after a few pages.
> > > > > > > >
> > > > > > > > Evert
> > > > > > > >
> > > > > > > > *--Evert*
> > > > > > > >
> > > > > > > > 2016-02-14 8:36 GMT-02:00 Paul Libbrecht :
> > > > > > > >
> > > > > > > > > This looks like the stored content is shortened. Can it be?
> > > > > > > > > Can you see that inside the docs?
> > > > > > > > >
> > > > > > > > > paul
> > > > > > > > >
> > > > > > > > > > Evert R.
> > > > > > > > > > 14 February 2016 at 11:26
> > > > > > > > > > Hi There,
> > > > > > > > > >
> > > > > > > > > > I have a situation where I started techproducts, without
> > > > > > > > > > any modification, and posted a pdf file. When searching as:
> > > > > > > > > >
> > > > > > > > > > q=text:search_word
> > > > > > > > > > hl=true
> > > > > > > > > > hl.fl=content
> > > > > > > > > >
> > > > > > > > > > It shows the highlight accordingly! =)
> > > > > > > > > >
> > > > > > > > > > BUT... *if the "search_word" is after the first pages* in
> > > > > > > > > > my pdf file, such as page 15...
> > > > > > > > > >
> > > > > > > > > > It simply *does not show* *the HIGHLIGHT*...
> > > > > > > > > >
> > > > > > > > > > Has anyone faced this situation before?
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > >
> > > > > > > > > > *--Evert*
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > > Binoy Dalal
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Binoy Dalal
> > >
> > > --
> > > Regards,
> > > Binoy Dalal
>
> --
> Regards,
> Binoy Dalal
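
The request in this thread returned an empty "highlighting" section while it carried the misspelled names hl.flagsize and hl.maxAnalzyedChars. A minimal Python sketch of one way to catch that before sending the query is given here. It assumes a local Solr at http://localhost:8983 with a core named techproducts; the URL, the helper names unknown_hl_params and build_select_url, and the short KNOWN_HL_PARAMS list are illustrative assumptions and not part of Solr. The list covers only the highlighting parameters mentioned in the thread, so a real check would need the full set from the Solr documentation.

```python
from urllib.parse import urlencode

# Partial list (assumption): only the highlighting parameters named in this thread.
KNOWN_HL_PARAMS = {
    "hl", "hl.fl", "hl.snippets", "hl.fragsize",
    "hl.maxAnalyzedChars", "hl.simple.pre", "hl.simple.post",
}


def unknown_hl_params(params):
    """Return the sorted hl* parameter names that are not in KNOWN_HL_PARAMS."""
    return sorted(
        name for name in params
        if (name == "hl" or name.startswith("hl.")) and name not in KNOWN_HL_PARAMS
    )


def build_select_url(base_url, params):
    """Build a /select URL, refusing hl* names that look like typos."""
    unknown = unknown_hl_params(params)
    if unknown:
        raise ValueError("unrecognized highlighting parameters: " + ", ".join(unknown))
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Values taken from the thread: a large hl.maxAnalyzedChars plus hl.fragsize=0.
params = {
    "q": "nietava",
    "hl": "true",
    "hl.fl": "content",
    "hl.fragsize": "0",
    "hl.maxAnalyzedChars": "208400",
    "wt": "json",
}
# Hypothetical host and core name.
url = build_select_url("http://localhost:8983/solr/techproducts", params)
print(url)
```

Passing the parameters from the failing request (hl.flagsize, hl.maxAnalzyedChars) to build_select_url raises ValueError instead of producing a query that Solr would accept with no usable highlighting settings.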