Message-ID: <50087294.9020106@globant.com>
Date: Thu, 19 Jul 2012 17:48:20 -0300
From: Victor Giordano
To: dev@jackrabbit.apache.org
Subject: Re: Search by an inputstream property
References: <500462B0.9010904@globant.com>

Alex, thanks for your quick reply. I will keep all of this in mind.

We fixed this problem by adding a new property of type String to the node, which contains the text content. We run the query against that property. We use a custom text extractor (the Tika text extractor), and we added the necessary code to populate the new property when a file is uploaded.

Thanks again.

Greetings,
Victor

On 7/19/2012 4:09 PM, Alexander Klimetschek wrote:
> On 16.07.2012, at 20:51, Victor Giordano wrote:
>
>> Hi friends, I have a question about writing an XPath expression to filter resources by a property of type InputStream called data.
>> How can I do a text search? For example,
>> this is working:
>>
>> String xpath1 = "//element(*, nt:resource) [jcr:contains(@jcr:mimeType,'*plain*')]";
>> String xpath2 = "//element(*, nt:resource) [jcr:contains(@jcr:encoding,'*utf*')]";
> FYI: jcr:contains() runs full text searches (with terms split up, word stemming etc.), so you don't need wildcards. Just use
>
> jcr:contains(@jcr:mimeType, 'plain')
>
> If you want real pattern-like matching (and highly-structured mime type or encoding values are probably better served by that), use jcr:like, which uses % as wildcard:
>
> jcr:like(@jcr:mimeType, '%plain')
>
> This should only match a value "text/plain" or "plain", but not "plain with a suffix".
>
>> But this is not working....
>> String xpath3 = "//element(*, nt:resource) [jcr:contains(@jcr:data,'*plain*')]";
> The full text index for binary content is by default aggregated on the node itself, which you address with ".":
>
> //element(*, nt:resource) [jcr:contains(.,'plain')]
>
> The index configuration is documented here: http://wiki.apache.org/jackrabbit/IndexingConfiguration
>
> Cheers,
> Alex
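
P.S. For the archive, the three query shapes discussed in this thread can be sketched as a small Java helper. This is only an illustration, not code from our project or from Jackrabbit: the class name XPathQueries and its methods are hypothetical, and the quoting helper assumes that a single quote inside an XPath string literal is escaped by doubling it. It only builds the query strings; it does not touch the repository.

```java
// Sketch only: builds the XPath strings discussed in this thread.
// XPathQueries and its methods are hypothetical names, not a Jackrabbit API.
final class XPathQueries {

    private XPathQueries() {
    }

    // Wraps a value in single quotes; assumes an embedded single quote
    // is escaped by doubling it inside an XPath string literal.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Full-text search on the node itself ("."), which is where the
    // binary content is aggregated by default, as Alex explains.
    static String containsOnNode(String nodeType, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(., " + quote(term) + ")]";
    }

    // Full-text search restricted to a single property, for example a
    // String property that holds the extracted text.
    static String containsOnProperty(String nodeType, String property, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(@" + property + ", " + quote(term) + ")]";
    }

    // Pattern match with jcr:like, which uses % as the wildcard.
    static String like(String nodeType, String property, String pattern) {
        return "//element(*, " + nodeType + ")[jcr:like(@" + property + ", " + quote(pattern) + ")]";
    }
}
```

For example, XPathQueries.like("nt:resource", "jcr:mimeType", "%plain") gives the jcr:like query from Alex's reply, and the resulting string could then be passed to QueryManager.createQuery(statement, Query.XPATH).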