Date: Wed, 6 Sep 2017 06:10:00 +0000 (UTC)
From: "Apoorv Naik (JIRA)"
To: dev@atlas.apache.org
Reply-To: dev@atlas.apache.org
Mailing-List: contact dev-help@atlas.apache.org; run by ezmlm
Subject: [jira] [Updated] (ATLAS-2117) Basic search issues due to Titan Solr schema

[ https://issues.apache.org/jira/browse/ATLAS-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apoorv Naik updated ATLAS-2117:
-------------------------------
    Summary: Basic search issues due to Titan Solr schema  (was: Titan Indexer tokenization issues)

> Basic search issues due to Titan Solr schema
> --------------------------------------------
>
>                 Key: ATLAS-2117
>                 URL: https://issues.apache.org/jira/browse/ATLAS-2117
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 0.8-incubating, 0.9-incubating, 0.8.1-incubating
>            Reporter: Apoorv Naik
>            Assignee: Apoorv Naik
>             Fix For: 0.8-incubating, 0.9-incubating, 0.8.1-incubating
>
>
> When Solr is used as the indexing backend, strings are tokenized by the StandardTokenizerFactory, which treats punctuation and special characters as delimiters. As a result, more indexed terms are associated with the corresponding vertex (document).
> There is also a LowerCaseFilterFactory, which makes lookups case-insensitive.
> This schema design doesn't work well for the current basic search enhancement (ATLAS-1880) and causes many false positives and false negatives when querying the index.
> The workaround for this is either to filter in memory when such schema violations are found, or to push the entire attribute query down to the graph, which might be inefficient and memory-intensive. (The current JIRA will track this.)
> The correct solution would be to re-index the existing data with a schema change and to drop the code workarounds mentioned above, for better search performance. (This should be taken up in a separate JIRA.)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
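
The false positives and the in-memory filtering workaround described in the issue can be illustrated with a small sketch. This is not the Atlas or Solr implementation: `analyze`, `index_matches`, `exact_filter`, and the sample names are hypothetical, and the tokenizer is a crude stand-in that splits on every non-alphanumeric character, which is only an approximation of how the real StandardTokenizerFactory finds word boundaries.

```python
import re


def analyze(text):
    """Crude stand-in for the tokenizer plus lowercase filter.

    Assumption: split on every non-alphanumeric character and lowercase
    each term. The real Solr analysis chain behaves differently in detail.
    """
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def index_matches(query, docs):
    """Index-style lookup: a document matches when every analyzed query
    term appears among the document's analyzed terms."""
    query_terms = set(analyze(query))
    return [d for d in docs if query_terms <= set(analyze(d))]


def exact_filter(query, candidates):
    """In-memory post-filter: keep only candidates equal to the raw query."""
    return [c for c in candidates if c == query]


# Hypothetical attribute values stored on four vertices.
names = ["sales-fact", "Sales Fact", "fact@sales", "sales_dim"]

# The index lookup over-matches because punctuation and case are lost.
candidates = index_matches("sales-fact", names)

# Filtering the candidates in memory removes the false positives.
exact = exact_filter("sales-fact", candidates)
```

Here the index-style lookup returns three of the four names for the query `sales-fact`, and the in-memory pass narrows that to the single exact match. The cost is that every candidate has to be loaded and compared, which is the inefficiency the issue attributes to the workaround.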