From: Dawid Weiss
Date: Sat, 30 Sep 2017 20:58:36 +0200
Subject: Re: Binary Automaton
To: Lucene Users

> Preface: I don't know how the automaton is implemented deep inside Lucene,

Well, you can take a look, it's open source. :)

There are two different finite state automata inside Lucene: one is
pretty much a "read-only" transducer from unique input sequences (of
bytes) into an output. This is the FST class. The other is the Automaton
class, which has been ported from the Brics library [1].

I can't really relate to your comment about fast querying for
sub-automata; sounds interesting though. Dig in the code and suggest a
patch (or even demonstrate what you came up with!).

Dawid

[1] http://www.brics.dk/automaton/

> but (considering the automaton is built on the fly when the index is
> already present) I imagine that the automaton is scanning the
> lexicons/tokens present in the Lucene index to find the document
> references (solution 1).
> I think there are 2 different generic solutions for using automata, in
> my opinion.
> 1) to create an automaton for parsing the tokens present in the Lucene
> table as described above.
> 2) to create a pattern-matching automaton (on binary, or better, on an
> abstract stream, which could be more generic) and put these states
> directly in an index. In this case you can very quickly retrieve the
> documents matching a specific automaton built when you created the index
> (or a sub-automaton representing a subset of the same states). The second
> solution could maybe be used for mapping a complex structure inside a
> single Lucene document field, and then you can find the nested
> information embedded in it. In this way I need not use multiple Lucene
> documents (this could create performance and scalability problems).
> In many cases this solution could be faster than actual joins; for
> example, it could be useful in bioinformatics or in all those cases where
> the data is not a basic ADT.
>
> Cristian
>
> 2017-09-30 12:24 GMT+02:00 Dawid Weiss:
>
>> > Hi, is it possible to create an Automaton in Lucene parsing not a
>> > string but a byte array?
>>
>> Can you state what problem you are trying to solve? This seems to be a
>> question stripped of a more general context -- why do you need those
>> byte-based automata?
>>
>> Dawid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
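
For readers following the byte-array question that opened this thread, here is a minimal sketch of what an automaton over bytes (rather than characters) can look like. It is an illustration under stated assumptions, not Lucene's Automaton or FST implementation: the class name ByteDfa and its methods add and accepts are hypothetical, it uses only the JDK, and it builds a simple trie-shaped deterministic automaton with one 256-entry transition row per state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only: a trie-shaped DFA over bytes, not a Lucene class. */
public class ByteDfa {
    // transitions.get(state)[label] is the target state, or -1 if no transition exists.
    private final List<int[]> transitions = new ArrayList<>();
    // accepting.get(state) is true if an added sequence ends in that state.
    private final List<Boolean> accepting = new ArrayList<>();

    public ByteDfa() {
        newState(); // state 0 is the initial state
    }

    private int newState() {
        int[] row = new int[256];
        Arrays.fill(row, -1);
        transitions.add(row);
        accepting.add(false);
        return transitions.size() - 1;
    }

    /** Adds one byte sequence to the accepted language. */
    public void add(byte[] sequence) {
        int state = 0;
        for (byte b : sequence) {
            int label = b & 0xFF; // treat bytes as unsigned labels 0..255
            int next = transitions.get(state)[label];
            if (next == -1) {
                next = newState();
                transitions.get(state)[label] = next;
            }
            state = next;
        }
        accepting.set(state, true);
    }

    /** Runs the automaton over the input; true only if it ends in an accepting state. */
    public boolean accepts(byte[] input) {
        int state = 0;
        for (byte b : input) {
            state = transitions.get(state)[b & 0xFF];
            if (state == -1) {
                return false; // no transition for this byte
            }
        }
        return accepting.get(state);
    }

    public static void main(String[] args) {
        ByteDfa dfa = new ByteDfa();
        dfa.add(new byte[] {(byte) 0xCA, (byte) 0xFE});
        dfa.add(new byte[] {(byte) 0xCA, (byte) 0xFE, 0x00});
        System.out.println(dfa.accepts(new byte[] {(byte) 0xCA, (byte) 0xFE})); // true
        System.out.println(dfa.accepts(new byte[] {(byte) 0xCA}));              // false
    }
}
```

The dense 256-entry rows keep the sketch short at the cost of memory; a real implementation would likely use a sparser transition representation and support operations such as union or minimization, which are out of scope here.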