From: Yamini Joshi
Date: Mon, 10 Oct 2016 10:01:19 -0500
Subject: Re: Indexing Column Values in Accumulo
To: user@accumulo.apache.org

I guess there is no other way. Also, once I get the rowIDs, I need to do
further filtering. Do the filters parse an entire record? My use case is
to select rowIds with a cf|cq value (given a list of values (cqs)). In
other words, the filter will have to access all the cf|cqs, right?

Best regards,
Yamini Joshi

On Mon, Oct 10, 2016 at 5:09 AM, vaibhav thapliyal <
vaibhav.thapliyal.91@gmail.com> wrote:

> Creating an Inverted Index could serve your use case. You can store the
> column family and column qualifier both in the row of the index table
> separated by a delimiter.
>
> For eg cf|cq
>
> And then perform queries on just the row id to get a low query time.
>
> On 29 September 2016 at 11:03, Josh Elser wrote:
>
>> Hi Yamini,
>>
>> You're right that a filter would have to exhaustively search a table to
>> find all rows that contain a certain family and qualifier. If you
>> explicitly know the rows that you want to search, this is a fast operation.
>>
>> Have you considered creating an inverted index? This would be a table
>> that you have to maintain on your own. Accumulo does not provide automatic
>> index generation.
>>
>> - Josh
>>
>>
>> Yamini Joshi wrote:
>>
>>> Hello everyone
>>>
>>> Is there a way to easily index column fields for efficient lookups in
>>> Accumulo? My use case is to select the records containing a certain
>>> column family and column qualifier from among a set of column
>>> qualifiers (reverse lookup). Although this could be done using a custom
>>> filter, I'm looking for an optimal solution (since filter might scan the
>>> entire database).
>>>
>>> Best regards,
>>> Yamini Joshi
>>>
>>
>
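For readers of the archive, the inverted-index layout suggested in this thread can be sketched roughly as follows. This is only an in-memory illustration in Python, not Accumulo client code: sorted lists of tuples stand in for Accumulo's sorted key space, and the names DELIM, build_index, lookup and lookup_any, as well as the sample rows, are hypothetical. The assumption is that each index entry uses "cf|cq" as its row and carries the main-table row ID, so a lookup becomes a short range scan over the index instead of a filter over the whole main table.

```python
from bisect import bisect_left

DELIM = "|"  # hypothetical delimiter; it must not occur inside cf or cq


def build_index(main_entries):
    """Build sorted (index_row, main_row) pairs from (row, cf, cq, value) tuples."""
    index = {(cf + DELIM + cq, row) for row, cf, cq, _ in main_entries}
    return sorted(index)


def lookup(index, cf, cq):
    """Return main-table row IDs that contain cf|cq, using a range scan."""
    term = cf + DELIM + cq
    start = bisect_left(index, (term, ""))  # seek to the first matching entry
    rows = []
    for index_row, main_row in index[start:]:
        if index_row != term:
            break  # entries are sorted, so the matching range has ended
        rows.append(main_row)
    return rows


def lookup_any(index, cf, cqs):
    """Return row IDs that contain cf with at least one qualifier from cqs."""
    rows = set()
    for cq in cqs:
        rows.update(lookup(index, cf, cq))
    return sorted(rows)


# Sample data standing in for the main table (illustrative only).
main_table = [
    ("row1", "attr", "color", "red"),
    ("row1", "attr", "size", "L"),
    ("row2", "attr", "color", "blue"),
    ("row3", "attr", "weight", "5kg"),
]
index_table = build_index(main_table)

print(lookup(index_table, "attr", "color"))
print(lookup_any(index_table, "attr", ["size", "weight"]))
```

With this layout, a query for a list of qualifiers touches only the index entries for those qualifiers. Any further filtering then runs against the returned row IDs rather than every cf|cq in the main table. As Josh notes above, the index table has to be written and kept in sync by the application itself.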