incubator-accumulo-user mailing list archives

From Joey Daughtery <jdaught...@t-sciences.com>
Subject Re: Scan for keyword
Date Wed, 23 Nov 2011 20:37:07 GMT
Aaron
Thanks.  I have combed the examples for an explanation on how to do this.

Let me know if feedback from a new user would be useful.  There are
some things that I would like to see as a new user attempting to learn the
API.

I will work through your example and see how it goes.

Thanks

Joe Daughtery

On Wed, Nov 23, 2011 at 3:08 PM, Aaron Cordova <aaron@cordovas.org> wrote:

> Joe,
>
> What you're talking about is pretty common. In fact, it's so common there
> should probably be an example included in the Accumulo-examples project
> for it. To do it requires building another table as a secondary index, as
> Jason mentioned.  Accumulo doesn't have any special structures just for
> indexes, it's just another table. Here's how you might go about it:
>
> Assuming you use some unique identifier for your row IDs, your table might
> look something like this:
>
> rowID  col fam      col qual  value
> 000    displayname            joey
> 000    login                  jd
> 000    name                   joe
> 001    displayname            jd
> 001    login                  joe
> 001    name                   joey
>
>
> I would just leave the col qual blank. Then you could build a second table
> as an index that looks like this:
>
> rowID  col fam      col qual  value
> jd     displayname  001
> jd     login        000
> joe    login        001
> joe    name         000
> joey   displayname  000
> joey   name         001
>
>
> To build this table, you can simply insert the inverted Mutations into the
> index table at the same time you're inserting records into your first table.
>
> To query for records in which "joe" appears in any field, you simply scan
> the entire row identified by "joe" in the index and get all the fields in
> all records where "joe" appears, thus:
>
> scanner.setRange(new Range("joe"));
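
A minimal in-memory sketch of the two-table layout described above, using Java sorted maps as a stand-in for Accumulo tables. The names InvertedIndexSketch, put, rowsContaining and sample are hypothetical, and none of this is the Accumulo API. It only illustrates writing the inverted entry alongside each record and then reading one whole index row back.

```java
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for the record table and the index table.
// Hypothetical names; this does not call Accumulo.
class InvertedIndexSketch {
    // Record table: rowID -> (column family -> value); qualifier left blank.
    final TreeMap<String, TreeMap<String, String>> records = new TreeMap<>();
    // Index table: term -> (column family -> record rowIDs, held as qualifiers).
    final TreeMap<String, TreeMap<String, TreeSet<String>>> index = new TreeMap<>();

    // Write one cell and its inverted entry at the same time.
    void put(String rowId, String family, String value) {
        records.computeIfAbsent(rowId, k -> new TreeMap<>()).put(family, value);
        index.computeIfAbsent(value, k -> new TreeMap<>())
             .computeIfAbsent(family, k -> new TreeSet<>())
             .add(rowId);
    }

    // Read the whole index row for a term: the rowIDs of every record
    // that contains the term in any field.
    TreeSet<String> rowsContaining(String term) {
        TreeSet<String> rowIds = new TreeSet<>();
        TreeMap<String, TreeSet<String>> row = index.get(term);
        if (row != null) {
            for (TreeSet<String> ids : row.values()) {
                rowIds.addAll(ids);
            }
        }
        return rowIds;
    }

    // The two example records from this thread.
    static InvertedIndexSketch sample() {
        InvertedIndexSketch t = new InvertedIndexSketch();
        t.put("000", "displayname", "joey");
        t.put("000", "login", "jd");
        t.put("000", "name", "joe");
        t.put("001", "displayname", "jd");
        t.put("001", "login", "joe");
        t.put("001", "name", "joey");
        return t;
    }

    public static void main(String[] args) {
        InvertedIndexSketch t = sample();
        // Look up the term in the index, then fetch each matching record.
        for (String rowId : t.rowsContaining("joe")) {
            System.out.println(rowId + " " + t.records.get(rowId));
        }
    }
}
```

Updating or deleting a record would also require removing its stale index entries, which this sketch does not attempt.
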
>
> To get records where "joe" appears in a specific field, say the name
> field, alter your scan to include a more specific range:
>
>
> scanner.setRange(new Range(
>     new Key(new Text("joe"), new Text("name"), new Text("")),
>     new Key(new Text("joe"), new Text("name\0"), new Text(""))));
>
>
> That range spans joe name to joe name\0, which includes all column
> qualifiers up to the next column family.
>
> You can then pull out the column qualifiers from the index to get the
> rowIDs.
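
The field-specific range can be illustrated with a small in-memory model. This sketch assumes a simplified key made of only row, column family and column qualifier, sorted in that order, with the record rowIDs stored in the qualifier. The names FamilyRangeSketch, K and rowIdsFor are hypothetical, and it is not the Accumulo API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class FamilyRangeSketch {
    // Simplified key: row, column family, column qualifier only.
    static final class K {
        final String row, family, qualifier;
        K(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // Sort by row, then family, then qualifier.
    static final Comparator<K> ORDER = Comparator
            .comparing((K k) -> k.row)
            .thenComparing(k -> k.family)
            .thenComparing(k -> k.qualifier);

    // Collect the qualifiers (record rowIDs) for one term in one family.
    // family + "\0" is the smallest string that sorts after family, so the
    // range [family, family\0] cannot reach into any other family.
    static List<String> rowIdsFor(TreeSet<K> index, String term, String family) {
        K start = new K(term, family, "");
        K end = new K(term, family + "\0", "");
        List<String> ids = new ArrayList<>();
        for (K k : index.subSet(start, true, end, true)) {
            ids.add(k.qualifier);
        }
        return ids;
    }

    // The index table from this thread.
    static TreeSet<K> sampleIndex() {
        TreeSet<K> index = new TreeSet<>(ORDER);
        index.add(new K("jd", "displayname", "001"));
        index.add(new K("jd", "login", "000"));
        index.add(new K("joe", "login", "001"));
        index.add(new K("joe", "name", "000"));
        index.add(new K("joey", "displayname", "000"));
        index.add(new K("joey", "name", "001"));
        return index;
    }

    public static void main(String[] args) {
        System.out.println(rowIdsFor(sampleIndex(), "joe", "name"));
    }
}
```

A family whose name merely starts with "name" (for example a hypothetical "name2") sorts after "name\0" and so stays outside the range.
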
>
> If you want to lookup values from each of those rows, you could then put
> them in a List and pass them to a BatchScanner. There is code for this in
> the Indexing subsection of the Table Design section of the manual:
>
> // Uses org.apache.accumulo.core.client.{Scanner, BatchScanner},
> // org.apache.accumulo.core.data.{Key, Range, Value}, org.apache.hadoop.io.Text,
> // java.util.HashSet and java.util.Map.Entry; conn is a Connector.
> Text term = new Text("mySearchTerm");
>
> HashSet<Range> matchingRows = new HashSet<Range>();
>
> Scanner indexScanner = conn.createScanner("index", auths);
> indexScanner.setRange(new Range(term, term));
>
> // we retrieve the matching rowIDs and create a set of ranges
> for (Entry<Key,Value> entry : indexScanner)
>     matchingRows.add(new Range(entry.getKey().getColumnQualifier()));
>
> // now we pass the set of ranges to the batch scanner to retrieve them
> BatchScanner bscan = conn.createBatchScanner("table", auths, 10);
>
> bscan.setRanges(matchingRows);
> bscan.fetchColumnFamily(new Text("attributes"));
>
> for (Entry<Key,Value> entry : bscan)
>     System.out.println(entry.getValue());
>
>
>
> This whole process is more complicated than I'd like it to be, but it
> works pretty well and people have built huge tables and indexes this way.
> You can get very fancy with what and how you choose to index.
>
> Let us know how this goes for you.
>
> Aaron
>
>
> On Nov 23, 2011, at 2:35 PM, Joey Daughtery wrote:
>
> Aaron
> Thanks for the reply.  I was only able to get data into Accumulo after
> reviewing the page you provided.
>
> Let's say, for example, that I am storing Name, login, and DisplayName as
> column families, and that I have inserted Joe, jd, joey as one record and
> joey, joe, jd as the second record.
>
> mut.put(new Text("Name"), new Text("joe"), cv, new Value("joe".getBytes()));
> mut.put(new Text("login"), new Text("jd"), cv, new Value("jd".getBytes()));
> mut.put(new Text("DisplayName"), new Text("joey"), cv, new Value("joey".getBytes()));
> write(...)
>
> mut.put(new Text("Name"), new Text("joey"), cv, new Value("joey".getBytes()));
> mut.put(new Text("login"), new Text("joe"), cv, new Value("joe".getBytes()));
> mut.put(new Text("DisplayName"), new Text("jd"), cv, new Value("jd".getBytes()));
> write(...)
>
> How would I run a keyword search for "joe" that pulls back both records,
> given that "joe" is the value for login in one record and the value for
> Name in the other?
>
> The example in the Table Design page shows the search based on the row
> id.  From my understanding if I provide the rowId, it will limit the search
> to that row.  But the example on that page is essentially just loading a
> specific row based on a rowid, not a keyword search.
>
> Thanks for the reply.  I hope my explanation of what I am attempting to do
> is making sense.
>
> Joe
>
> On Wed, Nov 23, 2011 at 1:55 PM, Aaron Cordova <aaron@cordovas.org> wrote:
>
>> Joe,
>>
>> If you haven't already, check out the Table Design section of the Manual
>>
>>
>> http://incubator.apache.org/accumulo/user_manual_1.3-incubating/Table_Design.html
>>
>> specifically, the subsection titled 'Indexing'. If you have read this,
>> let us know and we can clarify.
>>
>> Aaron
>>
>>
>> On Nov 23, 2011, at 1:46 PM, Jason Rutherglen wrote:
>>
>> > The most efficient system would be to implement a secondary [inverted]
>> > index on the Accumulo data.
>> >
>> > Maybe there is a Coprocessor-like API that would allow this type of
>> > functionality to be implemented?
>> >
>> > On Wed, Nov 23, 2011 at 1:12 PM, Joey Daughtery
>> > <jdaughtery@t-sciences.com> wrote:
>> >> All
>> >> I am new to Accumulo.  I have figured out how to store the data, load all
>> >> of it by scanning with new Range(), and load a specific row with new
>> >> Range(id).  However, if I want to locate a row that has a specific value, I
>> >> am not sure how to approach this programmatically.  Can someone give me some
>> >> insight on how to do such a scan?
>> >>
>> >> Also, I have seen several examples of how to populate the Mutation object.
>> >> Specifically, I see:
>> >> mut.put(new Text("column"), new Text("NAME"), timestamp, new Value("John".getBytes()));
>> >>
>> >> OR
>> >> mut.put(new Text("NAME"), new Text("John"), timestamp, new Value("John".getBytes()));
>> >>
>> >> Could someone indicate which is the correct way to store the data or
>> >> indicate why one would use one approach over the other?
>> >>
>> >> Thanks
>> >>
>> >> Joe
>>
>>
>
>
