accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Problem with IntersectingIterator and IndexDocIterator not returning results.
Date Thu, 26 May 2016 15:57:14 GMT
Hi David,

Generally, I think you're confusing the type of table with what these 
Iterators are meant to run over.

Remember, "shard" or "sharded" refers to distributing some amount of 
data across many servers by some hash partitioning (commonly, at least). 
This involves setting some "salt" or bit in the rowId to distribute your 
records across many servers instead of on a single server.
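
To make the "salt" idea concrete, here is a minimal sketch in plain Java 
(no Accumulo dependency). The class name ShardKeys, the method shardFor, 
the hash function, and the zero-padded format are all hypothetical choices 
made for illustration, not something Accumulo prescribes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: derives a shard id (the "salt") from a document id.
final class ShardKeys {
    private ShardKeys() {}

    // Maps a document id to one of numShards buckets. The result is
    // zero-padded so that shard ids sort lexicographically as rowIds.
    static String shardFor(String docId, int numShards) {
        int hash = 0;
        for (byte b : docId.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        int bucket = Math.floorMod(hash, numShards);
        return String.format(Locale.ROOT, "%04d", bucket);
    }
}
```

The same document id always lands in the same bucket, and different ids 
spread across the buckets, which is what lets the records spread across 
servers.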

The table you describe is what's referred to as an "inverted index". The 
term is the primary sort order. This makes it very quick to find all 
pointers to "documents" which contain the given term.
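
As a rough illustration of how such a table can be queried, the sketch 
below models the inverted index as nested sorted maps in plain Java and 
intersects the per-term results on the client. It does not use the 
Accumulo API; the class InvertedIndexSketch and its methods are 
hypothetical, and client-side intersection is only one possible approach 
under that assumption:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of an inverted index table:
// row = field name, column family = field value, column qualifier = record id.
final class InvertedIndexSketch {
    private final TreeMap<String, TreeMap<String, TreeSet<String>>> index = new TreeMap<>();

    void put(String field, String value, String recordId) {
        index.computeIfAbsent(field, f -> new TreeMap<>())
             .computeIfAbsent(value, v -> new TreeSet<>())
             .add(recordId);
    }

    // One term lookup: every record id stored under (field, value).
    Set<String> lookup(String field, String value) {
        TreeMap<String, TreeSet<String>> values = index.get(field);
        if (values == null || !values.containsKey(value)) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(values.get(value));
    }

    // Record ids that match every (field, value) pair in criteria.
    Set<String> matchAll(Map<String, String> criteria) {
        Set<String> result = null;
        for (Map.Entry<String, String> c : criteria.entrySet()) {
            Set<String> ids = lookup(c.getKey(), c.getValue());
            if (result == null) {
                result = new TreeSet<>(ids);   // copy so the index is not modified
            } else {
                result.retainAll(ids);         // keep only ids seen for every term
            }
        }
        if (result == null) {
            return Collections.<String>emptySet();
        }
        return result;
    }
}
```

Each lookup touches a single (field, value) pair, which is the access 
pattern the term-first sort order makes cheap; the intersection itself 
happens outside the table.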

The iterators you're trying to use are designed to operate over what's 
referred to as a "local index". In this form, the index records are 
co-located with the data records in a separate column. So, for each 
rowId, one column (family) is devoted to storing index records, while 
another is devoted to storing the actual data records. This structure is 
what the iterators are designed to work over. These iterators are novel 
because of some of the assumptions they can make on the physical data 
model of Accumulo tables, but let's ignore that for now :)
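
To picture the difference, here is a small plain-Java model of a local 
index. It is only a sketch of the layout described above, not the 
iterators' actual implementation and not the Accumulo API; the names 
LocalIndexSketch, Shard, addDocument, and intersect are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of a "local index": within each rowId
// (shard), index entries and data entries are stored side by side.
final class LocalIndexSketch {
    static final class Shard {
        final TreeMap<String, TreeSet<String>> index = new TreeMap<>(); // term -> doc ids
        final TreeMap<String, String> docs = new TreeMap<>();           // doc id -> document
    }

    private final TreeMap<String, Shard> shards = new TreeMap<>();

    void addDocument(String shardId, String docId, String doc, String... terms) {
        Shard s = shards.computeIfAbsent(shardId, k -> new Shard());
        s.docs.put(docId, doc);
        for (String t : terms) {
            s.index.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    // Documents containing every term. The intersection is computed
    // independently inside each shard, using only that shard's entries.
    List<String> intersect(String... terms) {
        List<String> hits = new ArrayList<>();
        for (Shard s : shards.values()) {
            TreeSet<String> ids = null;
            for (String t : terms) {
                TreeSet<String> forTerm = s.index.getOrDefault(t, new TreeSet<String>());
                if (ids == null) {
                    ids = new TreeSet<>(forTerm);
                } else {
                    ids.retainAll(forTerm);
                }
            }
            if (ids != null) {
                for (String id : ids) {
                    hits.add(s.docs.get(id));
                }
            }
        }
        return hits;
    }
}
```

Because every term for a document lives in the same row as the document 
itself, each shard can answer the intersection on its own, with no 
cross-row lookups.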

I know this isn't super helpful to you as-is. I'll see if I can find any 
time to make a better write-up for you.

Finally, the iterator javadocs not being published was an intentional 
change, but one I believe we should revert. <-- **ping Christopher**

- Josh

David Boyd wrote:
> All:
>
>     I am using Accumulo 1.6.1.  I am using a sharded index to search for data
> that matches the values of certain fields.  Here is the situation:
>
> I have four fields EntityId, EntityIdType, EntityName, EntitySource.
>
> Sometimes I need all records which match EntityId and EntityIdType
> Other times I need all records which match all four fields.
>
> The plan was to use ranges in the scanner to determine which fields to
> match against.  I have tried both subsets of ranges and setting all ranges,
> and gotten the same result.
>
> I created an Index as follows:
> RowId = fieldname
> ColumnFamily = fieldvalue
> ColumnQualifier = the overall record id (RowID) of my main record in
> another table.
>
> Here is the output of a scan of my index table:
>
>> entityid 1707945d-34d8-455d-85b1-55610739ce62:1707945d-34d8-455d-85b1-55610739ce62 []
>> entityidtype GUID:1707945d-34d8-455d-85b1-55610739ce62 []
>> name TestEntity:1707945d-34d8-455d-85b1-55610739ce62 []
>> source Unit Test:1707945d-34d8-455d-85b1-55610739ce62 []
>
> NOTE:  While in this case the entityid equals the overall RowID from the
> other table that is not always true
>
> When I run the code below it does not return any rows in the scanner.
> In the debugger when running the code below terms show as follows:
> [1707945d-34d8-455d-85b1-55610739ce62, GUID, TestEntity, Unit Test]
>
> I have tried both IntersectingIterator and IndexedDocIterator; both have
> the same results.
> For whatever reason the API docs for these classes are not showing up on
> the Apache Accumulo site.
>
> Am I missing the purpose/function of this iterator?
>
> Do I have to call IndexedDocIterator.setColfs with some values so I get
> the column qualifiers back?
>
> Below is my code:
>
> public List<String> getCoalesceEntityKeysForEntityId(String entityId,
>                                                           String entityIdType,
>                                                           String entityName,
>                                                           String entitySource) throws CoalescePersistorException
>      {
>          // Use our sharded term index to find the merged keys
>          Connector dbConnector = null;
>
>          ArrayList<String> keys = new ArrayList<String>();
>
>          Text[] terms = {new Text(entityId), new Text(entityIdType),
>                  new Text(entityName), new Text(entitySource)};
>
>
>          try {
>              dbConnector = AccumuloDataConnector.getDBConnector();
>
>              BatchScanner keyscanner = dbConnector.createBatchScanner(
>                      AccumuloDataConnector.coalesceEntityIndex, Authorizations.EMPTY, 4);
>
>              // Set up an IntersectingIterator for the values
>              IteratorSetting iter = new IteratorSetting(1, "intersect", IndexedDocIterator.class);
>              IndexedDocIterator.setColumnFamilies(iter,terms);
>              keyscanner.addScanIterator(iter);
>
>              // Use ranges to limit the bins searched
>              //ArrayList<Range> ranges = new ArrayList<Range>();
>              // May not be necessary to restrict ranges but will do it to be safe
>              //ranges.add(new Range("entityid"));
>              //ranges.add(new Range("entityitype"));
>              //ranges.add(new Range("entityname"));
>             // ranges.add(new Range("source"));
>              //keyscanner.setRanges(ranges);
>              keyscanner.setRanges(Collections.singleton(new Range()));
>
>              // Return the list of keys
>              for(Entry<Key,Value> entry : keyscanner) {
>                  keys.add(entry.getKey().getColumnQualifier().toString());
>              }
>
>          } catch (TableNotFoundException ex) {
>              System.err.println(ex.getLocalizedMessage());
>              return null;
>          }
>
>          return keys;
>      }
>
>
>
> --
> =========mailto:dboyd@incadencecorp.com  ============
> David W. Boyd
> VP,  Data Solutions
> 10432 Balls Ford, Suite 240
> Manassas, VA 20109
> office:   +1-703-552-2862
> cell:     +1-703-402-7908
> ==============http://www.incadencecorp.com/  ============
> ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
> Chair ANSI/INCITS TC Big Data
> Co-chair NIST Big Data Public Working Group Reference Architecture
> First Robotic Mentor - FRC, FTC -www.iliterobotics.org
> Board Member- USSTEM Foundation -www.usstem.org
>
> The information contained in this message may be privileged
> and/or confidential and protected from disclosure.
> If the reader of this message is not the intended recipient
> or an employee or agent responsible for delivering this message
> to the intended recipient, you are hereby notified that any
> dissemination, distribution or copying of this communication
> is strictly prohibited.  If you have received this communication
> in error, please notify the sender immediately by replying to
> this message and deleting the material from any computer.
>
>
>
