accumulo-user mailing list archives

From David Medinets <david.medin...@gmail.com>
Subject Re: XML Storage - Accumulo or HDFS
Date Thu, 07 Jun 2012 11:29:58 GMT
On Wed, Jun 6, 2012 at 10:50 PM, Josh Elser <josh.elser@gmail.com> wrote:
>  Aside from losing the hierarchy
> knowledge, if you have a skewed distribution of elements in the XML
> document, you can't get good locality in your query/analytic. What was your
> idea behind storing the offsets?

<RECORDS>
 <RECORD>
  <KEY_FIELD/>
  <TAG/>
 </RECORD>
 <RECORD>
  <KEY_FIELD/>
  <TAG/>
 </RECORD>
</RECORDS>

My XML looks like that. I don't know how the information in the XML
will be used in the future, and I don't want to re-scan large numbers
of XML files to find a single record. For example, yesterday we found a
potential bug. My bug analysis showed the source data was in record X
of 450,000 records. Since I knew which XML file held that record, I
was able to get that file locally and use command-line tools to find
surrounding information. My XML file might have 200 tags, but normally
I only need 45 of them. My XML has no hierarchy.
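
To make the offset idea concrete, here is a minimal Python sketch. It assumes a flat file like the sample above: <RECORD> elements that are never nested and carry no attributes, and a file small enough to index in memory. The function names and the sample data are hypothetical. Where the offsets get stored (for example, in an Accumulo value next to the file name) is outside the sketch.

```python
import io
import re
from typing import BinaryIO, List, Tuple

# Assumption: records are flat, never nested, and the opening tag has no attributes.
RECORD_OPEN = re.compile(rb"<RECORD>")
RECORD_CLOSE = b"</RECORD>"


def index_record_offsets(data: bytes) -> List[Tuple[int, int]]:
    """Return a (start, length) byte span for each <RECORD>...</RECORD>."""
    spans = []
    for match in RECORD_OPEN.finditer(data):
        start = match.start()
        end = data.find(RECORD_CLOSE, start)
        if end == -1:
            break  # truncated file: stop at the last complete record
        end += len(RECORD_CLOSE)
        spans.append((start, end - start))
    return spans


def read_record(stream: BinaryIO, start: int, length: int) -> bytes:
    """Seek straight to one record instead of re-scanning the whole file."""
    stream.seek(start)
    return stream.read(length)


# Hypothetical sample data shaped like the XML in this message.
SAMPLE = (
    b"<RECORDS>\n"
    b" <RECORD>\n  <KEY_FIELD>a</KEY_FIELD>\n  <TAG>1</TAG>\n </RECORD>\n"
    b" <RECORD>\n  <KEY_FIELD>b</KEY_FIELD>\n  <TAG>2</TAG>\n </RECORD>\n"
    b"</RECORDS>\n"
)

spans = index_record_offsets(SAMPLE)
second = read_record(io.BytesIO(SAMPLE), *spans[1])
```

With the (file name, start, length) triple stored per record, pulling record X out of a 450,000-record file becomes one seek and one read, and the rest of the file stays available for looking at surrounding records.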
