poi-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Beardsley <markbrd...@tiscali.co.uk>
Subject Re: How to check if Excel values are matching with my format
Date Mon, 14 Feb 2011 16:15:59 GMT

The question is not how many rows there are on the sheet but how many rows
are there where the occurrences of the cells differ?

If every row on the sheet follows this pattern - String, Number, Number,
String, String - then all you need to do is define that pattern once and
check each row against it. On the other hand, if the pattern of each row is
different, then yes, you will need to define a pattern for each row and
compare each row to the appropriate pattern. Should this be the case, then I
would look to externalise the pattern so that you do not have to change the
code. Are you familiar with properties files? It would be possible to create
entries in a properties file that decsribe the pattern for each row. Read
this file when the application starts, use it's contents to create the
pattern and away you go. If you do not like the idea of a properties file,
then it would be an excellant application of XML. Markup could be created to
define the pattern for each row and using XML would likely make everything
much easier to understand for whoever comes later and has to maintain your
code. XML is a little more complex to parse - there is language level
support for properties files of course but you would have to write a class
to parse the XML and convert the markup into patterns.

The worst of all worlds would be if the pattern were random. Determining
which pattern to use to validate which row could be quite a challenge.


Mark B

PS I did have another thought about how you could validate the row against a
pattern. Each pattern could be converted into a number as could each row as
you read it; then the row number could be compared to the pattern number. It
could work a little like this.

StringBuffer buffer = new StringBuffer();
buffer.append(String.valueOf(Cell.CELL_TYPE_STRING));   // 1
buffer.append(String.valueOf(Cell.CELL_TYPE_FORMULA));   // 2
buffer.append(String.valueOf(Cell.CELL_TYPE_STRING));   // 1
buffer.append(String.valueOf(Cell.CELL_TYPE_STRING));   // 1
buffer.append(String.valueOf(Cell.CELL_TYPE_NUMERIC));   // 0
buffer.append(String.valueOf(Cell.CELL_TYPE_NUMERIC));   // 0
buffer.append(String.valueOf(Cell.CELL_TYPE_STRING));   // 1
long rowPattern = Long.parseLong(buffer.toString().trim());

This should result in a long value of 1211001 that represents the pattern
for that particular row. In like manner, you could iterate through the cells
on a row, get the type of each cell and assemble this into a StringBuffer.
At the end of the row, convert the buffers contents into a long value and
simply perform a comparison. If the values are the same then the pattern on
the row should be the same. It would be possible to keep a series of these
patterns as simple elements in an array of type long, significantly reducing
the amount of memory required. Lastly, if you do have to deal with dates,
then simply assign that type a value that you decide - 9 maybe, I think it
should be kept to a single digit or else the whole approach fails. One other
advantage of this approach is that you could simply define a pattern in the
properties file by that string of numbers, so;


then read that from the file, convert it into a long value and away you go.

If you are unsure about how to check for a date, the idiom goes something
like this;

cell - row.getCell(columnNumber);

if(cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
   if(DateUtil.isDateCellFormatted(cell)) {
      // You know you have a cell with a date in it
   else {
      // You know you have date with a number in it.

Obviously you would also check for all of the other types and most likely
use a switch statement to do so but this gives you tha basic idea.

If you need to distinguish between integer and decimal values then you will
either need to use regular expressions or simply search for the decimal
separator in the value you get out of the cell. Excel does not distinguish
between the two types, it stores all numbers as floating point values and
applies a format to make the cell's contents appear as an interger or
decimal value. Again, you would need to define a type value for either the
integer or floating point value as the predefined Cell.CELL_TYPE_NUMERIC
would only cover one or the other. The filnal advantage of this method is
that it would be both quick and easy to compare the pattern for a single row
to each of the predefined patterns; it's as quick as saying, for example
if(1211001 == 1211002).
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-check-if-Excel-values-are-matching-with-my-format-tp3379567p3384674.html
Sent from the POI - User mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

View raw message