Edit 12/12/2019. Attached find a zip-file with the full program code of a spreadsheet-manipulating Ruby-library in development and two executable files as technology-demonstrators. Documentation is in the sub-directory doc. For the executables to run and the library to be functional, three Ruby-gems must be installed beforehand: Nokogiri, Ruby-Filemagic (Not just Filemagic, it is a different gem) and Rubyzip. You avoid all this trouble by using the gem utility right away and install my own gem with :~$ gem install spreadsheet_manip. The code which is at the origin of my worries and this very post, is in lib/sheet.rb
Good morning.
This concerns PMDX- and XSLX files and demands notions of their structure and content. And although this and the topic should suffice, here is more and completely useless background-information: sharedStrings.xml and workbook.xml are components of any OOXML spreadsheet. If this surprises you, ignore this post. TY.
When comparing textual cell-content to given values or while applying expressions ( "contains ", " is equal to" and the like), I have the choice to lookup the String-value from a spreadsheet cell or, where applicable, to derive the index (instead of the string itself) from the sharedStrings.xml file.
Are there any risks and should I give precedence to one or the other, when using "equal to" and "not equal to" with String-values ?
Background: To facilitate some frequent tasks, I have code which manipulates OOXML spreadsheets. Recently, I added "delete_rows_by_expression" where I have to compare cell values. I can look them up in SharedStrings, but do not have to, when they are numerical values. In consequence, and where "equal" and "unequal" are used, I only get the numerical index of a string, then do as if there were no string.
The question put another way: Is there any risk to do so?
Background 2: I find the conventions for the file-structure, notably the way that sheets are listed in workbook.xml, but other files not, quite frightening. Having yet another file "sharedStrings.xml" to manage, I feel like hovering in a vacuum, always hoping that those references, disjoint from any file, might, in the end, however point to the authentic content of a file... It seems to work, until now, but I tend to say: This works by chance.
The fact that the index for a shared string, which is a simple integer, must be *derived* from a file, which may itself be subject to... what do I know.., instead of having something like a map, a vector or an array of some kind, is hallucinating.
Provided I got anything right, at all.
I do not yet know, how OpenDocument does all that, but sense plenty of ways to render the file-format and relations between XML-files less frightening.
Q: [XML Scripting] sharedStrings indices instead of content
- Michael Uplawski
- Posts: 179
- Joined: Thu Dec 11, 2014 11:43 pm
- Location: Canton Magny (previously Canton Carrouges), Orne, Normandy (previously Lower Normandy)
Q: [XML Scripting] sharedStrings indices instead of content
- Attachments
-
- spreadsheet_manip.zip
- (530.1 KiB) Downloaded 1127 times
Last edited by Michael Uplawski on Fri Nov 15, 2024 6:38 pm, edited 1 time in total.
“Hindsight is in the eye of the beholder.”
- Michael Uplawski
- Posts: 179
- Joined: Thu Dec 11, 2014 11:43 pm
- Location: Canton Magny (previously Canton Carrouges), Orne, Normandy (previously Lower Normandy)
ANSWER: [XML Scripting] sharedStrings indices instead of content
Answer:
Comparing cell-content to given values is better done by extracting strings from sharedStrings.xml first. Number formats are recognized by the 't' attribute (probably for 'type') of a cell-tag. What confuses me is the fact that there is not always a 't' attribute. I will check this with the OOXML-standard.
Now I need to re-write my routines to make them less error-prone, especially where strings are part (and then the result) of a formula-expression, like in ="Sum is " & sum(A1:A12). These strings appear to be recognizable by the 't' attribute and its value "str", as they cannot be found in sharedStrings.xml.
Comparing cell-content to given values is better done by extracting strings from sharedStrings.xml first. Number formats are recognized by the 't' attribute (probably for 'type') of a cell-tag. What confuses me is the fact that there is not always a 't' attribute. I will check this with the OOXML-standard.
Now I need to re-write my routines to make them less error-prone, especially where strings are part (and then the result) of a formula-expression, like in ="Sum is " & sum(A1:A12). These strings appear to be recognizable by the 't' attribute and its value "str", as they cannot be found in sharedStrings.xml.
“Hindsight is in the eye of the beholder.”