Q: [XML Scripting] sharedStrings indices instead of content

Post Reply
User avatar
Michael Uplawski
Posts: 179
Joined: Thu Dec 11, 2014 11:43 pm
Location: Canton Magny (previously Canton Carrouges), Orne, Normandy (previously Lower Normandy)

Q: [XML Scripting] sharedStrings indices instead of content

Post by Michael Uplawski »

Edit 12/12/2019. Attached find a zip-file with the full program code of a spreadsheet-manipulating Ruby-library in development and two executable files as technology-demonstrators. Documentation is in the sub-directory doc. For the executables to run and the library to be functional, three Ruby-gems must be installed beforehand: Nokogiri, Ruby-Filemagic (Not just Filemagic, it is a different gem) and Rubyzip. You avoid all this trouble by using the gem utility right away and install my own gem with :~$ gem install spreadsheet_manip. The code which is at the origin of my worries and this very post, is in lib/sheet.rb

Good morning.

This concerns PMDX- and XSLX files and demands notions of their structure and content. And although this and the topic should suffice, here is more and completely useless background-information: sharedStrings.xml and workbook.xml are components of any OOXML spreadsheet. If this surprises you, ignore this post. TY.

When comparing textual cell-content to given values or while applying expressions ( "contains ", " is equal to" and the like), I have the choice to lookup the String-value from a spreadsheet cell or, where applicable, to derive the index (instead of the string itself) from the sharedStrings.xml file.

Are there any risks and should I give precedence to one or the other, when using "equal to" and "not equal to" with String-values ?

Background: To facilitate some frequent tasks, I have code which manipulates OOXML spreadsheets. Recently, I added "delete_rows_by_expression" where I have to compare cell values. I can look them up in SharedStrings, but do not have to, when they are numerical values. In consequence, and where "equal" and "unequal" are used, I only get the numerical index of a string, then do as if there were no string.

The question put another way: Is there any risk to do so?

Background 2: I find the conventions for the file-structure, notably the way that sheets are listed in workbook.xml, but other files not, quite frightening. Having yet another file "sharedStrings.xml" to manage, I feel like hovering in a vacuum, always hoping that those references, disjoint from any file, might, in the end, however point to the authentic content of a file... It seems to work, until now, but I tend to say: This works by chance.
The fact that the index for a shared string, which is a simple integer, must be *derived* from a file, which may itself be subject to... what do I know.., instead of having something like a map, a vector or an array of some kind, is hallucinating.

Provided I got anything right, at all.

I do not yet know, how OpenDocument does all that, but sense plenty of ways to render the file-format and relations between XML-files less frightening.
Attachments
spreadsheet_manip.zip
(530.1 KiB) Downloaded 1127 times
Last edited by Michael Uplawski on Fri Nov 15, 2024 6:38 pm, edited 1 time in total.
Hindsight is in the eye of the beholder.
User avatar
Michael Uplawski
Posts: 179
Joined: Thu Dec 11, 2014 11:43 pm
Location: Canton Magny (previously Canton Carrouges), Orne, Normandy (previously Lower Normandy)

ANSWER: [XML Scripting] sharedStrings indices instead of content

Post by Michael Uplawski »

Answer:

Comparing cell-content to given values is better done by extracting strings from sharedStrings.xml first. Number formats are recognized by the 't' attribute (probably for 'type') of a cell-tag. What confuses me is the fact that there is not always a 't' attribute. I will check this with the OOXML-standard.

Now I need to re-write my routines to make them less error-prone, especially where strings are part (and then the result) of a formula-expression, like in ="Sum is " & sum(A1:A12). These strings appear to be recognizable by the 't' attribute and its value "str", as they cannot be found in sharedStrings.xml.
Hindsight is in the eye of the beholder.
Post Reply

Return to “PlanMaker 2018 for Linux”