Hello
Is there a way to search for specific strings of text inside TextMaker files in my home directory?
I can save them as .rtf files, and do a search with grep, but it can't "see" into .tmdx files.
Searching for text inside TMDX files
Re: Searching for text inside TMDX files
TMDX and DOCX files are zip files, so if you find a way to search inside zip files in Linux, that will work.
I found this site that may be a good starting point:
https://unix.stackexchange.com/question ... -zip-files
- Michael Uplawski
- Posts: 179
- Joined: Thu Dec 11, 2014 11:43 pm
- Location: Canton Magny (previously Canton Carrouges), Orne, Normandy (previously Lower Normandy)
Re: Searching for text inside TMDX files
Miguel got it entirely right, then withdrew from the affair, leaving you in an awful mess to sort out on your own...
I feel for both of you.
So, as can be derived from the referenced discussion, you have to unzip the tmdx file. Do this in a folder that you can remove afterwards, or directly in the /tmp directory of your system.
Unfortunately, though, the unzipped tmdx file contains sub-directories, and the text of your document is found only in the file word/document.xml. You should write a shell script which takes this into account.
Not enough.
TextMaker compresses document.xml (and the other XML files) in such a way that line breaks are removed. They are not necessary to read and interpret XML and would just bloat the file (side note: in OTF, they would be fatal, as TextMaker misinterprets them). Ergo, you have to beautify the XML code before you can search it. If you skip this step, every keyword you search for will return the entire XML structure, as your document.xml is a one-liner!
Good luck.
I am sarcastic, if that escaped you.
My recommendation would in fact be to use an XML parser on document.xml. Explaining the use of such software is off-topic in this discussion, but the result will satisfy the OP.
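For readers who want to try the XML-parser route, here is a minimal sketch in Python, using only the standard library. It relies on what is stated above: a tmdx file is a zip archive and the body text sits in word/document.xml. The function names extract_text and find_documents are made up for this illustration, and collecting every text node in the file is a simplifying assumption, not a description of how TextMaker itself reads the format. Reading the member straight from the archive means nothing has to be unpacked into /tmp, and because the parser works on text nodes rather than lines, the one-liner problem does not arise.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption taken from the post above: the body text lives in this member.
DOCUMENT_MEMBER = "word/document.xml"


def extract_text(archive_path):
    """Return all text nodes of word/document.xml joined together.

    Returns an empty string if the file is not a zip archive, lacks the
    expected member, or contains XML that cannot be parsed.
    """
    try:
        with zipfile.ZipFile(archive_path) as archive:
            xml_bytes = archive.read(DOCUMENT_MEMBER)
        root = ET.fromstring(xml_bytes)
    except (zipfile.BadZipFile, KeyError, ET.ParseError, OSError):
        return ""
    # itertext() walks every element, so no namespace or tag names are assumed.
    return "".join(root.itertext())


def find_documents(term, directory, suffixes=(".tmdx", ".docx")):
    """Yield paths below `directory` whose document text contains `term`."""
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        if term in extract_text(path):
            yield path
```

Calling find_documents("search term", "/path/to/documents") and printing each result lists the matching files. Text nodes are joined without a separator, so a phrase that is split across formatting runs is still found, but words at paragraph boundaries run together.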
“Hindsight is in the eye of the beholder.”
-
- Posts: 55
- Joined: Fri Jun 08, 2018 9:19 pm
Re: Searching for text inside TMDX files
And not that knowledgeable about Linux, it would appear. Fortunately, miguel-c gave me a good starting point.
The answer is to use zgrep from zutils. It is not the same zgrep that comes by default in Debian, which is just a shell script and can't do recursive searches through files in a directory. I can search by typing this into a terminal:
Code:
zgrep 'search term' -r /path/to/documents
Last edited by colonel_panic on Fri Oct 18, 2019 1:27 pm, edited 1 time in total.
- Michael Uplawski
- Posts: 179
- Joined: Thu Dec 11, 2014 11:43 pm
- Location: Canton Magny (previously Canton Carrouges), Orne, Normandy (previously Lower Normandy)
Re: Searching for text inside TMDX files
zgrep is not Linux, it is zgrep.
“Hindsight is in the eye of the beholder.”
Re: Searching for text inside TMDX files
Very nice, thank you for sharing your solution!
colonel_panic wrote: Tue Oct 08, 2019 10:16 am
And not that knowledgeable about Linux, it would appear. Fortunately, miguel-c gave me a good starting point.
The answer is to use zgrep from zutils. It is not the same zgrep that comes by default in Debian, which is just a shell script and can't do recursive searches through files in a directory. I can search by typing this into a terminal:
Code:
zgrep 'search term' -r --format=gz /path/to/documents
-
- Posts: 55
- Joined: Fri Jun 08, 2018 9:19 pm
Re: Searching for text inside TMDX files
I put a little typo into my example above. Please find it corrected below:
Code:
zgrep 'search term' -r /path/to/documents