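#!/usr/bin/env bash
# Requires the unzip and html2text packages.
# ==================================
# 1. Get a list of all xhtml/html/htm files, excluding titlepage.xhtml (if present).
#    It appears that the zipped files _never_ contain problem characters such as spaces...
# 2. Extract the html files and convert them to text (UTF-8 output is available).
# ==================================

# 1. Get a list of xhtml/html/htm files (the pattern is an unzip wildcard, not a
#    regular expression, and is quoted so the shell does not expand it), then
#    exclude any file whose name contains titlepage, toc or copyright.
files=$(unzip -Z1 "$1" '*.*htm*' | grep -Ev 'titlepage|toc|copyright')

# Stop if nothing matched: with an empty list, unzip would extract every member.
if [ -z "$files" ]; then
    echo "No html files found in $1" >&2
    exit 1
fi

# 2. Uncompress each of the files and process with html2text.
#    $files is deliberately unquoted so that it splits into one argument per file.
unzip -cqq "$1" $files | html2text -o -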