XLSX with Common Lisp Part 3
Workaround 3: xlsx
I searched for the package's distribution site with a query such as "xlsx common-lisp" and found https://gitlab.common-lisp.net/cungil/xlsx. Right, this seems to be it.
Looking through the repository, the project is quite old. It was created in 2015, and its last updates are between 7 and 10 years old.
I doubt that posting there will get a response. Still, since I had come this far, I summarized everything so far in a short issue, Issue #3:
Issue #3 UTF-8 support to use your language character in XLSX package
xlsx is a very nice tool. However, it cannot read strings in an Excel file correctly when they contain non-ASCII characters.
Why?
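When no :external-format is given, flexi-streams' octets-to-string assumes Latin-1. Latin-1 maps every byte to the character with the same code. As a result, text that stays within ASCII comes out correctly, because ASCII bytes mean the same thing in Latin-1 and UTF-8. Any other character is stored in the XML as a multi-byte UTF-8 sequence, and decoding that sequence as Latin-1 turns each byte into a separate, wrong character.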
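Below is a minimal sketch of what Latin-1 decoding does, written in portable Common Lisp with no libraries: each byte simply becomes the character with the same code. `decode-as-latin-1` is a hypothetical helper and is not part of xlsx or flexi-streams. The sketch assumes an implementation whose `code-char` follows Unicode code points, as SBCL, CCL and most modern implementations do.

```lisp
(defun decode-as-latin-1 (octets)
  "Hypothetical helper: model Latin-1 decoding, where every byte becomes
the character whose code equals the byte value."
  (map 'string #'code-char octets))

;; "abc" in UTF-8 is 61 62 63. Latin-1 uses the same bytes, so the text survives.
(decode-as-latin-1 #(#x61 #x62 #x63))   ; => "abc"

;; Japanese "あ" (U+3042) in UTF-8 is E3 81 82.
;; Decoded as Latin-1, those three bytes become three unrelated characters.
(length (decode-as-latin-1 #(#xE3 #x81 #x82)))   ; => 3
```

The second string begins with ã (U+00E3) instead of あ, which is the garbled text you see when a non-ASCII sheet is read with the default encoding.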
The XML parts of .xlsx files generated by Excel 2013, LibreOffice, and Google Sheets are encoded in UTF-8 without a BOM. In addition, section 6.2.5 "XML usage" of ECMA-376 / ISO/IEC 29500 Part 2 (the Office Open XML specification) limits the encoding of these XML parts to UTF-8 with a BOM, UTF-8 without a BOM, UTF-16 LE, or UTF-16 BE. The XML specification also requires UTF-16 documents to begin with a BOM. The encoding can therefore be determined just by checking the BOM at the start of each part: if there is no BOM, the part is UTF-8.
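The BOM rule itself needs nothing beyond standard Common Lisp. Below is a minimal sketch with two hypothetical helpers, `starts-with-octets-p` and `detect-xml-encoding`. The keywords it returns are plain labels chosen for this example and are not flexi-streams external-format names. It assumes the input is the raw byte vector of one XML part taken from the .xlsx archive.

```lisp
(defun starts-with-octets-p (octets prefix)
  "Hypothetical helper: true when the byte vector OCTETS begins with
the bytes listed in PREFIX."
  (and (>= (length octets) (length prefix))
       (loop for byte in prefix
             for i from 0
             always (= byte (aref octets i)))))

(defun detect-xml-encoding (octets)
  "Hypothetical helper: return two values, an encoding label and the
number of BOM bytes to skip. Assumes OCTETS is an OPC XML part, so only
UTF-8 (with or without BOM) and UTF-16 LE/BE are possible."
  (cond ((starts-with-octets-p octets '(#xEF #xBB #xBF)) (values :utf-8 3))
        ((starts-with-octets-p octets '(#xFF #xFE))      (values :utf-16le 2))
        ((starts-with-octets-p octets '(#xFE #xFF))      (values :utf-16be 2))
        ;; No BOM: UTF-16 would require one, so the part must be UTF-8.
        (t (values :utf-8 0))))

;; Usage: "<?" with no BOM, and "<" in UTF-16LE with a BOM.
(detect-xml-encoding #(#x3C #x3F))             ; => :UTF-8, 0
(detect-xml-encoding #(#xFF #xFE #x3C #x00))   ; => :UTF-16LE, 2
```

The BOM length is returned as a second value so that the caller can start decoding after the BOM. This keeps the resulting string from beginning with U+FEFF.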
A simple workaround
If you can assume that the XML parts of the Excel file are encoded in UTF-8, you only need to add :external-format :utf-8 to the octets-to-string call in the get-entry function.
```lisp
(ql:quickload :xlsx)   ; load the xlsx package
(in-package :xlsx)     ; switch to the xlsx package

;; Redefine the get-entry function
(defun get-entry (name zip)
  (let ((entry (zip:get-zipfile-entry name zip)))
    (when entry
      (xmls:parse
       (flex:octets-to-string (zip:zipfile-entry-contents entry)
                              ;; add this!
                              :external-format :utf-8)))))

(in-package :cl-user)  ; go back to cl-user to use xlsx with multilingual support

;;; For example:
(xlsx:sheet-names "Your_Language.xlsx")
```
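A full workaround

If you do not want to assume UTF-8, this version of get-entry uses the BOM to choose the encoding, following the rule described above.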
```lisp
;; Assumes (ql:quickload :xlsx) has already been run.
(in-package :xlsx)  ; get-entry must be redefined inside the xlsx package

(defun get-entry (name zip)
  ;; ECMA-376 Part 2 (Open Packaging Conventions), section 6.2.5
  ;; https://ecma-international.org/publications-and-standards/standards/ecma-376/
  ;; specifies that XML parts must be encoded in UTF-8 or UTF-16 only.
  ;; Therefore, we detect the encoding from the BOM only:
  ;; UTF-8 / UTF-8 with BOM / UTF-16LE / UTF-16BE
  (flet ((decode (octets)
           (let ((len (length octets)))
             ;; Pick the encoding and the number of BOM bytes to skip in one
             ;; step, so the returned string never starts with U+FEFF.
             ;; UTF-16 is given as an explicit flexi-streams external format
             ;; with a fixed byte order.
             (multiple-value-bind (enc skip)
                 (cond
                   ;; UTF-8 BOM: EF BB BF (all three bytes must match)
                   ((and (>= len 3)
                         (= #xEF (aref octets 0))
                         (= #xBB (aref octets 1))
                         (= #xBF (aref octets 2)))
                    (values :utf-8 3))
                   ;; UTF-16LE BOM: FF FE
                   ((and (>= len 2)
                         (= #xFF (aref octets 0))
                         (= #xFE (aref octets 1)))
                    (values (flex:make-external-format :utf-16 :little-endian t) 2))
                   ;; UTF-16BE BOM: FE FF
                   ((and (>= len 2)
                         (= #xFE (aref octets 0))
                         (= #xFF (aref octets 1)))
                    (values (flex:make-external-format :utf-16 :little-endian nil) 2))
                   ;; Default to UTF-8 if no BOM is present
                   (t (values :utf-8 0)))
               ;; Decode the octets that follow the BOM
               (flex:octets-to-string octets :external-format enc :start skip)))))
    (let ((entry (zip:get-zipfile-entry name zip)))
      (when entry
        (xmls:parse (decode (zip:zipfile-entry-contents entry)))))))

(in-package :cl-user)  ; go back to cl-user
```