PDF CMap



https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5078.Adobe-Japan1-6.pdf

!wget https://github.com/pdfminer/pdfminer.six/archive/develop.zip
!unzip develop.zip

%cd pdfminer.six-develop/

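Edit cid2code_Adobe_Japan1.txt, rebuild with the commands below, and then run.

  • CID 1921 (戸): changed 2f3e,6238 → 6238 (UTF-16) and e2bcbe,e688b8 → e688b8 (UTF-8), but the output did not change
  • CID 13757 (戶): changed 6236 → 6238 (UTF-16) and e688b6 → e688b8 (UTF-8), but the output did not change
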
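The hex values in these notes are encoded bytes. 2f3e/6238 are UTF-16 code units, and e2bcbe/e688b8 are the UTF-8 bytes of the same two characters: ⼾ (U+2F3E, KANGXI RADICAL DOOR) and 戸 (U+6238). e688b6 is 戶 (U+6236). The sketch below decodes such a cell. It assumes the cell format handled by pdfminer.six's tools/conv_cmap.py: `*` means no mapping, alternatives are comma-separated, and a trailing `v` marks a vertical form. `decode_cell` is a hypothetical helper and is not part of pdfminer.six.

```python
from typing import List, Tuple


def decode_cell(value: str, encoding: str) -> List[Tuple[str, bool]]:
    """Return (character, is_vertical) pairs for one cid2code cell.

    Assumed format: "*" = no mapping, comma-separated hex byte strings,
    optional trailing "v" = vertical-writing variant.
    """
    if value == "*":
        return []
    result = []
    for code in value.split(","):
        vertical = code.endswith("v")
        if vertical:
            code = code[:-1]
        # The hex digits are the encoded bytes; decode them with the
        # column's encoding (e.g. "utf-16-be" or "utf-8").
        result.append((bytes.fromhex(code).decode(encoding), vertical))
    return result


# Print code points (ASCII only) for the cells mentioned in the notes.
for value, enc in [("2f3e,6238", "utf-16-be"),
                   ("e2bcbe,e688b8", "utf-8"),
                   ("e688b6", "utf-8")]:
    print(value, "->", [f"U+{ord(c):04X}" for c, _ in decode_cell(value, enc)])
```

Before the edit, the CID 1921 cells each list two candidates (U+2F3E and U+6238). The edit removes the radical form and leaves only U+6238.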
!python tools/conv_cmap.py -c B5=cp950 -c UniCNS-UTF8=utf-8 pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt
!python tools/conv_cmap.py -c GBK-EUC=cp936 -c UniGB-UTF8=utf-8 pdfminer/cmap Adobe-GB1 cmaprsrc/cid2code_Adobe_GB1.txt
!python tools/conv_cmap.py -c RKSJ=cp932 -c EUC=euc-jp -c UniJIS-UTF8=utf-8 pdfminer/cmap Adobe-Japan1 cmaprsrc/cid2code_Adobe_Japan1.txt
!python tools/conv_cmap.py -c KSC-EUC=euc-kr -c KSC-Johab=johab -c KSCms-UHC=cp949 -c UniKS-UTF8=utf-8 pdfminer/cmap Adobe-Korea1 cmaprsrc/cid2code_Adobe_Korea1.txt
!python setup.py install
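Because the output did not change after the rebuild, it may help to check whether the edit reached the generated mapping at all. The sketch below assumes the layout used by recent pdfminer.six. In that layout, conv_cmap.py writes `to-unicode-Adobe-Japan1.pickle.gz` into the output directory (`pdfminer/cmap` above). The file is a gzipped, pickled dict whose `CID2UNICHR_H` and `CID2UNICHR_V` tables map CIDs to Unicode strings, which is what `CMapDB.get_unicode_map` loads. `lookup_to_unicode` is a hypothetical helper.

```python
import gzip
import pickle
from pathlib import Path
from typing import Optional, Tuple


def lookup_to_unicode(cmap_dir: str, registry: str,
                      cid: int) -> Tuple[Optional[str], Optional[str]]:
    """Return the (horizontal, vertical) Unicode mapping for one CID.

    Assumes <cmap_dir>/to-unicode-<registry>.pickle.gz holds a pickled
    dict with "CID2UNICHR_H" and "CID2UNICHR_V" tables (an assumption
    about conv_cmap.py's output, not a documented API).
    """
    path = Path(cmap_dir) / f"to-unicode-{registry}.pickle.gz"
    # Only unpickle files you generated yourself.
    with gzip.open(path, "rb") as f:
        data = pickle.load(f)
    return data["CID2UNICHR_H"].get(cid), data["CID2UNICHR_V"].get(cid)
```

In the notebook, `lookup_to_unicode("pdfminer/cmap", "Adobe-Japan1", 1921)` checks the file just generated in the source tree. Pointing `cmap_dir` at the `cmap` directory next to `pdfminer.__file__` checks the copy that `import pdfminer` actually loads. If the two results differ, the edit reached the build output but not the package being imported.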
