Camelot: PDF Table Extraction for Humans — Camelot 0.8.2 documentation
Installation
Installation of dependencies — Camelot 0.8.2 documentation
apt install python3-tk ghostscript
pip install camelot-py[cv]
# Add to PATH
export PATH=$PATH:/home/imabari/.local/bin

# Convert
camelot -p 2-end -o black.xlsx -f excel -split lattice 180928.pdf

# Display (plot the detected line joints)
camelot -p 2 lattice -plot joint 180928.pdf

# For tables with short lines, add -scale 40
camelot -p all -o black.xlsx -f excel -split lattice -scale 40 180928.pdf

# Strip spaces and newlines from cell text with -strip ' \n'
camelot -p all -o black.xlsx -f excel -split -strip ' \n' lattice 180928.pdf

# Copy text in spanning cells vertically with -copy v
camelot -p all -o data.csv -f csv -strip ' .\n' -split lattice -scale 40 -copy v data.pdf
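The last command above can probably also be run from Python through `camelot.read_pdf`. The following is a minimal sketch, not the post's own method. It assumes that the CLI options `-split`, `-strip`, `-scale` and `-copy` correspond to the keyword arguments `split_text`, `strip_text`, `line_scale` and `copy_text`. The helper names `lattice_kwargs` and `pdf_to_csv` are hypothetical and exist only for illustration.

```python
def lattice_kwargs(split_text=False, strip_text="", line_scale=15, copy_text=None):
    """Collect lattice options as keyword arguments for camelot.read_pdf.

    Assumed mapping from the CLI: -split -> split_text, -strip -> strip_text,
    -scale -> line_scale, -copy -> copy_text.
    """
    kwargs = {
        "flavor": "lattice",
        "split_text": split_text,
        "strip_text": strip_text,
        "line_scale": line_scale,
    }
    # Only pass copy_text when a direction such as "v" or "h" was requested.
    if copy_text:
        kwargs["copy_text"] = list(copy_text)
    return kwargs


def pdf_to_csv(pdf_path, out_path, pages="all", **options):
    """Extract tables with the lattice flavor and export them as CSV."""
    import camelot  # imported lazily so the helper above works without Camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, **lattice_kwargs(**options))
    tables.export(out_path, f="csv")
    return len(tables)


# Example call, mirroring the last command in the notes (needs Camelot and data.pdf):
# pdf_to_csv("data.pdf", "data.csv", split_text=True, strip_text=" .\n",
#            line_scale=40, copy_text=["v"])
```

Keeping the option mapping in its own function makes it easy to check the arguments before running an extraction on a large PDF.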
Command line
Command-Line Interface — Camelot 0.8.2 documentation
Usage: camelot [OPTIONS] COMMAND [ARGS]...

  Camelot: PDF Table Extraction for Humans

Options:
  --version                       Show the version and exit.
  -q, --quiet TEXT                Suppress logs and warnings.
  -p, --pages TEXT                Comma-separated page numbers. Example: 1,3,4
                                  or 1,4-end.
  -pw, --password TEXT            Password for decryption.
  -o, --output TEXT               Output file path.
  -f, --format [csv|json|excel|html]
                                  Output file format.
  -z, --zip                       Create ZIP archive.
  -split, --split_text            Split text that spans across multiple cells.
  -flag, --flag_size              Flag text based on font size. Useful to
                                  detect super/subscripts.
  -strip, --strip_text TEXT       Characters that should be stripped from a
                                  string before assigning it to a cell.
  -M, --margins <FLOAT FLOAT FLOAT>...
                                  PDFMiner char_margin, line_margin and
                                  word_margin.
  --help                          Show this message and exit.

Commands:
  lattice  Use lines between text to parse the table.
  stream   Use spaces between text to parse the table.
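To make the `-p` page specification concrete, here is a small sketch of how a string such as `1,4-end` could expand into page numbers. This is an illustration only, and Camelot's own parser may behave differently. The function name `expand_pages` and the `total_pages` parameter are hypothetical. The `all` keyword is taken from the commands in these notes rather than from the help text.

```python
def expand_pages(spec, total_pages):
    """Expand a page spec like '1,4-end' into a sorted list of page numbers."""
    if spec == "all":
        return list(range(1, total_pages + 1))

    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as '4-end' or '2-5'; 'end' means the last page.
            start, end = part.split("-", 1)
            last = total_pages if end == "end" else int(end)
            pages.extend(range(int(start), last + 1))
        else:
            pages.append(int(part))
    # Remove duplicates from overlapping parts and keep ascending order.
    return sorted(set(pages))
```

For a six-page PDF, `expand_pages("1,4-end", 6)` gives pages 1, 4, 5 and 6.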