Posts for the month starting 2021-11-01

Scraping "Fumotoppara" (ふもとっぱら) availability with Playwright

# Auto-generate browser-automation code
playwright codegen fumotoppara.secure.force.com -o fumotoppara.py

import pandas as pd
from playwright.sync_api import Playwright, sync_playwright

def run(playwright: Playwright) -> pd.DataFrame:
    browser = playw…
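The excerpt stops inside `run()`. Below is a minimal sketch of how a codegen-style `run()` could return a DataFrame, assuming the availability data sits in an HTML table. The URL path, the `table tr` selector, and the helpers `rows_to_frame` and `main` are hypothetical, not the author's code. Each row is read with Playwright's `Locator.all_inner_texts()`, and the sketch assumes cells come back tab-separated, which is how `innerText` usually renders table rows.

```python
from typing import TYPE_CHECKING, List

import pandas as pd

if TYPE_CHECKING:
    from playwright.sync_api import Playwright

# Hypothetical: codegen was pointed at the host only; the real page path may differ.
URL = "https://fumotoppara.secure.force.com/"
# Hypothetical selector for the rows of the availability table.
ROW_SELECTOR = "table tr"


def rows_to_frame(row_texts: List[str]) -> pd.DataFrame:
    """Turn the innerText of <tr> elements into a DataFrame.

    Assumes cells are tab-separated and the first non-blank row is the header.
    """
    cells = [
        [cell.strip() for cell in text.split("\t")]
        for text in row_texts
        if text.strip()  # skip empty rows
    ]
    header, *body = cells
    return pd.DataFrame(body, columns=header)


def run(playwright: "Playwright") -> pd.DataFrame:
    # Same shape as the script `playwright codegen` writes, but returning data.
    browser = playwright.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    page.goto(URL)
    texts = page.locator(ROW_SELECTOR).all_inner_texts()
    context.close()
    browser.close()
    return rows_to_frame(texts)


def main() -> pd.DataFrame:
    # Imported lazily so rows_to_frame can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        return run(playwright)
```

Calling `main()` launches headless Chromium, so Playwright and its browser binaries must be installed first (`playwright install chromium`).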

Scraping "Fumotoppara" (ふもとっぱら) availability

note.com
ysdyt.hatenablog.jp

Scraping without Selenium

import time
import pandas as pd
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
}
pay…
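This post drops the browser and fetches pages directly with requests and BeautifulSoup. Below is a minimal sketch of that split, assuming a plain GET and availability data laid out in an HTML table. The original request details are cut off at `pay…`, so `fetch_html`, `parse_table`, and the table layout are hypothetical. Only `parse_table` runs offline.

```python
import pandas as pd
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
}


def fetch_html(url: str) -> str:
    """Download a page with a browser-like User-Agent (plain GET is an assumption)."""
    import requests  # imported here so parse_table works without requests installed

    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.text


def parse_table(html: str) -> pd.DataFrame:
    """Collect the text of every table row; the first row becomes the header."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    header, *body = rows
    return pd.DataFrame(body, columns=header)
```

Keeping the download and the parsing in separate functions lets the parser be checked against saved HTML without hitting the site on every run.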

chokkan.github.io http://www.kunitomo-lab.sakura.ne.jp/2021-3-3Open(S).pdf

Extracting vertical-line positions with pdfplumber

import pdfplumber
import pandas as pd
import decimal

pdf = pdfplumber.open("data.pdf")
result = []
for i in range(0, 2626, 175):
    page = pdf.pages[i]
    dfv = pd.DataFrame(page.debug_tablefinder().edges)
    vartical = dfv.loc[dfv.orientation == "…
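The excerpt samples every 175th page and keeps the vertical edges reported by pdfplumber's table finder. Below is a sketch of that filtering step, assuming each edge is a dict with `orientation` (`"v"` or `"h"`) and `x0` keys, as in `debug_tablefinder().edges`. The helper names and the 0.1 pt rounding are my assumptions; `decimal` is used here for half-up rounding, which is only a guess at why the original imports it.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict, List

import pandas as pd


def vertical_line_positions(edges: List[Dict], ndigits: int = 1) -> List[float]:
    """Return sorted, de-duplicated x positions of vertical edges."""
    if not edges:
        return []
    dfv = pd.DataFrame(edges)
    vertical = dfv.loc[dfv["orientation"] == "v", "x0"]
    quantum = Decimal(1).scaleb(-ndigits)  # Decimal("0.1") when ndigits=1
    # str() first so float noise does not leak into the Decimal value.
    rounded = {
        float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))
        for x in vertical
    }
    return sorted(rounded)


def scan_pdf(path: str, step: int = 175) -> Dict[int, List[float]]:
    """Hypothetical driver: vertical-line positions for every `step`-th page."""
    import pdfplumber

    result = {}
    with pdfplumber.open(path) as pdf:
        for i in range(0, len(pdf.pages), step):
            edges = pdf.pages[i].debug_tablefinder().edges
            result[i] = vertical_line_positions(edges)
    return result
```

`scan_pdf` uses `len(pdf.pages)` instead of the hard-coded 2626 so the same loop works on other files; rounding merges edges that differ only by a fraction of a point into one column boundary.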

jma

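import pandas as pd

# Fixed-width records read straight from the zip archive
df = pd.read_fwf(
    "i2019.zip",
    encoding="cp932",  # Windows variant of Shift_JIS
    header=None,
    widths=[1, 4, 2, 2, 2, 2, 4, 4, 3, 4, 4, 4, 4, 4, 5, 3,
            2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3, 22, 5, 1],
)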
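The widths in the call above define one fixed-width record of 31 fields. Below is a small sketch that turns them into character spans and parses a synthetic record, as a way to check the layout before reading the real archive. `colspecs_from_widths`, `read_records`, and the sample line are illustrative, and `dtype=str` (to keep zero-padded fields intact) is my choice rather than the original's.

```python
import io
from itertools import accumulate

import pandas as pd

# Field widths from the read_fwf call; they add up to 96 characters per record.
WIDTHS = [1, 4, 2, 2, 2, 2, 4, 4, 3, 4, 4, 4, 4, 4, 5, 3,
          2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3, 22, 5, 1]


def colspecs_from_widths(widths):
    """Convert field widths into half-open (start, end) character spans."""
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def read_records(source, encoding: str = "cp932") -> pd.DataFrame:
    """Parse fixed-width records, keeping every field as a string."""
    return pd.read_fwf(source, encoding=encoding, header=None,
                       widths=WIDTHS, dtype=str)


# Synthetic record: field k is filled with the digit k % 10.
SAMPLE = "".join(str(k % 10) * w for k, w in enumerate(WIDTHS))
```

For the real file, `read_records("i2019.zip")` should behave like the original call apart from the string dtype, since pandas infers zip compression from the file name when the archive holds a single file.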