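This post takes the example from 「Python Webスクレイピング実践入門」, "access the Nihon Keizai Shimbun (Nikkei) site every hour and record the Nikkei Stock Average at that time to a CSV file", and changes two things: the scheduled execution now uses apscheduler, and the value is extracted with a CSS selector via select_one.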

Basics

imabari.hateblo.jp

Beautiful Soup Documentation — Beautiful Soup 4.4.0 documentation

kondou.com - Beautiful Soup 4.2.0 Doc. Japanese translation (last updated 2013-11-19)

Get the first match

find(name, attrs, recursive, string, **kwargs)
select_one(selector)
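As a rough illustration of the two calls above, the following sketch runs both against a small HTML fragment. The fragment, its id, and its class names are invented for this example and are not taken from the Nikkei page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, invented for illustration only.
HTML = """
<div id="prices">
  <span class="price">100</span>
  <span class="price">200</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() takes a tag name plus attribute filters.
first_by_find = soup.find("span", class_="price")

# select_one() takes a CSS selector string.
first_by_select = soup.select_one("#prices > span.price")

# Both return only the first match, and None when nothing matches.
missing = soup.select_one("span.no-such-class")

print(first_by_find.get_text(strip=True))
print(first_by_select.get_text(strip=True))
print(missing)
```

Because a failed lookup yields None rather than raising, calling get_text() directly on the result fails with AttributeError when the page layout changes.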

Get multiple matches

find_all(name, attrs, recursive, string, limit, **kwargs)
select(selector, _candidate_generator=None, limit=None)
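The multi-match variants can be sketched the same way. The list markup below is made up for this example, and only the limit keyword from the signatures above is used.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, invented for illustration only.
HTML = """
<ul>
  <li class="item">a</li>
  <li class="item">b</li>
  <li class="item">c</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() returns every matching tag.
all_items = [li.get_text() for li in soup.find_all("li", class_="item")]

# select() does the same with a CSS selector; limit caps the number of results.
first_two = [li.get_text() for li in soup.select("li.item", limit=2)]

# With no match, both return an empty list-like result instead of None.
no_match = soup.select("li.missing")

print(all_items)
print(first_two)
print(len(no_match))
```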

Preparation

Install the packages

# Python 3.6.2
pip install requests
pip install apscheduler
pip install beautifulsoup4

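Get the CSS selector for the Nikkei Stock Average

In Chrome, open the Nihon Keizai Shimbun page at http://www.nikkei.com/markets/kabu/
Select the Nikkei Stock Average value, right-click it, and choose Inspect
Right-click the highlighted line in the HTML source and choose Copy > Copy selector
The CSS selector is now on the clipboard; paste it into the source below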

#CONTENTS_MARROW > div.mk-top_stock_average.cmn-clearfix > div.cmn-clearfix > div.mkc-guidepost > div.mkc-prices > span.mkc-stock_prices

nikkei_heikin = soup.select_one('#CONTENTS_MARROW > div.mk-top_stock_average.cmn-clearfix > div.cmn-clearfix > div.mkc-guidepost > div.mkc-prices > span.mkc-stock_prices').get_text(strip=True)
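A selector copied from the browser stops matching as soon as the page layout changes, and select_one then returns None. The sketch below wraps the lookup in a helper that guards against that and converts the text to a number. The helper name extract_price, the sample HTML, and the assumption that the value uses commas as thousands separators are all illustrative, not taken from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup and a shortened selector, for illustration only.
SAMPLE_HTML = (
    '<div class="mkc-prices">'
    '<span class="mkc-stock_prices"> 19,729.28 </span>'
    "</div>"
)
SELECTOR = "div.mkc-prices > span.mkc-stock_prices"


def extract_price(html, selector):
    """Return the matched value as a float, or None if the selector misses."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(selector)
    if tag is None:
        return None
    text = tag.get_text(strip=True)
    # Assumes a format like "19,729.28"; strip the thousands separators.
    return float(text.replace(",", ""))


print(extract_price(SAMPLE_HTML, SELECTOR))
print(extract_price("<p>layout changed</p>", SELECTOR))
```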

Scraping

import csv
import datetime

import requests
from apscheduler.schedulers.blocking import BlockingScheduler
from bs4 import BeautifulSoup

sched = BlockingScheduler()

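# Run every hour, counted from when the script starts (interval trigger)
# @sched.scheduled_job('interval', hours=1)


# Run at minute 0 of every hour (cron trigger)
@sched.scheduled_job('cron', minute=0, hour='*/1')
def scheduled_job():
    # Access the Nikkei Stock Average page of the Nihon Keizai Shimbun and fetch the HTML.
    # The timeout keeps a stalled request from blocking the scheduler indefinitely.
    r = requests.get('http://www.nikkei.com/markets/kabu/', timeout=10)

    # Check that the request succeeded
    if r.status_code == requests.codes.ok:

        # Use BeautifulSoup to extract the Nikkei Stock Average
        soup = BeautifulSoup(r.content, 'html.parser')
        tag = soup.select_one(
            '#CONTENTS_MARROW > div.mk-top_stock_average.cmn-clearfix > div.cmn-clearfix > div.mkc-guidepost > div.mkc-prices > span.mkc-stock_prices'
        )

        # select_one returns None when the selector no longer matches the page
        if tag is None:
            print('The CSS selector did not match anything')
            return

        nikkei_heikin = tag.get_text(strip=True)

        # Convert the current time to a string
        now = datetime.datetime.now().strftime('%Y/%m/%d %H:%M:%S')

        print('{} {}'.format(now, nikkei_heikin))

        # Append the timestamp and the Nikkei Stock Average value to the CSV
        with open('nikkei_heikin.csv', 'a') as fw:
            writer = csv.writer(fw, dialect='excel', lineterminator='\n')
            writer.writerow([now, nikkei_heikin])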


sched.start()
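The CSV step of the script above can be tried without the scheduler or network access. The following is a minimal sketch, assuming a hypothetical helper append_row that also writes a header row the first time the file is created. The header names and the sample values are made up. Unlike the script, it opens the file with newline='' as the csv module documentation recommends, and leaves the writer's default line terminator in place.

```python
import csv
import os
import tempfile


def append_row(path, row, header=("datetime", "nikkei_heikin")):
    """Append one row to a CSV file, writing a header if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(header)
        writer.writerow(row)


# Demo in a temporary directory; the values below are invented sample data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nikkei_heikin.csv")
    append_row(path, ["2017/08/01 09:00:00", "19,985.79"])
    append_row(path, ["2017/08/01 10:00:00", "19,990.12"])
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(rows)
```

Values that contain a comma, such as "19,985.79", are quoted by csv.writer on output and come back as a single field when read.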

Using the standard library (urllib) instead of requests

Example: find('tag name', class_='class name')

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "http://www.nikkei.com/markets/kabu/"

html = urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

nikkei_heikin = soup.find('span', class_='mkc-stock_prices').string

print(nikkei_heikin)
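One caveat with the .string attribute used above: it returns None when the tag has more than one child, whereas get_text() concatenates all of the nested text. The sketch below shows the difference on invented markup. The class names and the nested small tag are assumptions made for the example, not the structure of the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration only.
HTML = (
    '<span class="simple">19,000</span>'
    '<span class="nested">19,000<small>.00</small></span>'
)

soup = BeautifulSoup(HTML, "html.parser")

simple = soup.find("span", class_="simple")
nested = soup.find("span", class_="nested")

# A tag with a single text child: .string returns that text.
simple_string = simple.string

# A tag with several children: .string is None, get_text() joins the text.
nested_string = nested.string
nested_text = nested.get_text(strip=True)

print(simple_string)
print(nested_string)
print(nested_text)
```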

Notes

Python3 Webスクレイピングの実践入門
