[Web Crawling - Python] Applying BeautifulSoup (Requests, Selenium), Part 2

바보1 2022. 2. 2. 19:49

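This time I wrote a script that gathers today's information and saves it to a text file.

It fetches the weather, the headline news, IT news, and the "Today's English Conversation" from Hackers (해커스), and writes all of it to today.txt.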

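Also, whether I use Requests or Selenium depends on the task. So far I haven't needed both in the same script.

Since I need to learn both, each part of this series uses only one of them.

I mixed find and select on purpose, for practice: if I only use one, I forget the other.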
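
Since both styles appear in the script, here is a small side-by-side sketch of how they map onto each other. The HTML below is hypothetical (it only borrows class names from the script), so it runs without any network access. It also shows one difference that matters for the IT news part: passing two classes as one string to find_all matches the exact class attribute string, while a CSS selector can match on a single class.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that only borrows class names from the script
html = """
<div class="today_chart_list">
  <span class="txt">보통</span>
  <span class="txt">좋음</span>
</div>
<a class="cluster_text_headline nclicks(cls_sci.clsart)" href="/a1">Headline A</a>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all / select: every match, returned as a list
by_find = soup.find("div", class_="today_chart_list").find_all("span", class_="txt")
by_select = soup.select(".today_chart_list .txt")
print([t.get_text() for t in by_find] == [t.get_text() for t in by_select])  # -> True

# find / select_one: first match only (or None when nothing matches)
print(soup.find("span", class_="txt").get_text())             # -> 보통
print(soup.select_one(".today_chart_list .txt").get_text())   # -> 보통

# Two classes in one string: find_all compares the exact attribute string,
# while the CSS selector only needs the one class it names
exact = soup.find_all("a", attrs={"class": "cluster_text_headline nclicks(cls_sci.clsart)"})
css = soup.select("a.cluster_text_headline")
print(len(exact), len(css))  # -> 1 1
```

If the site ever reorders the two classes, the exact-string find_all stops matching, while the single-class selector keeps working.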

import requests
from bs4 import BeautifulSoup
import sys

# Redirect standard output to today.txt, so every print() below is written to the file
sys.stdout = open('today.txt', 'w', encoding='utf8')

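def requests_function(url):
    """Fetch url with a browser-like User-Agent and return the parsed BeautifulSoup tree."""
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
    }
    # timeout avoids hanging forever; raise_for_status() stops on HTTP errors (4xx/5xx)
    response = requests.get(url, headers=header, timeout=10)
    response.raise_for_status()

    # The 'lxml' parser must be installed separately (pip install lxml)
    soup = BeautifulSoup(response.text, 'lxml')

    return soup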

# Naver search URL for Daegu weather (the query parameter is "대구 날씨", URL-encoded)
weather_url = "https://search.naver.com/search.naver?where=nexearch&sm=top_hty&fbm=0&ie=utf8&query=%EB%8C%80%EA%B5%AC+%EB%82%A0%EC%94%A8"
weather_soup = requests_function(weather_url)

print("[Today's Weather]")

# Today's weather summary (the indexes below depend on how Naver words the summary)
summary = weather_soup.find('p', attrs={'class': 'summary'}).get_text().split()
print(summary[3], ' '.join(summary[:3]))

# Current temperature; the slices drop part of the label text that precedes the number
curr_temp = weather_soup.select_one('.temperature_text > strong').get_text()
print(curr_temp[:3] + curr_temp[5:], end='')

# Lowest and highest temperatures
temp = weather_soup.select_one('.temperature_inner').get_text().split('/')
print(' (' + temp[0][:3].strip(), temp[0][5:].strip(), '/', temp[1][:3].strip(), temp[1][5:].strip() + ')')

# Chance of rain (morning and afternoon)
rain = weather_soup.find_all('span', attrs={'class': 'rainfall'})
print(f'Morning chance of rain {rain[0].get_text()} / Afternoon chance of rain {rain[1].get_text()}', end='\n\n')

# Fine dust information
dust = weather_soup.select('.today_chart_list .txt')
print(f'Fine dust (PM10): {dust[0].get_text()}')
print(f'Ultrafine dust (PM2.5): {dust[1].get_text()}', end='\n\n')

# Headline news
news_url = 'https://news.naver.com/main/list.naver?mode=LPOD&mid=sec&sid1=001&sid2=140&oid=001&isYeonhapFlash=Y'
news_soup = requests_function(news_url)

print('[Headline News]')

# Get the top news articles
newses = news_soup.find_all('a', attrs={'class': 'nclicks(fls.list)'})

for i, news in enumerate(newses, start=1):
    print(f'{i}. {news.get_text()}')
    print(f'Link: {news["href"]}')

# IT news
IT_news_url = 'https://news.naver.com/main/main.naver?mode=LSD&mid=shm&sid1=105'
IT_news_soup = requests_function(IT_news_url)

print('\n[IT News]')

# Get the top three IT news articles
IT_newses = IT_news_soup.find_all('a', attrs={'class': 'cluster_text_headline nclicks(cls_sci.clsart)'})

for i, IT_news in enumerate(IT_newses[:3], start=1):
    print(f'{i}. {IT_news.get_text()}')
    print(f'Link: {IT_news["href"]}')

# English conversation
eng_url = 'https://www.hackers.co.kr/?c=s_eng/eng_contents/I_others_english&keywd=haceng_submain_lnb_eng_I_others_english&logger_kw=haceng_submain_lnb_eng_I_others_english'
eng_soup = requests_function(eng_url)

print("\n[Today's English Conversation]")

# English lines of today's conversation (the code assumes each id appears twice: index 0 Korean, index 1 English)
print('(English text)')
for i in range(2, 6):
    print(eng_soup.find_all('div', attrs={'id': f'conv_kor_t{i}'})[1].get_text().strip())

# Korean lines of today's conversation
print('\n(Korean text)')
for i in range(2, 6):
    print(eng_soup.find_all('div', attrs={'id': f'conv_kor_t{i}'})[0].get_text().strip())

# Close today.txt and send output back to the console
sys.stdout.close()
sys.stdout = sys.__stdout__
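
Reassigning sys.stdout by hand works, but if any request or selector fails halfway through, the file is never closed and stdout stays redirected. A minimal alternative sketch uses contextlib.redirect_stdout from the standard library, which restores the previous stdout when the with-block ends, even on an exception. The section functions print_weather and print_news are hypothetical stand-ins for the parts of the script above.

```python
import contextlib


def print_weather():
    # Hypothetical stand-in for the weather section of the script
    print("[Today's Weather]")


def print_news():
    # Hypothetical stand-in for the news sections
    print("[Headline News]")


def write_report(path, sections):
    """Call each section function with stdout redirected into path."""
    with open(path, "w", encoding="utf8") as f, contextlib.redirect_stdout(f):
        for section in sections:
            section()
    # Leaving the with-block closes the file and restores the previous sys.stdout,
    # even if one of the sections raised an exception.


write_report("today.txt", [print_weather, print_news])
```

Splitting the script into one function per section like this also makes it easy to run a single section while testing its selectors.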

Also, the code inside the print() calls may be messier than you'd expect; please bear with me.

I'm still a Python beginner, so I'm not yet sure how to write it more cleanly.
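
One place where this shows is the fixed slicing of the temperature text (curr_temp[:3] + curr_temp[5:] and the temp[0][:3] / temp[0][5:] pairs), which breaks as soon as the label length changes. A minimal sketch of a sturdier approach, assuming the scraped text contains the number followed directly by °, is to pull the value out with a regular expression. The helper name extract_temperature and the sample strings are hypothetical, not captured Naver output.

```python
import re

# Assumed shape: a number (optionally negative or decimal) followed directly by '°'
TEMP_PATTERN = r"-?\d+(?:\.\d+)?°"


def extract_temperature(text):
    """Return the first temperature such as '2.4°' found in text, or None."""
    match = re.search(TEMP_PATTERN, text)
    return match.group() if match else None


# Hypothetical sample strings shaped like the scraped text, not real Naver output
print(extract_temperature("현재 온도2.4°"))  # -> 2.4°
print(extract_temperature("정보 없음"))      # -> None

# findall returns every match, e.g. the lowest and highest temperatures in one string
low, high = re.findall(TEMP_PATTERN, "최저기온-3°/최고기온5°")
print(low, high)  # -> -3° 5°
```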

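If you remove the trailing strip() calls and run the print statements, you'll see why I added them: the scraped text comes with extra spaces and newlines around it.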
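
To see this without hitting the site, here is a small sketch on hypothetical markup where the text sits on its own indented line, as it often does in real pages. BeautifulSoup's get_text(strip=True) gives the same result as calling strip() afterwards for a single text node. With several child tags, though, it joins the stripped pieces with no separator, so passing a separator such as " " keeps the words apart.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the text is on its own indented line inside the div
html = """<div id="conv_kor_t2">
    Nice to meet you.
</div>"""

div = BeautifulSoup(html, "html.parser").find("div", id="conv_kor_t2")

print(repr(div.get_text()))            # -> '\n    Nice to meet you.\n'
print(repr(div.get_text().strip()))    # -> 'Nice to meet you.'
print(repr(div.get_text(strip=True)))  # -> 'Nice to meet you.'

# With several text pieces, strip=True glues them together unless a separator is given
p = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser").p
print(p.get_text(strip=True))       # -> Helloworld
print(p.get_text(" ", strip=True))  # -> Hello world
```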

Reference: 나도코딩
