[Python] Crawling Bookmarks

https://pythondocs.net/selenium/%EC%85%80%EB%A0%88%EB%8B%88%EC%9B%80-%ED%81%AC%EB%A1%A4%EB%9F%AC-%EA%B8%B0%EB%B3%B8-%EC%82%AC%EC%9A%A9%EB%B2%95/

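Selenium Crawler Basic Usage - 뻥뚫리는 파이썬 코드 모음 (a collection of clear-cut Python code)

A brief overview of Selenium in general. This document is based on Selenium 3. Version 4 was released recently and its usage differs slightly, so check those differences. Usage and examples are separately ...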

pythondocs.net

https://sjwiq200.tistory.com/11

[PYTHON] Modifying the User-Agent header value in Selenium

Hello!! Today, while crawling with Python and Selenium, I found out that one site works only in IE... So the solution is to change the User-Agent header value to IE's value ...

sjwiq200.tistory.com

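from selenium import webdriver
import Config  # appears to be the author's own settings module (holds the chromedriver path)

options = webdriver.ChromeOptions()
# options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Send Internet Explorer 11's User-Agent string instead of Chrome's default
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko")

# executable_path is the Selenium 3 way of passing the chromedriver path
driver = webdriver.Chrome(executable_path=Config.CONFIG['CHROMEPATH'], options=options)

# The target URL is left blank in the source
driver.get('')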

https://github.com/liamcryan/iherb

GitHub - liamcryan/iherb: Get iherb products and details

Get iherb products and details. Contribute to liamcryan/iherb development by creating an account on GitHub.

github.com

https://hashcode.co.kr/questions/15732/%EC%9B%B9%EC%82%AC%EC%9D%B4%ED%8A%B8-%ED%81%AC%EB%A1%A4%EB%A7%81%ED%95%98%EB%A0%A4%EB%8A%94%EB%8D%B0-html-error-%EB%95%8C%EB%AC%B8%EC%97%90-%EC%A7%84%ED%96%89%EC%9D%B4-%EC%95%88%EB%90%98%EB%84%A4%EC%9A%94

Trying to crawl a website, but an HTML error keeps it from working..

import urllib.request from bs4 import BeautifulSoup url = 'https://kr.iherb.com/search?kw=21st%20century' html = urllib.request.urlopen(url).read() soup = BeautifulSoup(html, 'html.parser') address = soup.find_all(class_='absolute-link

hashcode.co.kr

import urllib.request
from bs4 import BeautifulSoup

url = 'https://kr.iherb.com/search?kw=21st%20century'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

# Matches elements whose class attribute is exactly 'absolute-link product-link'
address = soup.find_all(class_='absolute-link product-link')

# Print each product link, followed by a blank line
for i in address:
    print(i.attrs['href'])
    print()
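
Passing a multi-word string to `class_` only matches when the class attribute is exactly that string, in that order. As a hedged alternative (not what the bookmarked question uses), a CSS selector matches elements carrying both classes regardless of order or extra classes. The HTML below is made up for illustration, and `exact_class_links` and `selector_links` are hypothetical helper names.

```python
from typing import List

from bs4 import BeautifulSoup

# Made-up markup standing in for a search results page
SAMPLE_HTML = """
<div>
  <a class="absolute-link product-link" href="/pr/item-1">Item 1</a>
  <a class="product-link absolute-link featured" href="/pr/item-2">Item 2</a>
  <a class="other-link" href="/about">About</a>
</div>
"""


def exact_class_links(html: str) -> List[str]:
    """Collect hrefs using the exact class-string match from the snippet above."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag["href"] for tag in soup.find_all(class_="absolute-link product-link")]


def selector_links(html: str) -> List[str]:
    """Collect hrefs from <a> tags that carry both classes, in any order."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag["href"] for tag in soup.select("a.absolute-link.product-link")]
```

On the sample markup, the exact-string search finds only the first link, while the selector also picks up the second one, whose classes are in a different order.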

import urllib.request

url = 'https://somewhere.com'

# Attach a browser-like User-Agent header before sending the request
request = urllib.request.Request(url)
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0')

html = urllib.request.urlopen(request).read()
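
The request above can be wrapped in a small helper that also reports HTTP errors instead of raising. This is a minimal sketch, assuming the failure in the linked question comes from the server rejecting the request; the bookmark does not state the exact error. The names `build_request` and `fetch_html` and the 10-second timeout are hypothetical choices for illustration.

```python
import urllib.error
import urllib.request
from typing import Optional

# Firefox User-Agent string taken from the snippet above
FIREFOX_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0"
)


def build_request(url: str, user_agent: str = FIREFOX_USER_AGENT) -> urllib.request.Request:
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Return the response body, or None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # err.code is the status code, e.g. 403 when the request is refused
        print(f"HTTP {err.code} for {url}: {err.reason}")
        return None
```

The bytes returned by `fetch_html(url)` can be passed to `BeautifulSoup(html, 'html.parser')` as in the earlier snippet.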

https://gogl3.github.io/articles/2021-03/webcrawling

Web-crawling using Python

Today, we are going to know how to crawl iherb using Python, especially information about supplements. Scheme Before writing codes, it is important to decide which information is needed and how to store. In this example, assume that I want to collect suppl

gogl3.github.io

Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)



