https://sjwiq200.tistory.com/11
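from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import Config  # assumed to be a project-local module; CONFIG['CHROMEPATH'] is the chromedriver path

options = webdriver.ChromeOptions()
# options.add_argument('--headless')  # uncomment to run without a visible browser window
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Present an Internet Explorer 11 user agent string to the site
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko")
# Selenium 4 takes the driver path through a Service object instead of executable_path
driver = webdriver.Chrome(service=Service(Config.CONFIG['CHROMEPATH']), options=options)
driver.get('')  # put the target URL here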
https://github.com/liamcryan/iherb
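import urllib.request
from bs4 import BeautifulSoup

# Fetch the iHerb search results page for "21st century"
url = 'https://kr.iherb.com/search?kw=21st%20century'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
# A class_ string with two names matches that exact class attribute value
address = soup.find_all(class_='absolute-link product-link')
# Print the href of every product link, one per line with a blank line between
for i in address:
    print(i.attrs['href'])
    print()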
# Send a browser-like User-Agent header, since some sites reject urllib's default one
url = 'https://somewhere.com'
request = urllib.request.Request(url)
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0')
html = urllib.request.urlopen(request).read()
https://gogl3.github.io/articles/2021-03/webcrawling
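The User-Agent snippet above can be wrapped in a small reusable helper. This is only a minimal sketch: the names `DEFAULT_USER_AGENT`, `build_request`, and `fetch_html`, as well as the 10-second timeout, are hypothetical choices for illustration and do not come from the original post.

```python
import urllib.request

# Hypothetical default; any browser-like User-Agent string would do.
DEFAULT_USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) '
    'Gecko/20100101 Firefox/99.0'
)


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request for url that carries a custom User-Agent header."""
    request = urllib.request.Request(url)
    request.add_header('User-Agent', user_agent)
    return request


def fetch_html(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Download url and return the raw response body as bytes."""
    request = build_request(url, user_agent)
    # The timeout value is an assumption; adjust it to the target site.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With this in place, `html = fetch_html('https://somewhere.com')` returns bytes that can be passed to `BeautifulSoup(html, 'html.parser')` just like in the earlier snippet. Keeping the request construction separate from the download makes it possible to inspect the headers without touching the network.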