[Python] Crawling Bookmarks

๋ฐ˜์‘ํ˜•

 

 

https://pythondocs.net/selenium/%EC%85%80%EB%A0%88%EB%8B%88%EC%9B%80-%ED%81%AC%EB%A1%A4%EB%9F%AC-%EA%B8%B0%EB%B3%B8-%EC%82%AC%EC%9A%A9%EB%B2%95/

 

Selenium Crawler Basic Usage - 뻥뚫리는 파이썬 코드 모음

A brief overview of Selenium in general. This document is based on Selenium version 3. Version 4 was released recently, but its usage differs slightly, so check that difference. Usage and examples are separately...

pythondocs.net

 

 

 

https://sjwiq200.tistory.com/11

 

[PYTHON] Modifying the User-Agent Header Value in Selenium

Hello!! Today, while crawling with Python and Selenium, I found that one site works only in IE...... So the solution is to change the User-Agent header value to IE's value...

sjwiq200.tistory.com

 

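from selenium import webdriver
import Config  # presumably a local module that stores the chromedriver path

options = webdriver.ChromeOptions()
# options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Present Chrome as Internet Explorer 11 by overriding the User-Agent header.
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko")

# Selenium 3 style; Selenium 4 deprecates executable_path in favor of a Service object.
driver = webdriver.Chrome(executable_path=Config.CONFIG['CHROMEPATH'], options=options)

# The target URL is left blank in the original.
driver.get('')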

 

 

 

 

https://github.com/liamcryan/iherb

 

GitHub - liamcryan/iherb: Get iherb products and details

Get iherb products and details. Contribute to liamcryan/iherb development by creating an account on GitHub.

github.com

 

 

 

 

 

 

 

 

 

 

https://hashcode.co.kr/questions/15732/%EC%9B%B9%EC%82%AC%EC%9D%B4%ED%8A%B8-%ED%81%AC%EB%A1%A4%EB%A7%81%ED%95%98%EB%A0%A4%EB%8A%94%EB%8D%B0-html-error-%EB%95%8C%EB%AC%B8%EC%97%90-%EC%A7%84%ED%96%89%EC%9D%B4-%EC%95%88%EB%90%98%EB%84%A4%EC%9A%94

 

I'm trying to crawl a website, but an html error keeps me from getting anywhere..

import urllib.request from bs4 import BeautifulSoup url = 'https://kr.iherb.com/search?kw=21st%20century' html = urllib.request.urlopen(url).read() soup = BeautifulSoup(html, 'html.parser') address = soup.find_all(class_='absolute-link

hashcode.co.kr

 

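import urllib.request
from bs4 import BeautifulSoup

url = 'https://kr.iherb.com/search?kw=21st%20century'
# Sent with urllib's default User-Agent; this is the request from the question above.
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

# Match elements whose class attribute is exactly 'absolute-link product-link'.
address = soup.find_all(class_='absolute-link product-link')

for link in address:
    print(link.attrs['href'])
    print()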

 

import urllib.request

url = 'https://somewhere.com'

# Attach a browser-like User-Agent instead of urllib's default one.
request = urllib.request.Request(url)
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0')

html = urllib.request.urlopen(request).read()
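
The header trick above can be wrapped in a small helper, so that every request carries the same User-Agent. This is only a sketch: `build_request` and `DEFAULT_USER_AGENT` are hypothetical names that do not come from any of the linked posts, and the code opens no connection, so the header can be checked offline.

```python
import urllib.request

# Assumption: the same Firefox User-Agent string used in the snippet above.
DEFAULT_USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) '
    'Gecko/20100101 Firefox/99.0'
)


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a urllib Request for url with a browser-like User-Agent set."""
    # Passing headers= to the constructor is equivalent to calling add_header().
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


if __name__ == '__main__':
    request = build_request('https://somewhere.com')
    # urllib stores header names capitalized, so the lookup key is 'User-agent'.
    print(request.get_header('User-agent'))
```

To actually download the page, pass the result to `urllib.request.urlopen(build_request(url)).read()`, as in the snippet above.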

 

 

 

 

 

 

 

 

 

https://gogl3.github.io/articles/2021-03/webcrawling

 

Web-crawling using Python

Today, we are going to know how to crawl iherb using Python, especially information about supplements. Scheme Before writing codes, it is important to decide which information is needed and how to store. In this example, assume that I want to collect suppl

gogl3.github.io

 

๋ฐ˜์‘ํ˜•

'Python > ์Šคํฌ๋กค๋ง' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[Python] Pyppeteer  (0) 2022.07.25
[Python] Selenium  (0) 2022.05.04