
[Python] Pyppeteer


 

 

Adjusting the window size

browser = await launch(headless=False,
                        ignoreHTTPSErrors=True,
                        ignoreDefaultArgs=['--enable-automation'],
                        args=['--window-size=800,600'])
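
Note that --window-size only resizes the browser window; the page itself keeps pyppeteer's default 800x600 viewport, so it is usually paired with page.setViewport. A minimal sketch of that pairing (the URL and sizes are placeholders):

import asyncio
from pyppeteer import launch

async def main():
    # open an 800x600 window and make the page viewport match it
    browser = await launch(headless=False, args=['--window-size=800,600'])
    page = await browser.newPage()
    await page.setViewport({'width': 800, 'height': 600})
    await page.goto('https://example.com')
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())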

 

 

Using JavaScript


 

await page.evaluate('(el) => el.value = ""', elem)

await page.evaluate('(el) => el.click()', btn[0])
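
For reference, elem and btn above are element handles obtained beforehand. A rough sketch of how they might be fetched, assuming page is an already-open pyppeteer page and the selectors are placeholders:

elem = await page.querySelector('input[name="q"]')   # single element handle
btn = await page.querySelectorAll('.btn')             # list of element handles

await page.evaluate('(el) => el.value = ""', elem)    # clear the input via JS
await page.evaluate('(el) => el.click()', btn[0])     # click the first button via JS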

 

 

 

 

When .getProperty doesn't work, use evaluate to call .getAttribute

li_elements = await page.querySelectorAll('.thmb')
for ind, li in enumerate(li_elements):
	a = await page.evaluate("(element) => element.getAttribute('data-a')", li)

 

img_element = await page.evaluate('document.querySelector("img")', force_expr=True)
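
Note that page.evaluate serializes its return value, so returning a DOM element like this does not give a usable element handle. A rough sketch of the two usual alternatives, assuming page is open and the page actually has an <img> (the selector is a placeholder):

# to keep an element handle, query for it instead of evaluating an expression
img_handle = await page.querySelector('img')

# plain values (strings, numbers) serialize fine through evaluate with force_expr=True
img_src = await page.evaluate('document.querySelector("img").src', force_expr=True)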

 

 

XPath syntax

Selecting a sibling div: following-sibling::div

await page.xpath(f'//*[@id="{id}"]/following-sibling::div')
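
The handles returned by page.xpath can then be read like any other element handle; a quick sketch assuming id holds the target element's id attribute:

siblings = await page.xpath(f'//*[@id="{id}"]/following-sibling::div')
if siblings:
    text = await (await siblings[0].getProperty('textContent')).jsonValue()
    print(text)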

 

 

 

waitForXPath

์˜ต์…˜

  • visible (bool): wait for element to be present in DOM and to be visible, i.e. to not have display: none or visibility: hidden CSS properties. Defaults to False.
  • hidden (bool): wait for element to not be found in the DOM or to be hidden, i.e. have display: none or visibility: hidden CSS properties. Defaults to False.
  • timeout (int|float): maximum time to wait for in milliseconds. Defaults to 30000 (30 seconds). Pass 0 to disable timeout.

 

 

Wait for btn-login, then click it

await page.waitForXPath('//button[@class="btn btn-login"]', {'visible': True, 'timeout': 100000})
btn_login = await page.xpath('//button[@class="btn btn-login"]')
await btn_login[0].click()

 

https://www.tools4testing.com/contents/puppeteer/puppeteer-waitforxpath

 


 

https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.waitForXPath

 


 

 

 

 

 

 

Error encountered

pyppeteer.errors.PageError: net::ERR_SSL_VERSION_INTERFERENCE at https://google.com

 

Even opening Chromium manually and visiting the site, it won't connect.

 

 

Going into chrome://flags/ and setting TLS 1.3 to Disabled made it work.

But when Chrome is launched through pyppeteer (headless=False), the flag is not disabled.. lol

With headless=True it just works anyway..
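
In short, the practical workaround here was just to run headless and ignore HTTPS errors; a minimal sketch of that combination (not a general fix for the underlying TLS issue):

import asyncio
from pyppeteer import launch

async def main():
    # headless=True avoided ERR_SSL_VERSION_INTERFERENCE in this case
    browser = await launch(headless=True, ignoreHTTPSErrors=True)
    page = await browser.newPage()
    await page.goto('https://google.com')
    print(await page.title())
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())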

 

 

 

https://geekingup.org/ko/solved-err_ssl_version_interference-%ED%81%AC%EB%A1%AC-%EC%98%A4%EB%A5%98-%EC%88%98%EC%A0%95

 


 

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(headless=False,
                           ignoreHTTPSErrors=True,
                           args=['--ignore-certificate-errors',
                                 '--window-size=1366,768',
                                 '--disable-notifications',
                                 '--disable-features=tls13-variant',
                                 '--disable-features=tls13',
                                 '--disable-tls13',
                                 '--disable-tls13-variant',
                                 ],
                            )
    page = await browser.newPage()
    await page.goto('chrome://flags')
    await page.type('#search', 'tls')
    await page.close()
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

 

 

 

https://soyoung-new-challenge.tistory.com/92

 

[Python] Downloading an image using the image URL

 

List of Chromium Command Line Switches

 

https://peter.sh/experiments/chromium-command-line-switches/

 


 

 

 

https://developpaper.com/the-pyppeter-library-of-python-crawler-is-simple-to-use/

 


 

 

#Puppeteer accepts options as a dictionary
browser = await launch({'headless':True})

#Pyppeteer accepts both a dictionary and keyword arguments
browser = await launch({'headless':True})
browser = await launch(headless=True)

 

 

#Pyppeteer uses Python-style method names
page.querySelector()/page.querySelectorAll()/page.xpath()

#Shorthand aliases
page.J()/page.JJ()/page.Jx()
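
The shorthand methods are plain aliases, so they can be mixed freely with the long names; a quick sketch with placeholder selectors:

first_item = await page.J('.item')                   # same as page.querySelector('.item')
all_items = await page.JJ('.item')                   # same as page.querySelectorAll('.item')
xpath_hits = await page.Jx('//div[@class="item"]')   # same as page.xpath('//div[@class="item"]')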

 

Basic usage

import asyncio
from pyppeteer import launch

async def main():
    url = 'https://www.toutiao.com/'
    #headless=False turns off headless mode, so a browser window is shown
    browser = await launch(headless=False, ignoreDefaultArgs=['--enable-automation'])
    page = await browser.newPage()
    
    #Set page view size
    await page.setViewport(viewport={'width':1600,'height':900})
    
    #Enable/disable JavaScript; with enabled=False dynamic content will not render
    await page.setJavaScriptEnabled(enabled=True)
    
    #Navigation timeout of 1000 ms
    res = await page.goto(url, options={'timeout':1000})
    resp_headers = res.headers  # response headers
    resp_status = res.status  # response status
    
    #Wait
    await asyncio.sleep(2)
    await page.waitFor(1000)
    #Alternative: poll for an element in a loop until it appears
    while not await page.querySelector('.t'):
        await asyncio.sleep(0.5)
    
    #Scroll to the bottom of the page
    await page.evaluate('window.scrollBy(0,document.body.scrollHeight)')
    
    await page.screenshot({'path':'test.png'})
    
    #Print web cookies
    print(await page.cookies())
    
    #Get all HTML content
    print(await page.content())
    
    
    dimensions = await page.evaluate(pageFunction='''() => {
            return {
                width: document.documentElement.clientWidth,    // page width
                height: document.documentElement.clientHeight,  // page height
                deviceScaleFactor: window.devicePixelRatio,     // pixel ratio
            }
        }''', force_expr=False)   # force_expr=False: the string is treated as a function
    print(dimensions)
    
    content = await page.evaluate('document.body.textContent', force_expr=True)  # force_expr=True: the string is an expression; returns only the text
    print(content)
    
    #Prints the title of the current page
    print(await page.title())
    
    
    #You can use XPath expressions to grab news content
    '''
    Three selector methods in pyppeteer
    page.querySelector()
    page.querySelectorAll()
    page.xpath()
    The abbreviation is:
    page.J()
    page.JJ()
    page.Jx()
    '''
    element = await page.querySelector(".feed-infinite-wrapper > ul>li")
    print(element)
    
    
    elements = await page.querySelectorAll(".title-box a")
    for item in elements:
        print(await item.getProperty('textContent'))
        #Get the text content
        title_str = await (await item.getProperty('textContent')).jsonValue()
        #Get the link href
        title_link = await (await item.getProperty('href')).jsonValue()
        
        #Get any other property value
        # title = await (await item.getProperty('class')).jsonValue()
        print(title_str, title_link)
    await browser.close()


asyncio.get_event_loop().run_until_complete(main())

 

 

Simulate text input and click

#Simulate typing the account/password; the {'delay': ...} option sets the per-keystroke delay in ms
await page.type('#kw', 'Baidu', {'delay': 100})
await page.type('#TPL_username_1',"asdasd")

await page.waitFor(1000)
await page.click('#su')

 

Removing the "Chrome is being controlled by automated test software" banner

browser = await launch(headless=False, ignoreDefaultArgs=['--enable-automation'])
#Pass ignoreDefaultArgs=['--enable-automation'] to launch

 

 

Crawling JD.com (Jingdong Mall)

from bs4 import BeautifulSoup
from pyppeteer import launch
import asyncio


def screen_size():
    return 1600,900


async def main(url):
    browser = await launch({"args":['--no-sandbox'],}) # "headless":False
    page = await browser.newPage()
    width, height = screen_size()
    await page.setViewport(viewport={'width':width,'height':height})
    await page.setJavaScriptEnabled(enabled=True)
    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36')
    await page.goto(url)

    await page.evaluate('window.scrollBy(0, document.body.scrollHeight)')

    await asyncio.sleep(1)

    # content = await page.content()
    li_list = await page.xpath('//*[@id="J_goodsList"]/ul/li')

    item_list = []
    for li in li_list:
        a = await li.xpath('.//div[@class="p-img"]/a')
        detail_url = await (await a[0].getProperty('href')).jsonValue()
        promo_words = await (await a[0].getProperty('title')).jsonValue()
        a_ = await li.xpath('.//div[@class="p-commit"]/strong/a')
        p_commit = await (await a_[0].getProperty('textContent')).jsonValue()
        i = await li.xpath('./div/div[3]/strong/i')
        price = await (await i[0].getProperty('textContent')).jsonValue()
        em = await li.xpath('./div/div[4]/a/em')
        title = await (await em[0].getProperty('textContent')).jsonValue()
        item = {
            "title" : title,
            "detail_url" : detail_url,
            "promp_words" : promo_words,
            "p_commit" : p_commit,
            "price" : price
        }
        item_list.append(item)

    await page_close(browser)
    return item_list


async def page_close(browser):
    for _page in await browser.pages():
        await _page.close()
    await browser.close()


url = 'https://search.jd.com/Search?keyword=%E6%89%8B%E6%9C%BA&wq='\
		'%E6%89%8B%E6%9C%BA&pvid=e07184578b8442c58ddd65b221020e99&page={}&s=56&click=0 '
task_list = []
for i in range(1,4):
    page = i * 2 - 1
    task_list.append(main(url.format(page)))

results = asyncio.get_event_loop().run_until_complete(asyncio.gather(*task_list))

for i in results:
    print(i,len(i))

print('*'*100)

 

 

https://intrepidgeeks.com/tutorial/basic-use

 


 

 

 

import asyncio
from pyppeteer import launch
import time

async def main():
  exepath = 'C:/Users/tester02/AppData/Local/Google/Chrome/Application/chrome.exe'
  browser = await launch({'executablePath': exepath, 'headless': False, 'slowMo': 30})
  page = await browser.newPage()
  await page.setViewport({'width': 1366, 'height': 768})
  await page.goto('http://192.168.2.66')
  await page.type("#Login_Name_Input", "test02")
  await page.type("#Login_Password_Input", "12345678", )
  await page.waitFor(1000)
  await page.click("#Login_Login_Btn")
  await page.waitFor(3000)
  await browser.close()

asyncio.get_event_loop().run_until_complete(main())

 

 

import asyncio
import time
from pyppeteer import launch


async def gmailLogin(username, password, url):
  # 'headless': False shows the browser window; True runs headless
  # Route traffic through a local proxy (e.g. a tunnel to a VPS) with '--proxy-server=127.0.0.1:1080'
  browser = await launch({'headless': False, 'args': ['--no-sandbox', '--proxy-server=127.0.0.1:1080']})
  page = await browser.newPage()
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36')

  await page.goto(url)

  # Enter the Gmail address
  await page.type('#identifierId', username)
  # Click "Next"
  await page.click('#identifierNext > content')
  page.mouse  # mouse handle (not used here)
  time.sleep(10)
  # Enter the password
  await page.type('#password input', password)
  # Click "Next"
  await page.click('#passwordNext > content > span')
  page.mouse  # mouse handle (not used here)
  time.sleep(10)
  # "DONE" confirmation (if a warning page appears):
  # await page.click('div > content > span')
  # Setting a regular user-agent via page.setUserAgent (as above) helps avoid the insecure-browser warning when logging in from python.

  # Take a screenshot of the logged-in page
  await page.screenshot({'path': './gmail-login.png', 'quality': 100, 'fullPage': True})
  # After logging in, open YouTube
  await page.goto('https://www.youtube.com')
  time.sleep(10)


if __name__ == '__main__':
  username = 'your_gmail_address@gmail.com'
  password = r'your_gmail_password'
  url = 'https://gmail.com'
  loop = asyncio.get_event_loop()
  loop.run_until_complete(gmailLogin(username, password, url))
# Reference: www.sanfenzui.com, and:
# https://blog.csdn.net/Chen_chong__/article/details/82950968

 

 

 

 

 

 

 

 
