点此免费加入Python网络爬虫学习交流QQ群:428518750

如果我们想获取页面的HTML源代码,那么可以在打开页面后,等待页面全部加载完成,再通过page.content()方法获取页面的源代码。

示例代码:

from playwright.sync_api import Playwright, sync_playwright, expect

def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://www.baidu.com/")
    
    page.wait_for_load_state('networkidle')
    print(page.content())
    
    page.wait_for_timeout(20000)
    
    page.close()
    context.close()
    browser.close()

with sync_playwright() as playwright:
    run(playwright)

在上面的代码中,使用了这么一句代码“page.wait_for_load_state(‘networkidle’)”。它是用于等待页面HTML加载完成。在需要对页面HTML中的元素进行操作时,都应该使用该方法等待页面HTML加载完成后再操作。

点此免费加入Python网络爬虫学习交流QQ群:428518750

picture loss