In the world of web scraping, one of the most common challenges developers face is dealing with websites that rely heavily on JavaScript to render content. Traditional scraping tools like requests and lxml fall short when it comes to dynamic content. That’s where tools like Playwright come in. In this blog, we’ll explore how to use Playwright with Python to handle JavaScript-rendered content and reliably extract data.
Why JavaScript-Rendered Content is Tricky
When a browser loads a website, the initial HTML may be almost empty, with JavaScript responsible for fetching and injecting content dynamically. If you use a basic HTTP client to fetch the page source, you often end up with skeleton HTML and no useful data.
Example:
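Suppose you fetch a JavaScript-heavy product page with a plain HTTP client (a hypothetical illustration — the URL and markup below are made up):

import requests

# Fetch the raw HTML without executing any JavaScript
response = requests.get("https://example.com/products")  # hypothetical URL
print(response.text)

# Typical result: an almost empty shell -- the product data is injected later by JavaScript
# <html>
#   <body>
#     <div id="app"></div>
#     <script src="/bundle.js"></script>
#   </body>
# </html>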
In this case, scraping the HTML won’t give you any product information unless the JavaScript is executed. That’s where browser automation tools help.
Introducing Playwright

Playwright is a powerful library developed by Microsoft for browser automation. It supports all modern rendering engines (Chromium, Firefox, WebKit) and works seamlessly with Python.
You can also read more about how Playwright compares to Selenium for different scraping use cases.
Why Playwright is a Game-Changer
It Actually Renders Pages – Unlike basic scrapers, Playwright loads the full page, including all the JavaScript-generated content.
You Can Interact with Pages – Need to click a “Load More” button? Fill out a form? Scroll infinitely? Playwright does it all.
Works Across Browsers – Chromium, Firefox, WebKit—pick your poison.
Runs Headless (Like a Ghost) – No need to open a visible browser window.
If you’ve used Selenium before, Playwright is like its cooler, faster cousin. Here’s a great comparison if you’re curious.
Getting Started: Installation & Setup
1. Install Playwright
First, grab the Python package:
pip install playwright
Then, install the browsers it needs (this might take a minute):
playwright install
2. Your First Playwright Script
Here’s a simple script to open a page and grab its fully rendered HTML:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    # Launch a browser (hidden by default)
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Visit a page with dynamic content
    page.goto("https://example.com/dynamic-stuff")

    # Wait for a specific element to load
    page.wait_for_selector("#content-loaded")

    # Get the full HTML (after JavaScript runs)
    html = page.content()
    print(html)  # Now you see what a real user sees!

    browser.close()  # Don’t forget to clean up!
For more setup details, the official docs are super helpful.
Handling Dynamic Content Like a Boss
1. Waiting for Stuff to Load
Dynamic content can take a second (or five). Playwright lets you wait for it:
# Wait for a specific element
page.wait_for_selector(".loaded-content")
# Or wait until the page stops making network requests
page.wait_for_load_state("networkidle")
2. Clicking Buttons & Navigating
Some sites hide content behind interactions. Playwright can pretend to be a user:
# Click a "Load More" button
page.click("button.load-more")
# Wait for new content to appear
page.wait_for_selector(".fresh-data")
3. Extracting Data the Smart Way
Once the content is loaded, grab what you need:
# Get text from an element
title = page.inner_text("h1")
# Extract all links on the page
links = page.eval_on_selector_all(
    "a",
    "elements => elements.map(el => el.href)"
)
4. Dealing with Infinite Scroll
Some sites (looking at you, social media) load content as you scroll. Here’s how to handle it:
# Scroll to the bottom 3 times
for _ in range(3):
    page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    page.wait_for_timeout(2000)  # Wait 2 secs for new content
Real-World Example: Scraping an Online Store
def scraper():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://blinkit.com/cn/bath-body-essentials/cid/13/206")

        # Wait for the product list container to render
        page.wait_for_selector('div[class*="ProductsContainer__ProductListContainer-sc-1k8vkvc-0"]')

        # Collect the link element under each product card
        products = page.query_selector_all('//div[contains(@class, "ProductsContainer__ProductListContainer-sc-1k8vkvc-0")]/a')
        print(len(products))

        products_link = []
        for product in products:
            href = product.get_attribute('href')
            products_link.append(href)

        browser.close()
        scrape_data(products_link)  # helper defined elsewhere that scrapes each product page
For more real-world scraping techniques, this tutorial is gold.
Avoiding Getting Blocked
Websites don’t love scrapers, so here’s how to fly under the radar:
✅ Use headless mode (headless=True) – No one sees your browser.
✅ Add random delays – Don’t act like a robot.
import random
page.wait_for_timeout(random.randint(1000, 4000)) # Wait 1-4 secs
✅ Rotate user agents – Pretend to be different browsers.
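A minimal sketch of one way to do this in Playwright — pass a user_agent when creating a browser context (the strings below are placeholders; maintain your own up-to-date pool):

import random
from playwright.sync_api import sync_playwright

USER_AGENTS = [  # placeholder strings -- substitute a real, current pool
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    # Each new context can carry a different user agent
    context = browser.new_context(user_agent=random.choice(USER_AGENTS))
    page = context.new_page()
    page.goto("https://example.com")
    browser.close()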
✅ Use proxies – Especially if scraping at scale. (Good proxy guide here)
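A sketch of how you might wire in a proxy — the server address and credentials below are placeholders for your provider’s details:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        proxy={  # placeholder proxy details
            "server": "http://proxy.example.com:8000",
            "username": "user",
            "password": "pass",
        },
    )
    page = browser.new_page()
    page.goto("https://example.com")
    browser.close()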
Wrapping Up
Playwright is a powerhouse for scraping modern, JavaScript-heavy websites. It’s fast, flexible, and way easier than dealing with raw HTTP requests.
Where to Go Next
Try the async version for even faster scraping (guide here) – there’s a quick sketch after this list.
Learn how to log into sites (authentication tutorial).
Dive deeper with the official Playwright docs.
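As a taste of the async API, here’s a minimal sketch using playwright.async_api (the URL is a placeholder):

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://example.com")  # placeholder URL
        html = await page.content()
        print(len(html))
        await browser.close()

asyncio.run(main())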
Now go forth and scrape! Just… maybe don’t abuse it, okay? 😉