Saturday, October 5, 2024

How to Install and Use Browserless on Ubuntu 22.04


This article walks you through using Browserless, a browser automation service, on Ubuntu 22.04. We'll look at what Browserless offers, set up a project on Ubuntu 22.04, and work through practical examples that show it in use.

Understanding Browserless

Browserless is a cloud-based browser automation service. Because the browsers run on remote infrastructure rather than on your machine, tasks such as web crawling, testing, and data collection can be scaled up without managing local browser instances.

Why Choose Browserless?

  • Scalability: Leveraging cloud clusters, Browserless scales seamlessly to handle demanding tasks.

  • Fingerprint Management: Built upon Nstbrowser's extensive fingerprint library, Browserless offers randomized fingerprint switching, essential for navigating websites with stringent security measures and achieving realistic user behavior.

  • Simplified Automation: Browserless significantly simplifies browser automation by eliminating the need to manage and maintain local browser instances.

Getting Started: Setting Up Browserless on Ubuntu 22.04

Before using Browserless, make sure your Ubuntu 22.04 system has the required tools installed.

Prerequisites: Node.js and npm

  • Node.js: A default Ubuntu 22.04 installation may not include Node.js, so we'll install it with the apt package manager. Note that the version in the Ubuntu repositories can be older than the current Node.js release.

    1. Updating the package index: Keep your system's package information up-to-date:

            sudo apt update
          

    2. Installing Node.js: Install Node.js from the Ubuntu repositories (this is not necessarily the latest release):

            sudo apt install nodejs -y
          

  • npm (Node Package Manager): npm is crucial for managing Node.js modules and packages.

    1. Installing npm: Install npm using apt:

            sudo apt install npm -y
          

    2. Verification: Check if Node.js and npm are installed successfully:

      node -v
      npm -v
          

      You should see the respective versions printed if the installation is successful.
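
As a sanity check beyond node -v, the short script below is a minimal sketch that compares the running Node.js version against a minimum major version. The threshold of 18 is an assumption made for illustration, not a figure from this article, so check the requirements of the Puppeteer release you install. The function names are hypothetical.

```javascript
// Assumed minimum major version; adjust it to match your Puppeteer release.
const MIN_NODE_MAJOR = 18;

// Extract the major version from a string such as "v18.19.1" or "12.22.9".
function nodeMajor(version) {
    return Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
}

// Return true when the given version meets the minimum major version.
function isSupported(version, minMajor = MIN_NODE_MAJOR) {
    return nodeMajor(version) >= minMajor;
}

console.log(`Node ${process.version} supported: ${isSupported(process.version)}`);
```

If the check reports false, installing a newer Node.js release before continuing will likely save debugging time later.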

Building a Browserless Project

Now, we'll create a project to demonstrate Browserless' capabilities:

  1. Project Directory: Create a project directory and navigate to it:

          mkdir nst-browserless && cd nst-browserless
        

  2. Project Initialization: Initialize an npm project:

          npm init -y
        

  3. IDE Integration: Open the project in your preferred editor. With Visual Studio Code installed, for example:

          code .
        

  4. Dependencies: Browserless currently supports Puppeteer and Playwright for browser automation. We'll utilize Puppeteer in this example. Install the necessary dependency:

          npm i --save puppeteer-core
        

Coding with Browserless: A Practical Example

Step 1: Accessing Browserless

  1. API Key and Proxy: Obtain your API key from the Nstbrowser Browserless documentation. Ensure you have a proxy configured for your requests.

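  2. Code Implementation: Create a JavaScript file (e.g., browserless.js) and paste the following code. Because the code uses ES module import syntax, Node.js may require "type": "module" in package.json (or a .mjs file extension) to run it: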

    import puppeteer from "puppeteer-core";
    
    const token = "your token"; // Replace with your actual API key
    const config = {
        proxy: 'your proxy', // Replace with your proxy configuration
    };
    const query = new URLSearchParams({
        token: token,
        config: JSON.stringify(config),
    });
    // puppeteer.connect expects a WebSocket URL, so use wss:// rather than https://
    const browserWSEndpoint = `wss://less.nstbrowser.io/connect?${query.toString()}`;
    
    const getBrowser = async () => puppeteer.connect({
        browserWSEndpoint,
        defaultViewport: null,
    });
    
    (async () => {
        let browser = null;
        try {
            browser = await getBrowser();
            const page = await browser.newPage();
            await page.goto("https://nstbrowser.io");
            // Save a full-page screenshot to the project directory
            await page.screenshot({ path: "screenshot.png", fullPage: true });
            await page.close();
        } catch (error) {
            console.error(error);
        } finally {
            // Close the browser exactly once, whether or not an error occurred
            await browser?.close();
        }
    })();
        
  3. Execution: Run the script to test Browserless connectivity:

          node browserless.js
        

    If successful, a screenshot named screenshot.png will be generated in your project directory, confirming Browserless' successful integration.
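
The endpoint construction used in the script can also be sketched as a small helper, which makes it easier to keep the token out of the source file. This is only an illustration under stated assumptions: buildEndpoint is a hypothetical helper, NST_TOKEN and NST_PROXY are hypothetical environment variable names, and the wss:// base URL is assumed to be the WebSocket form of the endpoint used in this article.

```javascript
// Build the connection URL from a base URL, a token, and a config object.
// The config is serialized to JSON and URL-encoded by URLSearchParams.
function buildEndpoint(base, token, config) {
    if (!token) {
        throw new Error("Missing API token");
    }
    const query = new URLSearchParams({
        token,
        config: JSON.stringify(config),
    });
    return `${base}?${query.toString()}`;
}

// Hypothetical environment variable names, with the article's placeholders as fallbacks.
const token = process.env.NST_TOKEN || "your token";
const proxy = process.env.NST_PROXY || "your proxy";

const endpoint = buildEndpoint("wss://less.nstbrowser.io/connect", token, { proxy });

// Decode the config again to confirm it survives the URL encoding.
console.log(new URL(endpoint).searchParams.get("config"));
```

The returned string can be passed as browserWSEndpoint to puppeteer.connect in place of the hard-coded template string.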

Step 2: Image Scraping with Browserless

Let's put Browserless' power to work by scraping image addresses from a website. We'll use the Pixels website as an example.

  1. Target Website: We'll extract images from the "Wall Art" category on the Pixels website.

  2. Site Analysis: Examine the website structure to pinpoint the image elements we need:

    • Navigation: Find the link leading to the "Wall Art" category.

    • Image Location: Identify the HTML elements containing the image source (src) attributes we want to extract.

  3. Code Modification: Update your browserless.js file with the following code:

    import puppeteer from "puppeteer-core";
    
    const token = "your api token"; 
    const config = {
        proxy: 'your proxy', 
    };
    const query = new URLSearchParams({
        token: token, 
        config: JSON.stringify(config),
    });
    // puppeteer.connect expects a WebSocket URL, so use wss:// rather than https://
    const browserWSEndpoint = `wss://less.nstbrowser.io/connect?${query.toString()}`;
    
    const getBrowser = async () => puppeteer.connect({
        browserWSEndpoint,
        defaultViewport: null,
    });
    
    (async () => {
        let browser = null;
        const pixelsWebsite = "https://pixels.com";
        try {
            browser = await getBrowser();
            const page = await browser.newPage();
            await page.goto(pixelsWebsite);
            await page.waitForSelector("#menuTopArt", { timeout: 30000 });
            await page.click("#menuTopArt a"); // Click 'Wall Art' menu
            // Wait for the product images on the category page to appear
            await page.waitForSelector(".searchEngineFeaturedProductImage", { timeout: 30000 });
            const imageElements = await page.$$('.searchEngineFeaturedProductImage');
            for (const imageElement of imageElements) {
                const src = await page.evaluate(el => el.src, imageElement);
                if (src.includes("Blank.jpg")) { // Stop when 'Blank.jpg' is encountered
                    break;
                }
                console.log(src);
                // Add further processing for the image if needed
            }
            await page.close();
        } catch (error) {
            console.error(error);
        } finally {
            // Close the browser exactly once, whether or not an error occurred
            await browser?.close();
        }
    })();
        

  4. Execution: Run the script:

          node browserless.js
        

    You'll see the scraped image addresses printed in your console.
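
The filtering step in the loop above can also be written as a small pure function, which is easier to test without a browser. The following is a minimal sketch: takeUntilPlaceholder is a hypothetical helper name, and the sample URLs are made up for illustration.

```javascript
// Return the sources that appear before the first placeholder image.
function takeUntilPlaceholder(sources, placeholder = "Blank.jpg") {
    const result = [];
    for (const src of sources) {
        if (src.includes(placeholder)) {
            break; // same stop condition as the loop in browserless.js
        }
        result.push(src);
    }
    return result;
}

// In the script, the input could be collected with a single page.$$eval call:
//   const sources = await page.$$eval(
//       ".searchEngineFeaturedProductImage",
//       (imgs) => imgs.map((img) => img.src)
//   );
const sample = [
    "https://example.com/a.jpg",
    "https://example.com/b.jpg",
    "https://example.com/Blank.jpg",
    "https://example.com/c.jpg",
];
console.log(takeUntilPlaceholder(sample));
```

Collecting all src values in one $$eval call avoids a separate round trip to the remote browser for each image element, which may matter when the browser runs in the cloud.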

Conclusion

Browserless lets you automate browser tasks from Ubuntu 22.04 without running a local browser. We've covered its setup and demonstrated it with a screenshot test and an image-scraping example, both of which show how it can be used in data collection and web automation projects.
