Meta tags

Meta Tag Content
description Learn web scraping with JavaScript and NodeJS with this step-by-step tutorial. We will see the different ways to scrape the web in JavaScript through lots of examples.
viewport width=device-width
Website Page URL https://www.scrapingbee.com/blog/web-scraping-javascript/

Heading tags

h1 tag

We found 1 h1 tag on this page URL; it is listed in the table below.

S.no h1 tag content
1 Web Scraping with JavaScript and NodeJS

h2 tag

We found 7 h2 tags on this page URL; they are listed in the table below.

S.no h2 tag content
1 Understanding NodeJS: A brief introduction
2 HTTP clients: querying the web
3 Data Extraction in JavaScript
4 Headless Browsers in JavaScript
5 Summary
6 Resources
7 Tired of getting blocked while scraping the web?

h3 tag

We found 15 h3 tags on this page URL; they are listed in the table below.

S.no h3 tag content
1 Prerequisites
2 Outcomes
3 The JavaScript Event Loop
4 1. Built-In HTTP Client
5 2. Fetch API
6 3. Axios
7 4. SuperAgent
8 5. Request
9 Regular expressions: the hard way
10 Cheerio: Core jQuery for traversing the DOM
11 jsdom: the DOM for Node
12 1. Puppeteer: the headless browser
13 2. Nightmare: an alternative to Puppeteer
14 3. Playwright, the new web scraping framework
15 You might also like:

h4 tag

We found 11 h4 tags on this page URL; they are listed in the table below.

S.no h4 tag content
1 SuperAgent plugins
2 Using the Cheerio NPM Package for Web Scraping
3 Infinite Scroll with Puppeteer
4 Block ressources with Puppeteer
5 Company
6 Tools
7 Legal
8 Product
9 How we compare
10 No code web scraping
11 Learning Web Scraping

h5 tag

Unfortunately, we were not able to find any h5 tags on this page URL.

h6 tag

Unfortunately, we were not able to find any h6 tags on this page URL.

HTML Formatting Elements - Important text (strong/bold) tags

S.no Tag content
1 Kevin Sahin | 02 August 2022 (updated) | 23 min read
2 Ryan Dahl introduced NodeJS in 2009
3 But enough of theory, let's check it out, shall we?
4 Not bad, two lines of code, no manual handling of data, no distinction between HTTP and HTTPS, and a native JSON object.
5 Should you use Request?
6 Proceed with caution please.
7 really
8 NodeJS
9 non-blocking
10 HTTP clients
11 Cheerio
12 JSDOM
13 Puppeteer
14 Nightmare
15 Check it out please.
16 Kevin Sahin
17 Ben Force
18 Maxine Meurer
19 Shadid Haque

HTML Formatting Elements - Italic text (i) tags

S.no Tag content

HTML Formatting Elements - Underline text (u) tags

S.no Tag content

HTML Formatting Elements - Code tags

S.no Tag content
1
document
2
window
3
const http = require('http');
const PORT = 3000;

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end('Hello World');
});

server.listen(PORT, () => {
  console.log(`Server running at PORT:${PORT}/`);
});
4
require
5
createServer
6
listen
7
accept
8
while (true);
9
MyServer.js
10
node MyServer.js
11
const http = require('http');

const req = http.request('http://example.com', res => {
	const data = [];

	res.on('data', _ => data.push(_))
	res.on('end', () => console.log(data.join()))
});

req.end();
12
fetch()
13
async function fetch_demo()
{
	const resp = await fetch('https://www.reddit.com/r/programming.json');

	console.log(await resp.json());
}

fetch_demo();
14
await
15
json()
16
fetch
17
POST
18
npm install axios
19
const axios = require('axios')

axios
	.get('https://www.reddit.com/r/programming.json')
	.then((response) => {
		console.log(response)
	})
	.catch((error) => {
		console.error(error)
	});
20
async function getForum() {
	try {
		const response = await axios.get(
			'https://www.reddit.com/r/programming.json'
		)
		console.log(response)
	} catch (error) {
		console.error(error)
	}
}
21
getForum
22
const superagent = require("superagent")
const forumURL = "https://www.reddit.com/r/programming.json"

// callbacks
superagent
	.get(forumURL)
	.end((error, response) => {
		console.log(response)
	})

// promises
superagent
	.get(forumURL)
	.then((response) => {
		console.log(response)
	})
	.catch((error) => {
		console.error(error)
	})

// promises with async/await
async function getForum() {
	try {
		const response = await superagent.get(forumURL)
		console.log(response)
	} catch (error) {
		console.error(error)
	}
}
23
npm install superagent
24
const request = require('request')
request('https://www.reddit.com/r/programming.json', function (
  error,
  response,
  body
) {
  console.error('error:', error)
  console.log('body:', body)
})
25
npm install request
26
const htmlString = '<label>Username: John Doe</label>'
const result = htmlString.match(/<label>Username: (.+)<\/label>/)

console.log(result[1])
// John Doe
27
String.match()
28
(.+)
29
result[1]
30
<label>
31
const cheerio = require('cheerio')
const $ = cheerio.load('<h2 class="title">Hello world</h2>')

$('h2.title').text('Hello there!')
$('h2').addClass('welcome')

$.html()
// <h2 class="title welcome">Hello there!</h2>
32
onClick
33
npm install cheerio axios
34
crawler.js
35
const axios = require('axios');
const cheerio = require('cheerio');

const getPostTitles = async () => {
	try {
		const { data } = await axios.get(
			'https://old.reddit.com/r/programming/'
		);
		const $ = cheerio.load(data);
		const postTitles = [];

		$('div > p.title > a').each((_idx, el) => {
			const postTitle = $(el).text()
			postTitles.push(postTitle)
		});

		return postTitles;
	} catch (error) {
		throw error;
	}
};

getPostTitles()
    .then((postTitles) => console.log(postTitles));
36
getPostTitles()
37
cheerio.load()
38
$
39
Inspect
40
div > p.title > a
41
$('div > p.title > a')
42
each()
43
text()
44
node crawler.js
45
const { JSDOM } = require('jsdom')
const { document } = new JSDOM(
	'<h2 class="title">Hello world</h2>'
).window

const heading = document.querySelector('.title')
heading.textContent = 'Hello there!'
heading.classList.add('welcome')

heading.innerHTML
// <h2 class="title welcome">Hello there!</h2>
46
querySelector()
47
<div>
48
const { JSDOM } = require("jsdom")

const HTML = `
	<html>
		<body>
			<button onclick="const e = document.createElement('div'); e.id = 'myid'; this.parentNode.appendChild(e);">Click me</button>
		</body>
	</html>`;

const dom = new JSDOM(HTML, {
	runScripts: "dangerously",
	resources: "usable"
});

const document = dom.window.document;

const button = document.querySelector('button');

console.log("Element before click: " + document.querySelector('div#myid'));
button.click();
console.log("Element after click: " + document.querySelector('div#myid'));
49
require()
50
HTML
51
runScripts
52
click()
53
Element before click: null
Element after click: [object HTMLDivElement]
54
resources
55
npm install puppeteer
56
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD
57
const puppeteer = require('puppeteer')

async function getVisual() {
	try {
		const URL = 'https://www.reddit.com/r/programming/'
		const browser = await puppeteer.launch()

		const page = await browser.newPage()
		await page.goto(URL)

		await page.screenshot({ path: 'screenshot.png' })
		await page.pdf({ path: 'page.pdf' })

		await browser.close()
	} catch (error) {
		console.error(error)
	}
}

getVisual()
58
getVisual()
59
puppeteer.launch()
60
newPage()
61
goto()
62
goto
63
screenshot()
64
pdf()
65
close()
66
npm install nightmare
67
const Nightmare = require('nightmare')
const nightmare = Nightmare()

nightmare
	.goto('https://search.brave.com/')
	.type('#searchbox', 'ScrapingBee')
	.click('#submit-button')
	.wait('#results a')
	.evaluate(
		() => document.querySelector('#results a').href
	)
	.end()
	.then((link) => {
		console.log('ScrapingBee Web Link:', link)
	})
	.catch((error) => {
		console.error('Search failed:', error)
	})
68
nightmare
69
type
70
#searchbox
71
click
72
#submit-button
73
wait
74
evaluate()
75
<a>
76
href
77
end()
78
ScrapingBee Web Link: https://www.scrapingbee.com/
79
const playwright = require('playwright');
async function main() {
    const browser = await playwright.chromium.launch({
        headless: false // setting this to true will not run the UI
    });

    const page = await browser.newPage();
    await page.goto('https://finance.yahoo.com/world-indices');
    await page.waitForTimeout(5000); // wait for 5 seconds
    await browser.close();
}

main();

The Anchor element (a) tags

S.no Anchor tag Content
1 Login
2 Sign Up
3 Pricing
4 FAQ
5 Blog
6 Other Features
7 Screenshots
8 Google search API
9 Data extraction
10 JavaScript scenario
11 No code web scraping
12 Developers
13 Tutorials
14 Documentation
15 Knowledge Base
16 Try ScrapingBee for Free
17 Event Loop
18 callback functions
19 http://localhost:3000
20 built-in HTTP client
21 separate library for HTTPS URLs
22 Fetch API
23 version 18
24 node-fetch
25 article on node-fetch
26 Promises
27 await
28 json() function
29 Response object
30 options argument
31 Github
32 GitHub
33 plugins
34 superagent-throttle
35 Request
36 wrapper libraries
37 String.match()
38 Cheerio
39 Single-page applications
40 Headless Browsers in JavaScript
41 r/programming
42 knowledge on XPath
43 NodeJS Axios proxy
44 jsdom
45 querySelector()
46 jsdom's documentation
47 here
48 SPAs
49 Puppeteer
50 Source
51 Puppeteer environment variables
52 How to download a file with Puppeteer
53 Handling and submitting HTML forms with Puppeteer
54 Using Puppeteer with Python and Pyppeteer
55 Nightmare
56 https://search.brave.com
57 type
58 click
59 wait
60 evaluate()
61 end()
62 https://www.scrapingbee.com
63 Playwright tutorial
64 guide on how not to get blocked as a crawler
65 scraping API platform
66 NodeJS Website
67 Puppeteer's Docs
68 Playright
69 ScrapingBee's Blog
70 Handling infinite scroll with Puppeteer
71 Node-unblocker
72 A Javascript developer's guide to cURL
73 ScrapingBee
74 Using the Cheerio NPM Package for Web Scraping Ben Force 9 min read In this article, you'll learn how to use Cheerio to scrape data from static HTML content.
75 Infinite Scroll with Puppeteer Maxine Meurer 10 min read Infinite page are everywhere. This article will teach you to scroll infinite pages with Puppeteer. We will also see the alternative methods for scraping infinite pages.
76 Block ressources with Puppeteer Shadid Haque 5 min read This article will show you how to intercept and block requests with Puppeteer using the request interception API and the puppeteer extra plugin.
77 Team
78 Company's journey
79 Rebranding
80 Affiliate Program
81 Curl converter
82 Terms of Service
83 Privacy Policy
84 GDPR Compliance
85 Data Processing Agreement
86 Features
87 Status
88 Alternative to Crawlera
89 Alternative to Luminati
90 Alternative to Smartproxy
91 Alternative to NetNut
92 Alternative to ScraperAPI
93 Alternatives to ScrapingBee
94 No code competitor monitoring
95 How to put scraped website data into Google Sheets
96 Send stock prices update to Slack
97 Scrape Amazon products' price with no code
98 Extract job listings, details and salaries
99 Web scraping questions
100 A guide to Web Scraping without getting blocked
101 Web Scraping Tools
102 Best Free Proxies
103 Best Mobile proxies
104 Web Scraping vs Web Crawling
105 Rotating and residential proxies
106 Web Scraping with Python
107 Web Scraping with PHP
108 Web Scraping with Java
109 Web Scraping with Ruby
110 Web Scraping with NodeJS
111 Web Scraping with R
112 Web Scraping with C#
113 Web Scraping with C++
114 Web Scraping with Elixir
115 Web Scraping with Perl
116 Web Scraping with Rust
117 Web Scraping with Go

Contact Us

If you have any inquiries or feedback, please don't hesitate to reach out to us at [email protected]. We will respond to your request as soon as possible. Thank you for your interest!
