Build a Web Scraper with Node.js and cheerio – IMDB Movie Search



Show starts at 0:48. See the description below for more timestamps.
Server code is here: https://github.com/w3cj/imdb-scraper-server
Client code is here: https://github.com/w3cj/imdb-scraper-client

– [1:12] Introduction
– Set up server folder
– [3:42] Initialize a node app
– [3:57] npm install node-fetch
– [4:26] How to use node-fetch
– Parsing HTML with cheerio
– [6:49] Overview of cheerio
– [7:18] Add cheerio to server code
– [7:39] Get movie titles and movie posters from IMDB search :tada:
– Start using express
– [11:53] Format data as JSON
– [13:04] Prepare code for modularity
– [14:12] Create a basic express app
– [15:39] Create node scripts (auto-reload with nodemon)
– Building an API in express
– [16:31] Create search route
– [18:02] Get movie id from IMDB
– [20:12] Create movie route
– [23:19] Get data from IMDB movie page
– [23:25] Title
– [26:39] MPAA Rating
– [27:59] Run time
– [29:58] Genres
– [31:45] Release date
– [32:50] IMDB rating
– [33:40] Movie poster
– [37:04] Summary
– [38:03] Directors
– [43:24] Writers
– [46:52] Actors
– [49:32] Story line
– [51:50] Things to keep in mind when scraping the web
– [53:10] Back to getting data from IMDB
– [53:10] Try getting budget
– [54:09] Production companies
– [57:46] Link to trailer (Part I)
– [1:00:49] OUTDATED suggest videos via pollly (view top of description)
– [1:01:21] Link to trailer (Part II)
– [1:03:40] Add caching
– [1:07:16] Deployment via now
– Create Frontend in Vanilla JS
– [1:09:45] Add CORS to server code
– [1:11:11] Create client folder
– [1:11:42] Add Bootswatch CDN
– [1:12:21] Start styling
– [1:15:50] Add search logic
– [1:18:58] Show search results on page
– [1:23:04] Create movie page
– [1:41:52] Format date with date-fns
– [1:46:04] Review of what we have built today!
– [1:51:42] OUTDATED. Please use https://poll.coding.garden/
Search for more Coding Garden videos here: https://cg-videos.now.sh
View the Coding Garden FAQ here: https://github.com/CodingGardenCommunity/app-wiki/wiki/Frequently-Asked-Questions
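
The "Add caching" step in the outline can be sketched without any libraries. This is a minimal sketch, not the code from the video: `createCachedLookup` and `fakeGetMovie` are hypothetical names, and the cache is a plain in-memory `Map` keyed by movie id, so it lasts only as long as the server process.

```javascript
// Wrap any async lookup function with an in-memory cache.
// `lookup` is an assumption: any function that takes a key and returns a promise.
function createCachedLookup(lookup) {
  const cache = new Map();
  return function cachedLookup(key) {
    if (cache.has(key)) {
      // Cache hit: return the stored promise and skip the request.
      return cache.get(key);
    }
    const pending = Promise.resolve(lookup(key)).catch((error) => {
      // Do not keep failed requests in the cache, so a later call can retry.
      cache.delete(key);
      throw error;
    });
    cache.set(key, pending);
    return pending;
  };
}

// Hypothetical stand-in for a function that would scrape a movie page.
let requestCount = 0;
function fakeGetMovie(imdbID) {
  requestCount += 1;
  return Promise.resolve({ imdbID, title: 'Example title' });
}

const getMovieCached = createCachedLookup(fakeGetMovie);
```

Calling `getMovieCached` twice with the same id runs `fakeGetMovie` only once. Storing the promise rather than the resolved value also means two requests that arrive at the same time share one lookup.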

23 thoughts on “Build a Web Scraper with Node.js and cheerio – IMDB Movie Search”

  1. How can we scrape Amazon, Walmart, AliExpress, or even Alibaba? It's just complex, so is there any way to do such stuff?

  2. Hello, first of all, very nice tutorial!!
    I have a problem on my side: I am trying to select a specific tr, like you did with an id, and then take the td with an id inside that tr and read its data, but I cannot. Do you maybe have some idea?
    Example:
    <tr class="event-row printed-event res1 extra-results" >
    <td class="here">data</td>
    </tr>

  3. Hi. I really need your help. Can you show me how to scrape paginated web pages and upload the data divided into pages, please?

  4. can't get the genres clean because they've removed the itemprop attributes now. awesome tutorial though! thanks a lot, CJ!

    here's my workaround for it:

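    // $ is assumed to be the cheerio instance loaded with the movie page HTML
    const genres = [];
    // collect the text of every link in the subtext line (genres first, then the date)
    $('.subtext a').each(function(i, element) {
      const genre = $(element).text().trim();
      genres.push(genre);
    });

    // get last item (date) from genres array and store it in datePublished
    const datePublished = genres.pop();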
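
The workaround above relies on the release date being the last link in the subtext line. Here is a minimal sketch of that step on plain data, with no scraping involved. `splitSubtextLinks` and the sample link texts are hypothetical, and the real page layout may differ.

```javascript
// Split the link texts from the subtext line into genres and a release date.
// Assumes (as the workaround does) that the last link is the release date.
function splitSubtextLinks(linkTexts) {
  const genres = linkTexts.map((text) => text.trim()).filter((text) => text !== '');
  // pop() removes and returns the last element; it returns undefined for an empty list.
  const datePublished = genres.pop();
  return { genres, datePublished };
}

// Made-up sample input, shaped like the text of each `.subtext a` element.
const sample = splitSubtextLinks(['Action', ' Adventure ', 'Fantasy', '25 May 1977 (USA)']);
console.log(sample.genres);        // [ 'Action', 'Adventure', 'Fantasy' ]
console.log(sample.datePublished); // 25 May 1977 (USA)
```

If the page ever lists the date somewhere other than the last link, this approach would mislabel a genre as the date, so it is worth checking the result against a known movie.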

  5. Hey CJ!! Thanks for the great tutorial!!! I'd like to connect with you on LinkedIn or Twitter – how does one actually connect with you???

  6. Can I message you personally about something I am working on related to web scraping? I followed your video and got almost 75% done, but I am facing some errors and need help scraping this site, if you can message me.
    How can I get the data I need and put it into a CSV file?

    The output will be a CSV file with a product list scraped from a specific website, in my case Agrostar, https://www.agrostar.in/ (any single category, for example "Seeds"). The output list will have the product name, price, brand, and any other attributes in the Product Highlights. The corresponding images will go in another folder, with a custom file name corresponding to the row in the generated CSV file.

    Pick one category and list all the products and their attributes in the CSV.
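
For the CSV part of this question, here is a minimal sketch of turning scraped records into CSV text. The field names and sample products are made up for illustration, and the scraping and image download are not shown.

```javascript
// Quote a single value for CSV: wrap it in quotes if it contains a comma,
// a quote, or a line break, and double any quotes inside it.
function toCsvField(value) {
  const text = String(value ?? '');
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Convert an array of objects to CSV text with a header row.
function toCsv(rows, columns) {
  const header = columns.map(toCsvField).join(',');
  const lines = rows.map((row) => columns.map((column) => toCsvField(row[column])).join(','));
  return [header, ...lines].join('\n');
}

// Made-up sample data; real values would come from the scraper.
const products = [
  { name: 'Tomato Seeds, Hybrid', price: 120, brand: 'ExampleBrand' },
  { name: 'Okra Seeds', price: 95, brand: 'Another "Brand"' },
];

const csvText = toCsv(products, ['name', 'price', 'brand']);
console.log(csvText);
```

Writing the result to disk could then be done with Node's built-in `fs.writeFileSync('products.csv', csvText)`, where the file name is again only an example.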

  7. Please help me with this error. I'm at 6:35 of the video

    .then(body => {
    ^

    TypeError: Cannot read property 'then' of undefined
    at Object.<anonymous> (E:ai1cmdera-scraperindex.js:11:2)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
    at Function.Module._load (internal/modules/cjs/loader.js:530:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
    at startup (internal/bootstrap/node.js:266:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:596:3)
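
The commenter's code is not shown, so the cause here is a guess. This error means `.then` was called on something that evaluated to `undefined`. One common way that happens is a function that starts a promise chain but never returns it. A minimal sketch with hypothetical names (`fetchPage` stands in for a `fetch(url)` call):

```javascript
// Stand-in for a network call such as fetch(url); resolves with fake HTML.
function fetchPage(url) {
  return Promise.resolve(`<html><!-- ${url} --></html>`);
}

// Broken: the promise is created but not returned, so the caller gets undefined.
function searchMoviesBroken(url) {
  fetchPage(url).then((body) => body);
}

// Fixed: return the promise so the caller can chain .then on it.
function searchMoviesFixed(url) {
  return fetchPage(url).then((body) => body);
}

console.log(searchMoviesBroken('https://example.com')); // undefined
console.log(searchMoviesFixed('https://example.com') instanceof Promise); // true
```

Calling `searchMoviesBroken(url).then(...)` would throw the same kind of TypeError, because there is nothing to call `.then` on.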

  8. So I was trying to run this in my localhost Apache setup in /library/webserver/documents/js/scraper, and when I try to do the fetch I get:
    (node:9860) UnhandledPromiseRejectionWarning: TypeError: Only absolute URLs are supported
    Is this due to the fact that I'm trying to run it from the web server?
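
node-fetch reports `Only absolute URLs are supported` when the URL it is given has no protocol and host (for example `/find?q=star wars`). That suggests the cause is the URL string rather than the folder the script runs from. Here is a minimal sketch of building an absolute URL with Node's built-in `URL` class. The base URL and the `q` parameter are assumptions for illustration and may not match the search URL used in the video.

```javascript
// Assumed base URL for illustration; adjust to the page you are scraping.
const BASE_URL = 'https://www.imdb.com';

// Build an absolute search URL from a search term.
function buildSearchUrl(searchTerm) {
  const url = new URL('/find', BASE_URL);
  // searchParams handles the encoding of spaces and special characters.
  url.searchParams.set('q', searchTerm);
  return url.href;
}

console.log(buildSearchUrl('star wars')); // https://www.imdb.com/find?q=star+wars
```

The returned string starts with `https://`, so it can be passed to `fetch` as is.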

  9. hey man. I have a little problem. my scraper.searchMovies function is not working. I have followed your steps and yet I am getting this error. Any idea why? and thanks for your awesome video.

  10. Timestamps for this video (the same outline as in the description above).

    Contributed by: https://github.com/sbibow
    Contribute here: https://github.com/CodingGarden/community-contributions/blob/master/outlines-timestamps/U0btOGPwrIY.md

  11. Hi man!
    Nice tutorial, but I get stuck at the part where you open the directory in Atom. I downloaded Atom, but something is weird: when I try to open the IDE, it doesn't show up.
    And I don't know how to open that directory in another IDE. I also have VS Code but don't know how to open the index.js in it.
    Yes I have tried googling my problem without any solution.
    Would appreciate some help.
    Thanks

  12. Failed to load https://imdb-scrprr.now.sh/search/star%20wars: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:3000' is therefore not allowed access. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
    I get this error when I fetch data from the client. How can I solve it, please?
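
The browser blocks the response because the server did not send an `Access-Control-Allow-Origin` header, which is what the "Add CORS to server code" step in the outline addresses on the server side. Here is a minimal sketch of the idea using a plain header-setting wrapper, which is not necessarily the approach shown in the video. `withCors` and `searchHandler` are hypothetical names, and the allowed origin is taken from the error message.

```javascript
// Wrap a Node-style request handler so every response carries a CORS header.
function withCors(handler, allowedOrigin = '*') {
  return function corsHandler(req, res) {
    // Tells the browser which origin may read this response.
    res.setHeader('Access-Control-Allow-Origin', allowedOrigin);
    return handler(req, res);
  };
}

// Hypothetical handler that answers every request with a small JSON body.
function searchHandler(req, res) {
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ message: 'ok' }));
}

const handlerWithCors = withCors(searchHandler, 'http://localhost:3000');

// Usage with Node's built-in http module (not started here; the port is an example):
// require('http').createServer(handlerWithCors).listen(5000);
```

The default of `*` allows any origin to read the response, which may be broader than you want for a deployed API. Setting the fetch mode to `no-cors` on the client, as the error message mentions, yields an opaque response whose body cannot be read, so it would not help with displaying search results.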

