Web Scraping with ParseHub: An Example Project (2024)

Published in Analytics Vidhya · 6 min read · Jun 4, 2020


ParseHub is a free, easy-to-use web scraping tool. Once it is downloaded as an application onto a desktop, data can be extracted from websites for analysis.

Why use a web scraping tool, rather than manual methods of web scraping?

  • Speed. These tools are much faster than traditional web scraping methods, where inspecting the page for every command can grow tedious and difficult.
  • Visual, user-friendly interface. ParseHub displays the website itself and highlights the relevant elements, so you can see exactly what the tool is “seeing.” When you select an element to be scraped, a box appears over it. This makes it much easier to keep track of what the program is doing than scanning long lines of code.
  • Easy-to-follow command list. Each command can be added, modified, or deleted as needed. Commands can also be renamed simply by clicking on the name and typing a new one.
  • Little to no coding knowledge is necessary. This is great for small businesses that want web-scraped information without having to hire a professional, researchers trying to identify trends without much coding experience, or anyone who wants data to explore.

One of the many great things about ParseHub is the tutorial that automatically begins when you open the application for the first time after downloading it. This saves a first-time user the trouble of attempting to figure out the interface on their own, and provides a sample website to scrape data from as the walkthrough continues. Though the interface is fairly intuitive, some of the terms must be learned in order to successfully navigate different pages to scrape data. At the end of the tutorial, you can actually run the project, which will end up with data in an easy-to-download format.

Though there’s a lot to be learned from the tutorial, it’s often helpful to see an additional walkthrough on a more realistic website. The website for this walkthrough is Bloomist, an e-commerce site. Our goal is to extract product names, links, prices, and review counts from each product, and organize them into a dataframe for later analysis.


The first step is to open a new project. After clicking on the new project button, ParseHub asks for the website that it will be scraping, and the URL can be copied in. In ParseHub’s main window, the website will appear.

Commands in ParseHub are very visual: indicating what data to scrape simply involves hovering over an element and clicking on it. To make sure ParseHub is “seeing” the correct data, the user has to confirm a pattern, so ParseHub suggests a second element on the page that may fit it. For instance, to scrape all of the product names from the site, I would click on two product names before ParseHub recognizes the rest of the pattern. Once it does, green boxes appear over everything on the page that matches.
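Under the hood, confirming a pattern amounts to generalizing a selector that matches every similar element on the page. A minimal offline sketch of that idea, using invented markup (the class names and product names are placeholders, not Bloomist's real HTML):

```python
import re

# Hypothetical product-grid markup, loosely modeled on an e-commerce
# listing page; class names and products are made up for illustration.
html = """
<div class="product"><h3 class="product-title">Terracotta Vase</h3></div>
<div class="product"><h3 class="product-title">Linen Napkins</h3></div>
<div class="product"><h3 class="product-title">Dried Eucalyptus</h3></div>
"""

# Clicking two product names in ParseHub is analogous to confirming that
# one selector pattern matches every product title on the page.
names = re.findall(r'<h3 class="product-title">(.*?)</h3>', html)
print(names)
```

In a real scraper you would typically use a proper HTML parser rather than a regex, but the generalize-from-examples idea is the same.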


From there, we would want to select a new element for the web scraper to identify. In our case, this would be price. It’s important that we link the product name and the price together so the data will be clear. ParseHub makes this easy with the Relative Select tool. First, one of the green product name boxes is clicked on, then the price below can be selected, which links those two elements. After this is done by hand twice, the pattern will again be identified, and it will appear throughout the entire page.
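What Relative Select does, conceptually, is anchor the price match inside the same product block as the name, so the two fields stay paired. A sketch of that pairing, again on invented markup (class names and prices are placeholders):

```python
import re

# Hypothetical markup: each product block contains a title and a price.
html = """
<div class="product">
  <h3 class="product-title">Terracotta Vase</h3>
  <span class="price">$48.00</span>
</div>
<div class="product">
  <h3 class="product-title">Linen Napkins</h3>
  <span class="price">$32.00</span>
</div>
"""

# Relative Select pairs each name with the price inside the same block,
# rather than matching names and prices as two independent lists.
products = re.findall(
    r'<h3 class="product-title">(.*?)</h3>\s*<span class="price">(.*?)</span>',
    html,
)
for name, price in products:
    print(name, price)
```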


While this selecting process is going on, another visual tool is changing in the ParseHub interface, adding rows to a dataframe as different items are selected. ParseHub is able to automatically add links to the products we added to the dataframe, as well as including the price as we indicated earlier. This data preview is helpful, because it offers a glimpse of what the finished product, the data itself, might look like. However, only the first few rows are shown, unless a box is checked to include more data. Showing a preview of too much data can slow down the speed of the application, so sticking with the first few rows is usually a good idea.


Another thing we wanted to find out from this website was the number of reviews each product has. This isn’t something we can see on the main page of the website — it requires actually navigating to each product’s page and scrolling down to see the review number. A tedious task without a web scraping tool, but this is not hard for ParseHub to do!

The “Click” command tells ParseHub to navigate to a new page and start a new template to gather data from that page. On a product page, creating commands works the same way as it did on the main page, and once again ParseHub automatically associates each product with its page.
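An offline sketch of what the “Click” command automates: visit each product's own page and pull a field (here, the review count) that isn't on the listing page. The URLs, markup, and numbers below are invented for illustration:

```python
import re

# Stand-in detail pages keyed by URL; in a real scraper, fetch() would
# perform an HTTP request instead of a dictionary lookup.
detail_pages = {
    "/products/vase": '<span class="review-count">12 reviews</span>',
    "/products/napkins": '<span class="review-count">5 reviews</span>',
}

def fetch(url):
    return detail_pages[url]

reviews = {}
for url in detail_pages:
    html = fetch(url)
    # Pull the review count from each product's own page.
    count = re.search(r'class="review-count">(\d+)', html).group(1)
    reviews[url] = int(count)

print(reviews)
```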


Let’s say there are too many products to fit on one page, so the site has broken them up across multiple pages. We can still easily get the data from the next page by selecting the next-page button and indicating that the same commands should continue on the new page of results. This means a lot of products can be collected in a single run.
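The next-page command boils down to a simple loop: follow the "next" link until there isn't one. A sketch under invented URLs and markup (none of this is ParseHub's or Bloomist's real structure):

```python
import re

# Three stand-in listing pages; the last one has no next-page link.
pages = {
    "/products?page=1": '<a class="next" href="/products?page=2">Next</a>',
    "/products?page=2": '<a class="next" href="/products?page=3">Next</a>',
    "/products?page=3": "<p>No more products</p>",
}

def fetch_page(url):
    # In a real scraper this would be an HTTP request.
    return pages[url]

url = "/products?page=1"
visited = []
while url:
    html = fetch_page(url)
    visited.append(url)  # (scrape the page's products here)
    match = re.search(r'<a class="next" href="(.*?)">', html)
    url = match.group(1) if match else None  # stop when no next link

print(visited)
```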

Previous commands can be viewed and altered easily if any changes are necessary. They can also be renamed and re-organized. A full command list for a project may look something like this:

[Screenshot: the project’s full command list in ParseHub]

By indentation and color, it is clear exactly what ParseHub is collecting, in what order, and from which page. When the complete command list and data preview make it seem like all the data will be found, a test run is a good way to see the scraping in action. To do this, click the green “Get Data” button and select “Test Run.” ParseHub highlights each command as data is collected, so if something goes wrong, it is easy to tell which command is the problem. It also saves you from running the full scraper multiple times, which can take a while if there’s a lot of data.

After a test run has been done (or if you’re feeling bold), clicking on the “Run” button will run the program on ParseHub’s servers. Though it can take a few minutes to run completely, you then have the ability to get the website’s data in a CSV or JSON format and download it to your computer for further analysis.
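Once downloaded, the CSV export can be analyzed with standard tools. A sketch using Python's standard library, where the export below is an invented stand-in for a real run's output (column names follow the selections made earlier):

```python
import csv
import io

# Invented example of what a run's CSV export might look like.
exported = """name,price,reviews
Terracotta Vase,$48.00,12
Linen Napkins,$32.00,5
"""

# Parse the export into a list of dictionaries, one per product.
rows = list(csv.DictReader(io.StringIO(exported)))
print(rows[0]["name"], rows[0]["reviews"])
```

In practice you would pass the downloaded file path to `open()` instead of wrapping a string in `io.StringIO`.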


Overall, this web scraping tool was easy to use and effective! For a visual person like myself, seeing what ParseHub was “seeing” in real time was very helpful in understanding the data structure and creating a working program.


FAQs

Is web scraping ever illegal? ›

In the United States, for instance, web scraping can be considered legal as long as it does not infringe upon the Computer Fraud and Abuse Act (CFAA), the Digital Millennium Copyright Act (DMCA), or violate any terms of service agreements.

Is ParseHub worth it? ›

The tool is great for extracting tens of thousands of records in minutes, collecting exactly the data you have asked it to collect. The web data extraction from ParseHub is extraordinary; it's a free tool, which is one of the best things about it, and it does everything you expect it to do. Highly recommend this tool.

What is an example of web scraping? ›

Web scraping refers to the extraction of web data into a format that is more useful for the user. For example, you might scrape product information from an ecommerce website into an Excel spreadsheet. Although web scraping can be done manually, in most cases you might be better off using an automated tool.

Is ParseHub completely free? ›

ParseHub has both free and paid plans.

Can websites detect scrapers? ›

Web pages detect web crawlers and web scraping tools by checking their IP addresses, user agents, browser parameters, and general behavior. If the website finds it suspicious, you receive CAPTCHAs and then eventually your requests get blocked since your crawler is detected.
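As one illustration of the signals above: Python's `urllib` sends a library-identifying User-Agent by default, which is trivially detectable. Overriding it with a browser-like string (the header value here is just an example, and is only one of many signals sites check):

```python
from urllib.request import Request

# Build a request carrying a browser-like User-Agent instead of the
# default "Python-urllib/x.y" that identifies the request as a script.
req = Request(
    "https://example.com",
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
)
print(req.get_header("User-agent"))
```

Note this only changes one signal; IP reputation and behavioral patterns are detected independently.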

Can I get sued for scraping? ›

The Computer Fraud and Abuse Act (CFAA)

Under the CFAA, unauthorized web scraping could be considered a violation of the law, especially if it involves circumventing access controls or causes harm.

Does Google ban web scraping? ›

Google's terms and conditions clearly prohibit scraping their services, including search results. Violating these terms may lead to Google services blocking your IP address. However, Google does allow for some scraping, provided you do it in a way that respects its ToS, as well as the privacy and rights of others.

Can you get blocked for web scraping? ›

Most web scraping projects follow a specific pattern to extract data from the same website. This approach can result in anti-bot detection. For example, clicking the same elements, using the same scroll height, and following a similar navigation pattern for every request put you at risk of getting blocked.
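A common mitigation for the fixed-pattern problem is to jitter the delay between requests rather than hitting pages at a machine-regular interval. A minimal sketch (the 2–6 second range is an arbitrary example, not a recommendation for any particular site):

```python
import random

# Draw a random delay for each of five requests instead of using a
# fixed interval; real scrapers would sleep for each value in turn.
delays = [random.uniform(2.0, 6.0) for _ in range(5)]
print(delays)
```

Varying scroll behavior and navigation order, as the answer above notes, matters just as much as varying timing.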

What are the disadvantages of ParseHub? ›

Of course, if we're looking at the free version only, there are a few downsides: no private projects, slower data extraction, no Dropbox integration. But given the efficiency that ParseHub offers, I strongly believe that the pros outweigh the cons.

What browser does ParseHub use? ›

ParseHub is only compatible with websites that can be accessed via Firefox v54.

Is ParseHub an API? ›

ParseHub's API enables you to programmatically manage and run your projects and retrieve extracted data. The ParseHub API is designed around REST: it aims to have predictable URLs and uses HTTP verbs where possible.
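A sketch of building a request URL for fetching a project's extracted data. The endpoint path and parameter names follow ParseHub's public API documentation as I recall it, so verify them against the current API reference before relying on this; the key and token values are placeholders:

```python
from urllib.parse import urlencode

API_KEY = "your_api_key"          # placeholder: from your ParseHub account
PROJECT_TOKEN = "your_project"    # placeholder: identifies the project

# Build the URL for the "get data from last ready run" endpoint.
url = (
    f"https://www.parsehub.com/api/v2/projects/{PROJECT_TOKEN}"
    "/last_ready_run/data?" + urlencode({"api_key": API_KEY, "format": "csv"})
)
print(url)
# A GET request to this URL would return the run's data; sending it is
# omitted here so the sketch stays offline.
```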

Is web scraping for beginners? ›

You don't need to be a developer or a software engineer to complete this course, but basic programming knowledge is recommended. Don't be afraid, though. We explain everything in great detail in the course and provide external references that can help you level up your web scraping and web development skills.

Do hackers use web scraping? ›

So in summary - yes, hackers do sometimes use web scrapers as part of schemes to steal data. But ethical hackers and security researchers more often use scraping for good, with permission and within reason.

What data can be web scraped? ›

Let's check out some of these now!
  • Price Monitoring. Companies can use web scraping to collect product data for their own and competing products and see how it impacts their pricing strategies.
  • Market Research.
  • News Monitoring.
  • Sentiment Analysis.
  • Email Marketing.

How do I scrape specific data from a website? ›

The web scraping process
  1. Identify the target website.
  2. Collect URLs of the target pages.
  3. Make a request to these URLs to get the HTML of the page.
  4. Use locators to find the information in the HTML.
  5. Save the data in a JSON or CSV file or some other structured format.
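The five steps above can be sketched end to end in Python. Everything here is an offline stand-in: the URL, markup, and `get_html` helper are invented for illustration, with the HTTP request mocked out:

```python
import json
import re

def get_html(url):
    # Step 3: in practice this would be an HTTP request
    # (e.g. with urllib or the requests library).
    return '<span class="price">$19.99</span>'

target_urls = ["https://example.com/item/1"]     # steps 1-2: site + URLs
records = []
for url in target_urls:
    html = get_html(url)                          # step 3: fetch HTML
    # Step 4: a locator (here a regex; CSS/XPath in real scrapers).
    price = re.search(r'class="price">(.*?)<', html).group(1)
    records.append({"url": url, "price": price})

output = json.dumps(records)                      # step 5: structured format
print(output)
```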

Can you scrape data from any website legally? ›

So, is web scraping activity legal or not? It is not illegal as such. There are no specific laws prohibiting web scraping, and many companies employ it in legitimate ways to gain data-driven insights. However, there can be situations where other laws or regulations may come into play and make web scraping illegal.

Is it OK to scrape data from websites? ›

Is Web Scraping Legal? By its nature, web scraping is legal. In most cases, the data on a web page is available to the public, so you're not stealing information—you're just gathering the data anyone can access.

How do I scrape data from a website search results? ›

Scraping public Google search results with Python using a scraper API
  1. Install required Python libraries. To follow this guide on scraping Google search results, you'll need the following: ...
  2. Set up a payload and send a POST request. Create a new file and enter the following code: ...
  3. Export scraped data to a CSV.
