Imagine a long-running crawl, such as extracting data from a website for a whole month. We can start it and leave it running until we get the results. However, a whole month is plenty of time for something to go wrong: the target website can go down for minutes or hours, the crawling server can lose power, or the internet connection can fail.

Any of these are real-world scenarios that can happen at any moment, putting your data extraction pipeline at risk. In this case, if something…
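The excerpt stops before describing a remedy, so here is one general technique we are assuming for illustration, not necessarily the approach of the full post: checkpoint progress to disk so that a restarted crawl skips the URLs it has already processed. `crawl`, `fetch`, `load_done`, `save_done` and the JSON checkpoint file are hypothetical names made up for this sketch.

```python
import json
import os
import tempfile


def load_done(path):
    """Return the set of URLs already processed, or an empty set on the first run."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def save_done(path, done):
    """Write the checkpoint atomically so a crash mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    # os.replace swaps the file in one step, so readers see the old or new checkpoint.
    os.replace(tmp, path)


def crawl(urls, fetch, checkpoint_path):
    """Process each URL once, skipping URLs recorded by a previous (interrupted) run.

    `fetch` is a hypothetical callable that downloads and parses one URL.
    """
    done = load_done(checkpoint_path)
    results = {}
    for url in urls:
        if url in done:
            continue  # already handled before the interruption
        results[url] = fetch(url)
        done.add(url)
        save_done(checkpoint_path, done)  # persist progress after every URL
    return results
```

Saving after every URL keeps the sketch simple but adds disk I/O on large crawls; writing the checkpoint every N URLs trades a little repeated work after a crash for less overhead.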


We founded Scrapinghub 10 years ago to provide customers with a simple way to access web data. We’re taking this to the next level today with the release of Zyte Automatic Extraction and rebranding to reflect our evolution.

Background

Over the last decade, we have seen web data grow in importance to the point where it is now essential to all data-driven businesses. Advancements in data analytics and AI continue to drive demand for reliable high-quality web data.

Innovation has been at the heart of how we have addressed this challenge. We led the way with open source projects like…


What is Price Intelligence?

Price intelligence means leveraging web data to make better pricing, marketing, and business decisions. In short, it is about using the available data to optimize your pricing strategy: making it more competitive, increasing profitability, and ultimately improving your business performance.

From competitor monitoring to dynamic pricing and MAP monitoring, web-extracted pricing data has endless uses. Brands and e-commerce companies use pricing data to get an overall view of the market. Dynamic pricing can be used to make automatic pricing decisions based on competitors’ data combined with internal data, so that you always remain profitable. …
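As a simple illustration of combining competitor data with internal data, the following hypothetical rule undercuts the cheapest competitor by a small amount but never drops below a minimum-margin floor computed from your own costs. The `Product` fields, the one-cent `undercut` default and the floor rule itself are assumptions for this sketch, not a recommended pricing strategy.

```python
from dataclasses import dataclass


@dataclass
class Product:
    cost: float           # internal data: unit cost
    min_margin: float     # internal data: minimum margin, e.g. 0.10 for 10%
    current_price: float  # internal data: price currently listed


def suggest_price(product, competitor_prices, undercut=0.01):
    """Undercut the cheapest competitor, but never go below the margin floor."""
    floor = round(product.cost * (1 + product.min_margin), 2)
    if not competitor_prices:
        # No competitor data extracted: keep the current price, still respecting the floor.
        return max(product.current_price, floor)
    target = round(min(competitor_prices) - undercut, 2)
    return max(target, floor)
```

For example, with a cost of 80.00, a 10% minimum margin and competitor prices of 95.00, 92.50 and 97.00, the rule suggests 92.49; if a competitor drops to 85.00, the 88.00 floor wins instead.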


A reliable and scalable way to tap into blog comment driven insights

We are excited to announce our newest data extraction API. The Blog Comments API is now publicly available as a BETA release.

If you want to skip the introductions and just get stuck in, here are the links you need:

What does the Comments API Beta release achieve?

The AutoExtract Comments API sets out to bring the power of our automatic data extraction capabilities, currently used for applications such as media monitoring, job postings, and more, into the arena of blog comment analysis.

The underlying data model for the API was released to production as part of the 20.6.0 …


In this blog post, you will learn the main difference between data center proxies and residential proxies, and when to use each in your web data extraction project to maximize successful requests.

Watch a video on data center vs residential proxies here: https://youtu.be/5ZVCbiythL4

Why would you even need proxies?

But first, let’s see why you would even need proxies. When you start extracting data from the web on a small scale, you might not need proxies to make successful requests and get the data. But as you scale your project because you need to extract more records or more frequently…


The web is complex and constantly changing. This is one of the reasons why web data extraction can be difficult, especially in the long term. It’s necessary to understand really well how a website works before you try to extract data. Luckily, there are lots of inspection and code tools available for this, and in this article we will show you some of our favorites.

These tools can be used for free and are available for all major platforms.

Developer tools

All major browsers come packed with a set of development tools. Although these have been built with the goal…


In this article, we give you some insight into how you can scale up your web data extraction project. You will learn the basic elements of scaling up and the steps you should take when looking for the best rotating proxy solution.

Watch the video here: https://www.youtube.com/watch?v=1Dbs8G1M8l8&feature=emb_title

Generally, there are 3 steps needed to find the best proxy management method for your web scraping project and to make sure you can get data not just today but also in the long term.

3 steps to scale up web scraping

1. Traffic profile

You need to define the traffic profile first to determine the concrete needs…


Hassle-Free, Structured, Machine-Readable Job Postings Data

We are excited to announce our newest data extraction API. The Job Postings API is now out of BETA and publicly available as a stable release.

If you are ready to roll up your sleeves and get started, here are the links you need:

This post covers the most notable improvements and the extensive testing the API has undergone that warrant its exit from beta, together with some high-level use cases. Keep in mind that we have already covered the API extensively in earlier posts.

What Has the Stable Release of the Job Postings API Solved?

We are moving AutoExtract Job…


Article and news data extraction is becoming increasingly popular and widely used by companies. Data quality plays a vital role in making sure these projects succeed. If the quality of the extracted articles is not good enough, your whole business could be at risk, especially if it depends on the constant flow of high-quality article data.

Data quality enables your business to move data across your organization and transform it into something valuable for your users or customers. …

Zyte (formerly Scrapinghub)

Hi, we’re Zyte, the central point of entry for all your web data needs.