How to Use Machine Learning for SEO Competitor Research


With the ever-increasing appetite of SEO professionals to learn Python, there’s never been a better or more exciting time to take advantage of machine learning’s (ML) capabilities and apply them to SEO.

This is especially true of your competitor research.

In this column, you’ll learn how machine learning helps address common challenges in SEO competitor research, how to set up and train your ML model, how to automate your analysis, and more.

Let’s do that!

Why We Need Machine Learning in SEO Competitor Research

Most if not all SEO pros working in competitive markets will analyze the SERPs and their business competitors to find out what it is that those sites are doing to achieve a higher rank.

Back in 2003, we used spreadsheets to collect data from SERPs, with columns representing different aspects of the competition such as the number of links to the home page, number of pages, etc.

In hindsight, the idea was right but the execution was hopeless due to the limitations of Excel in performing a statistically robust analysis in the short time required.



And if the limits of spreadsheets weren’t enough, the landscape has moved on quite a bit since then, as we now have:

  • Mobile SERPs.
  • Social media.
  • A much more sophisticated Google Search experience.
  • Page Speed.
  • Personalized search.
  • Schema.
  • JavaScript frameworks and other new web technologies.

The above is by no means an exhaustive list of trends but serves to illustrate the ever-increasing range of factors that can explain the advantage of your higher-ranked competitors in Google.

Machine Learning in the SEO Context

Thankfully, with tools like Python/R, we’re no longer subject to the limits of spreadsheets. Python/R can handle millions to billions of rows of data.

If anything, the limit is the quality of data you can feed into your ML model and the intelligent questions you ask of your data.

As an SEO professional, you can make the decisive difference to your SEO campaign by cutting through the noise and using machine learning on competitor data to uncover:



  • Which ranking factors can best explain the differences in rankings between sites.
  • What the winning benchmark is.
  • How much a unit change in the factor is worth in terms of rank.

Like any (data) science endeavor, there are a number of questions to be answered before we can start coding.

What Type of ML Problem is Competitor Analysis?

ML solves a number of problems, whether it’s categorizing things (classification) or predicting a continuous number (regression).

In our particular case, since the quality of a competitor’s SEO is denoted by its rank in Google, and that rank can be treated as a continuous number, the ML problem is one of regression.

Outcome Metric

Given that we know the ML problem is one of regression, the outcome metric is rank. This makes sense for a number of reasons:

  • Rank won’t suffer from seasonality; an ice cream brand’s rankings for searches on [ice cream] won’t depreciate because it’s winter, unlike the “users” metric.
  • Competitor rank is third-party data and is available using commercial SEO tools, unlike their user traffic and conversions.

What Are the Features?

Knowing the outcome metric, we must now determine the independent variables or model inputs, also known as features. The data types for the features will vary, for example:

  • First paint measured in seconds would be a numeric.
  • Sentiment with the categories positive, neutral, and negative would be a factor.

Naturally, you want to cover as many meaningful features as possible, including technical, content/UX, and offsite, for the most comprehensive competitor research.
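The two data types above can be sketched in pandas as follows. This is a minimal illustration only: the column names and values are hypothetical, and one-hot encoding is just one common way to hand a factor to a model.

```python
import pandas as pd

# Hypothetical rows; column names and values are illustrative only.
features = pd.DataFrame({
    "first_paint_secs": [1.2, 0.8, 2.5],                 # numeric feature
    "sentiment": ["positive", "neutral", "negative"],    # factor feature
})

# Declare sentiment as a categorical ("factor") column with known levels.
features["sentiment"] = pd.Categorical(
    features["sentiment"], categories=["negative", "neutral", "positive"]
)

# One-hot encode the factor so that a regression model can consume it.
encoded = pd.get_dummies(features, columns=["sentiment"])
print(encoded.dtypes)
```

The numeric column passes through unchanged, while the factor becomes one indicator column per category.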

What Is the Math?

Given that rankings are numeric, and that we want to explain the difference in rank, then in mathematical terms:

rank ~ w_1*feature_1 + w_2*feature_2 + … + w_n*feature_n

~ (known as the “tilde”) means “explained by”

n being the nth feature

w is the weighting of the feature
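As a worked instance of the formula, the sketch below multiplies each feature by its weighting and sums the results. The weights and feature values are invented for illustration; in practice the weightings are what the model learns from your data.

```python
# Hypothetical weights (w) and feature values, for illustration only.
weights = {"page_speed_secs": 2.0, "title_keyword_distance": 10.0}
page = {"page_speed_secs": 1.5, "title_keyword_distance": 0.25}


def explained_rank(weights, features):
    """Return w_1*feature_1 + ... + w_n*feature_n for one page."""
    return sum(weights[name] * features[name] for name in weights)


rank_estimate = explained_rank(weights, page)
print(rank_estimate)  # 2.0*1.5 + 10.0*0.25 = 5.5
```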

Using Machine Learning to Uncover Competitor Secrets

With the answers to these questions in hand, we’re ready to see what secrets machine learning can reveal about your competition.

At this point, we will assume that your data (known in this example as “serps_data”) has been joined, transformed, cleaned, and is now ready for modeling.



As a minimum, this data will contain the Google rank and the feature data you want to test.

For example, your columns could include:

  • Google_rank.
  • Page_speed.
  • Sentiment.
  • Flesch_kincaid_reading_ease.
  • Amp_version_available.
  • Site_depth.
  • Internal_page_rank.
  • Referring_domains count.
  • avg_domain_authority_backlinks.
  • title_keyword_string_distance.

Training Your ML Model

To train your model, we’re using XGBoost because it tends to deliver better results than other ML models.

Alternatives you may wish to trial in parallel are LightGBM (especially for much larger datasets), RandomForest, and AdaBoost.

Try using the following Python code for XGBoost on your SERPs dataset:

# import the libraries
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_squared_error

# load the SERPs data
serps_data = pd.read_csv('serps_data.csv')

# set the model variables
# your SERPs data with everything but the Google_rank column
serp_features = serps_data.drop(columns=['Google_rank'])

# your SERPs data with just the Google_rank column
rank_actual = serps_data.Google_rank

# instantiate the model ("reg:squarederror" replaces the deprecated "reg:linear")
serps_model = xgb.XGBRegressor(objective="reg:squarederror", random_state=1231)

# fit the model
serps_model.fit(serp_features, rank_actual)

# generate the model predictions
rank_pred = serps_model.predict(serp_features)

# evaluate the model accuracy (mean squared error; lower is better)
mse = mean_squared_error(rank_actual, rank_pred)

Note that the above is very basic. In a real client scenario, you’d want to trial a number of model algorithms on a training data sample (about 80% of the data), evaluate them (using the remaining 20% of the data), and select the best model.
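A minimal sketch of that 80/20 workflow is shown below. It makes several assumptions: the data is synthetic rather than real SERPs data, the two scikit-learn regressors stand in for whichever algorithms you choose to trial (XGBoost or LightGBM could be added to the same dictionary), and mean squared error on the held-out 20% is used as the selection criterion.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for serp_features / rank_actual (assumption, not real data).
rng = np.random.default_rng(1231)
X = rng.normal(size=(500, 4))
y = 10 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=500)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1231
)

# Candidate algorithms to trial side by side.
candidates = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=1231),
    "gradient_boosting": GradientBoostingRegressor(random_state=1231),
}

# Fit each candidate on the training sample and score it on the held-out sample.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = mean_squared_error(y_test, model.predict(X_test))

# The model with the lowest held-out error is selected.
best_model_name = min(scores, key=scores.get)
print(scores, best_model_name)
```

Scoring on rows the model has not seen guards against picking a model that has merely memorized the training sample.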



So what secrets can this machine learning model tell us?

The Most Predictive Drivers of Rank

The chart shows the most influential SERP features or ranking factors in descending order of importance.

Most influential SERP features or ranking factors in order of importance.


In this particular case, the most important factor was “title_keyword_dist”, which measures the string distance between the title tag and the target keyword. Think of this as the title tag’s relevance to the keyword.
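The article does not say which string distance metric was used. As one possible sketch, the function below derives a distance from the similarity ratio of `difflib.SequenceMatcher` in the Python standard library; the function name and the lowercasing step are assumptions of this example.

```python
from difflib import SequenceMatcher


def title_keyword_distance(title, keyword):
    """Return a distance in [0, 1]; 0 means the strings match exactly."""
    similarity = SequenceMatcher(None, title.lower(), keyword.lower()).ratio()
    return 1.0 - similarity


# Hypothetical title tags compared against the target keyword "ice cream".
relevant = title_keyword_distance("Best Ice Cream Recipes", "ice cream")
unrelated = title_keyword_distance("Car Insurance Quotes", "ice cream")
print(relevant, unrelated)
```

A title that contains the keyword scores a smaller distance than an unrelated one, which is the sense in which the feature captures relevance.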



No surprise there for the SEO practitioner. However, the value here is in providing empirical evidence to a non-expert business audience that doesn’t understand the need to optimize title tags.

Other factors of note in this industry are:

  • no_cookies: The number of cookies.
  • dom_ready_time_ms: A measure of page speed.
  • no_template_words: Counts the number of words outside the main body content section.
  • link_root_domains_links: Count of links to root domains.
  • no_scaled_images: Count of images that need scaling by the browser to render.

Every market or industry is different, so the above is not a general result for the whole of SEO!
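A ranking of drivers like the one above can be read from a fitted tree-based model. The sketch below is illustrative only: it uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost (both expose `feature_importances_`), borrows three feature names from this article, and generates synthetic data in which title relevance is deliberately the strongest signal.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (assumption): title_keyword_dist drives most of the rank.
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "title_keyword_dist": rng.uniform(0, 1, 400),
    "dom_ready_time_ms": rng.uniform(200, 3000, 400),
    "no_cookies": rng.integers(0, 30, 400),
})
rank = (
    1
    + 20 * features["title_keyword_dist"]
    + 0.002 * features["dom_ready_time_ms"]
    + rng.normal(0, 1, 400)
)

model = GradientBoostingRegressor(random_state=0).fit(features, rank)

# Pair each importance score with its column name, most influential first.
importances = pd.Series(
    model.feature_importances_, index=features.columns
).sort_values(ascending=False)
print(importances)
```

Plotting `importances` as a horizontal bar chart gives a view comparable to the chart described above.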

How Much Rank a Ranking Factor Is Worth

In another market case, we can also see how much rank will be delivered.

Forecast rank change.


In the chart above, we have a list of factors and the rank change for every positive unit change in that factor.



For example, for every unit increase in meta description length of 1 character, there’s a corresponding decrease in Google rank of 0.1.

Taken out of context, this sounds ridiculous. However, given that most meta descriptions are populated, this would mean that a unit change away from the average meta description length would result in a decrease in Google Search ranking.
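One way to estimate rank change per unit is to fit a linear model and read off its weights. The sketch below rests on several assumptions that are not from the article: the data is synthetic, the meta description feature is expressed as the number of characters away from the average length, a positive weight is read as a numerically higher (worse) rank position, and ordinary least squares via `numpy.linalg.lstsq` stands in for whatever method produced the chart.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300

# Synthetic features (assumption): characters away from the average meta
# description length, and DOM ready time in seconds.
meta_desc_len_deviation = rng.uniform(0, 60, n)
dom_ready_secs = rng.uniform(0.5, 4.0, n)

# Synthetic rank built with known weights of 0.1 and 1.5 plus noise.
rank = (
    3.0
    + 0.1 * meta_desc_len_deviation
    + 1.5 * dom_ready_secs
    + rng.normal(0, 0.5, n)
)

# Least-squares fit of rank ~ intercept + w_1*feature_1 + w_2*feature_2.
X = np.column_stack([np.ones(n), meta_desc_len_deviation, dom_ready_secs])
coefs, *_ = np.linalg.lstsq(X, rank, rcond=None)
intercept, rank_per_char, rank_per_second = coefs

print(f"rank change per character of deviation: {rank_per_char:.3f}")
print(f"rank change per second of DOM ready time: {rank_per_second:.3f}")
```

Each fitted weight is the estimated change in rank for a one-unit increase in that feature, holding the other feature constant.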

The Winning Benchmark for a Ranking Factor

Below is a graph plotting the average title tag length for a different industry from the one above, which also includes a line of best fit:

Graph plotting the average title tag length.


Despite the best practice SEO recommendation of using up to 70 characters for title tag length, the data plotted above shows the actual optimal length in this industry to be 60 characters.
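A benchmark like this can be estimated by fitting a curve to rank against the factor and locating its best point. The sketch below is a hypothetical illustration: the data is synthetic with an optimum placed at 60 characters, and a quadratic fitted with `numpy.polyfit` is swapped in for the line of best fit, since the article does not say how its line was fitted.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic data (assumption): rank is best (lowest) near 60 characters.
title_length = rng.uniform(30, 90, 300)
rank = 3 + 0.01 * (title_length - 60) ** 2 + rng.normal(0, 1, 300)

# Fit rank ~ a*length^2 + b*length + c.
a, b, c = np.polyfit(title_length, rank, deg=2)

# With a > 0 the parabola opens upward, so its vertex is the length with the
# lowest (best) fitted rank.
optimal_length = -b / (2 * a)
print(f"estimated winning benchmark: {optimal_length:.0f} characters")
```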



Thanks to machine learning, we’re not only able to surface the most important factors but, when taking a deep dive, can also see the winning benchmark.

Automating Your SEO Competitor Analysis with Machine Learning

The above application of machine learning is great for getting some ideas to split A/B test and for improving the SEO program with evidence-driven change requests.

It’s also important to recognize that this analysis is made all the more powerful when it’s ongoing, because the ML analysis is just a snapshot of the SERPs at a single point in time.

Having a continuous stream of data collection and analysis means you get a truer picture of what’s really happening with the SERPs for your industry.

This is where purpose-built SEO data warehouse and dashboard systems come in handy, and these products are available today.

What these systems do is:

  • Ingest your data from your favorite SEO tools daily.
  • Combine the data.
  • Use ML to surface insights like the above in a front end of your choice, like Google Data Studio.



To build your own automated system, you’d deploy what is known as ETL (extract, transform, and load) into a cloud infrastructure like Amazon Web Services (AWS) or Google Cloud Platform (GCP).

To explain:

  • Extract – Daily calling of your SEO tool APIs.
  • Transform – The cleaning and analysis of your data using ML as described above.
  • Load – Depositing the finished result in your data warehouse.

Thus your data collection, analysis, and visualization are automated in one place.
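The three ETL steps can be sketched with the Python standard library alone. Everything here is a placeholder: `extract` returns hard-coded hypothetical rows instead of calling a real SEO tool API, `transform` only cleans the data (the ML analysis step is omitted for brevity), and an in-memory SQLite database stands in for a cloud data warehouse. The table and column names are invented for this example.

```python
import sqlite3
from datetime import date


def extract():
    """Stand-in for daily SEO tool API calls; returns hypothetical raw rows."""
    return [
        {"url": "https://example.com/a", "google_rank": "3", "page_speed": "1.8"},
        {"url": "https://example.com/b", "google_rank": "7", "page_speed": None},
    ]


def transform(rows):
    """Cast types, drop rows with missing features, and stamp the snapshot date."""
    snapshot = date.today().isoformat()
    cleaned = []
    for row in rows:
        if row["page_speed"] is None:
            continue
        cleaned.append(
            (snapshot, row["url"], int(row["google_rank"]), float(row["page_speed"]))
        )
    return cleaned


def load(conn, records):
    """Append the cleaned records to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS serps_daily "
        "(snapshot_date TEXT, url TEXT, google_rank INTEGER, page_speed REAL)"
    )
    conn.executemany("INSERT INTO serps_daily VALUES (?, ?, ?, ?)", records)
    conn.commit()


# In-memory SQLite stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
row_count = conn.execute("SELECT COUNT(*) FROM serps_daily").fetchone()[0]
print(row_count)
```

Stamping each row with its snapshot date means that daily runs accumulate a history rather than overwriting the previous day's picture of the SERPs.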


Competitor research and analysis in SEO is hard because there are so many ranking factors to control for.

Spreadsheet tools are not up to it, due to the amounts of data involved (let alone the statistical capabilities that data science languages like Python offer).

When conducting SEO competitor analysis using machine learning, it’s important to understand that this is a regression problem, that the target variable is Google rank, and that the hypotheses are the ranking factors.

Using ML on your competitors can tell you what the key drivers are, identify winning benchmarks among them, and inform just how much lift in rank your optimizations can potentially deliver.



The analysis is a snapshot only, so to stay on top of the competitors, automate this process using Extract, Transform, Load (ETL).


Image Credits

All screenshots taken by author, June 2021
