Stop spam in Google Analytics

Google Analytics can give a false impression of visitor traffic.  Did you know that the majority of traffic may be spam and not real visitors?  What may seem to be impressive visitor numbers on first look may actually be a pitiful number of real visitors after the garbage is stripped away.

Google Analytics data must be stripped of useless data that skews real traffic in order to provide meaningful results.  At the base are four steps to provide more meaningful data.

  • Four standard views for data tracking
  • Filter out internal users
  • Filter out ghost spam
  • Filter out referral spam

The importance of views

Any new Google Analytics property I create has four base views: All Website Data, Unfiltered View, Test View and Master View.  The choice of names is up to the administrator; however, the reasoning behind the four base views should be kept.

A VIEW starts collecting data from the time the view is created, not from the time the Google Analytics account was originally created.

Filters irrevocably change the data in a view and once applied the data they affect cannot be changed back.

The default view is ALL WEBSITE DATA.  I create a second identical UNFILTERED VIEW as a backup.  These two views show all the raw data including visits by admins, bots, spiders, spam referrals and so on.

TEST VIEW is for seeing how a filter changes the data before being applied to the MASTER VIEW.  TEST VIEW is a practice ground of sorts.

The MASTER VIEW is site traffic after the filters have been applied to provide meaningful data.  The MASTER VIEW is the working Analytics data for your use in making business decisions.

Creating views

Log into Google Analytics.  Click on the ADMIN link at the top of the page.  Select the ACCOUNT from the first column then select the PROPERTY from the middle column.  In the VIEW column the default view of a new property will show as ALL WEBSITE DATA.  Click on the VIEW dropdown and click on CREATE NEW VIEW.

image001

Type a name in the REPORTING VIEW NAME field and select the country and time zone where most of the website's visitors are located.  (Hint:  your customers)

image002

Once the view has been created, make sure it is selected in the dropdown and click VIEW SETTINGS.

image003

 

In VIEW SETTINGS the view can be renamed or trashed (via the move to trash can button at the top right).  The time zone, country and currency can be adjusted.

It’s good practice to enable the “Exclude all hits from known bots and spiders” checkbox under Bot Filtering when the view is ultimately meant to report on human visitors.

If your site visitors use the search function a lot then turning site search tracking ON may be preferable.

Again, this is what the TEST VIEW is for: trying out data-altering settings or filters before applying them to other views.

A good starting point is to set up Google Analytics with these four views in place.  You may choose to add more views.

 

Where to create filters?

Now that the base views are in place, filters can be created in two areas: at account level and at view level.

  • A filter created at the account level can be applied to any view of any property within the account.
  • A filter created at view level can only be applied to that specific view.

As an example, if you administer several properties within the account, a filter that removes your own IP address from the data is best created at the account level.  It can then be applied to all properties and their respective views, rather than creating a “remove administrator” filter for each individual property and view.

 

Filter out internal users

Internal users are visitors that own or work on a website.  It is important to filter out known owners, administrators and employees whose visits should not skew the data.

Log into Google Analytics.

image004

Click ALL FILTERS and then click the ADD FILTER button.

image005

image006

  • Type a name for the filter such as “Remove administrators”.
  • Ensure the EXCLUDE radio button is selected.
  • Click CUSTOM then click the filter field dropdown menu and type “IP” in the search window.  Select IP ADDRESS.
  • Type in the IP address to be filtered.  It can be typed as 123.123.123.123 or as a regular expression 123\.123\.123\.123.
  • Separate multiple IP addresses with a pipe | but do not begin or end the regular expression with a pipe |.
  • Click on the views to apply the filter.  CTRL-CLICK (PC) or COMMAND-CLICK (Mac) to select multiple views and add them.
  • Click SAVE.
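
For a longer list of addresses, the pattern can be generated rather than typed by hand.  The following is a minimal Python sketch and not part of Google Analytics itself; the function name build_ip_pattern and the sample addresses are assumptions made for illustration.

```python
import re

def build_ip_pattern(addresses):
    """Join IP addresses into one pipe-separated pattern with escaped dots."""
    # re.escape turns each "." into "\." so it only matches a literal dot.
    escaped = [re.escape(addr.strip()) for addr in addresses if addr.strip()]
    # Joining with "|" guarantees there is no leading or trailing pipe.
    return "|".join(escaped)

# Placeholder addresses for illustration only.
pattern = build_ip_pattern(["123.123.123.123", "10.0.0.7"])
print(pattern)
```

The printed string can be pasted into the filter pattern field.  Because the pipes are added by join, the "do not begin or end with a pipe" rule is satisfied automatically.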

image007

If you are new to Google Analytics then you may want to apply the filters to the TEST VIEW only and then wait for a couple of days to see if the results are as expected before applying the filter to other views.

 

Filter out ghost spam

Ghost spam is where a spam URL appears in Google Analytics without the spammer ever having visited the website.  These fake visits skew the Google Analytics data:

  • Increases overall bounce rate
  • Decreases overall time on website
  • Inaccurately increases visitor traffic
  • Affects referrals, goals and segments

Do not click on ghost spam links in Google Analytics; they can also be dangerous due to potential malware or virus infections.

Creating a VALID HOSTNAME filter will filter out ghost spam.  Valid hostnames are websites, including your own, that use your Google Analytics tracking ID or code.  If you have added the Google Analytics tracking ID to your YouTube account then YouTube is a valid hostname along with your own website.  If you have not added the Google Analytics tracking ID to a hostname in the list, even one that appears to be a valid website, then it is not a valid hostname.

In Google Analytics set a reporting range which is typically from tracking inception or, at least, a year.  Click AUDIENCE>TECHNOLOGY>NETWORK and then look on the results page below the graph and click HOSTNAME (beside Primary Dimension and Service provider)

image008

A list of hostnames will appear and then the valid hostnames can easily be identified.  I only used the Google Analytics tracking ID in my website and my YouTube account so only those two are the valid hostnames.  Spammers try to trick you by spoofing real names such as google.com.

Create a regular expression string of the valid hostnames.

www\.ridgemoormedia\.com|www\.youtube\.com
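
Before saving the filter, the string can be tried against a few hostnames from the report.  This is a rough Python sketch of the idea, assuming the pattern may match anywhere in the hostname; the helper name is_valid_hostname and the sample spam hostname are made up for illustration.

```python
import re

# The valid-hostname expression from this tutorial.
VALID_HOSTNAMES = re.compile(r"www\.ridgemoormedia\.com|www\.youtube\.com")

def is_valid_hostname(hostname):
    """Return True when the hostname contains one of the valid hostnames."""
    # re.search accepts a match anywhere in the string (an unanchored pattern).
    return VALID_HOSTNAMES.search(hostname) is not None

# Sample hostnames; "spam-example.invalid" is a hypothetical placeholder.
samples = ["www.ridgemoormedia.com", "www.youtube.com", "google.com", "spam-example.invalid"]
for host in samples:
    print(host, "->", "keep" if is_valid_hostname(host) else "filtered out")
```

Only the two hostnames that carry the tracking ID are kept; a spoofed name such as google.com is dropped.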

Create a new filter.  Since this filter may be applied to several views of a specific property, it is better to create it at account level and then apply it to the specific views.

image009

Create a new custom filter by clicking CUSTOM then click the INCLUDE radio button. Click on the FILTER FIELD dropdown and select HOSTNAME.  Next enter the regular expression string for the hostnames in the FILTER PATTERN field.  Remember to name the filter.  I used VALID HOSTNAMES as a name.  Click on the view and add it to the list which the filter will be applied to.

image010

Finally save the filter.

 

Before applying ghost spam filter

image011

After applying ghost spam filter

image012

 

Filter out Referrer Spam

Referrer spam, which is also known as referral spam, consists of repeated website requests (visits) using a fake referrer URL.  These unwanted visits negatively affect Google Analytics in much the same way ghost spam does.

Creating and maintaining referral spam lists demands constant vigilance from anyone maintaining Google Analytics.  Referrer spam sources have to be added to the lists continually, and Google Analytics only allows a maximum of 255 characters per filter pattern, which means many filters have to be added.

The first step is to identify referrer spam.

  • Log into the Google Analytics account
  • Set a date range.  The larger the better.  A minimum of 1 year but original account inception is better.
  • Select All Website Data
  • Navigate to ACQUISITION then click ALL TRAFFIC and finally REFERRALS.

 

image013

Select SHOW ROWS at the bottom of the page to display a larger number of rows.  500 is recommended, but fewer or more can be selected.

image014

Scroll to the top of the page and click on EXPORT and choose a file format.  Excel (XLSX) will be used in this tutorial.

image015

Click on Excel (XLSX) and an Excel file with three tabs will be created.  Click on the DATASET1 tab. Select all of the contents in the Source column and copy. (Right-click copy or CTRL-C)

image016

Paste the data into a new Excel spreadsheet and save it using a suitable file name on the local PC.  In this tutorial Referral Spam.xlsx was used.

Examine the list and delete cells that are legitimate referral sources such as google or other valid websites.  The majority of spam referral websites are fairly straightforward to identify.

Sort the new list A to Z which will rearrange the list pushing the empty deleted cells to the bottom.

image017

Insert a header row in column A with an appropriate title such as Referrer. Type Removal Code as the header to column B.

image018

Write the expressions in the corresponding cell in column B.

Expressions

Google Analytics can use regular expressions in filters.  These are the most commonly used expressions in this tutorial.

\ An escape, which forces the expression to treat the following character as a literal instead of a special character.  As an example, in 1\+1 the + sign is a literal plus sign instead of a “one or more” quantifier.
\d Any single digit 0-9.  In www\d\. the \d matches one digit and the \. that follows matches a literal dot.
(.*?) Matches any run of characters (including none), stopping as early as the rest of the expression allows.
| Separates two alternative expressions (“or”).  Do not place the | symbol at the start or very end of an expression string.

 

Combining referral domains with similar patterns

One expression can filter referral domains that have similar patterns, which reduces the overall number of expressions.  The only differences in the following example referral URLs are the numbers following the www.

www1.free-social-buttons.com
www10.free-social-buttons.com
www2.free-social-buttons.com
www3.free-social-buttons.com
www4.free-social-buttons.com
www5.free-social-buttons.com
www6.free-social-buttons.com

 

The \d expression

www\d\.free-social-buttons.com

This would match any single-digit pattern such as www2.free-social-buttons.com but not a two-digit combination like www10.free-social-buttons.com.  Here \d matches one digit and the \. that follows it matches the literal dot.

The (.*?) expression

(.*?)\.free-social-buttons.com

This would match any character(s) that appear before .free-social-buttons.com, whether that is www1, www10 or anything else.  The seven example free-social-buttons.com referral domains can now be summarized in one regular expression.

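To complete the expression, add the escapes.  The rule of thumb used here is to add an escape before every dot and hyphen.  The dot must be escaped because an unescaped dot matches any character; escaping the hyphen is not strictly required outside square brackets but is harmless.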

(.*?)\.free\-social\-buttons\.com
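
The two approaches can be compared against the seven example domains with a short script.  This is only an illustrative sketch using Python's re module; the names DOMAINS, single_digit, any_prefix and matched are chosen for this example.

```python
import re

# The seven example referral domains from this tutorial.
DOMAINS = [
    "www1.free-social-buttons.com",
    "www10.free-social-buttons.com",
    "www2.free-social-buttons.com",
    "www3.free-social-buttons.com",
    "www4.free-social-buttons.com",
    "www5.free-social-buttons.com",
    "www6.free-social-buttons.com",
]

# \d matches exactly one digit, so www10 is not covered.
single_digit = re.compile(r"www\d\.free\-social\-buttons\.com")
# (.*?) accepts any prefix before the escaped dot.
any_prefix = re.compile(r"(.*?)\.free\-social\-buttons\.com")

def matched(pattern, domains):
    """Return the domains that the pattern matches in full."""
    return [d for d in domains if pattern.fullmatch(d)]

print(len(matched(single_digit, DOMAINS)), "matched by the \\d version")
print(len(matched(any_prefix, DOMAINS)), "matched by the (.*?) version")
```

The \d version covers six of the seven domains and misses www10, while the (.*?) version covers all seven.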

 

Checking the expression

RegExr (http://regexr.com/) is one of many websites that can be used to check and reference expressions.  Copy and paste the finished expression into the expression field between the / and /g.  Copy and paste the referral domain into the text area and if the expression is written properly then the match button will indicate the number of correct matches.

image019

Common expressions

Construct Description
Character Classes
[abc] a, b or c
[^abc] Any character except a, b or c
Predefined Character Classes
. Any character
\d A digit: [0-9]
\D A non-digit: [^0-9]
POSIX Character Classes
\p{Lower} A lowercase alphabetic character
\p{Upper} An uppercase alphabetic character
\p{ASCII} Any ASCII character: [\x00-\x7F]
\p{Digit} A digit: [0-9]
\p{Alnum} An alphanumeric character
Quantifiers
X? X, zero or one time
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times

 

Boundary Markers
^ Indicates the subsequent characters must appear at the beginning of the string
$ Indicates the preceding characters must appear at the end of the string
Literal Expressions
\ Escapes (quotes) the following character. Necessary if you want to match to a period (‘.’), bracket (‘[]’), brace (‘{}’) or other special character.
\Q Starts an escaped (quoted) literal string.  Literal string should be closed with \E.
\E Ends an escaped (quoted) literal string that was started by \Q.
Other
(?i) Turn on flag to ignore case.
(X) Match string X.
(?i:X) Match string X, ignoring case.
(?!X) Do not match string X.

 

The vertical bar or “pipe” character in regular expressions

A very valuable character is the vertical bar or “pipe” character:

|

 

Adding it to the end of every expression in the Excel spreadsheet will be a timesaver.  The | character means “or” in an expression.  It is used to separate combined expressions into expression1 OR expression2 OR expression3 and so on.

expression1|expression2|expression3

 

For this tutorial add the | character without a space to the end of each expression in column B.

http and https URLs

^(https?:\/\/)?(expression goes here)

 

http URLs only

^(http:\/\/)?(expression goes here)
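
A quick way to see what the optional s does is to try the prefix against a few URLs.  The sketch below uses Python's re module; the domain example-spam.com is a made-up placeholder and the variable names are assumptions for this example.  Note that a ? makes only the single character before it optional.

```python
import re

# "s?" makes the s optional, so both http:// and https:// are accepted.
either_scheme = re.compile(r"^(https?:\/\/)?example\-spam\.com")
# Without the optional s only http:// (or no scheme at all) is accepted.
http_only = re.compile(r"^(http:\/\/)?example\-spam\.com")

# Hypothetical referral URLs for illustration.
urls = ["http://example-spam.com", "https://example-spam.com", "example-spam.com"]
for url in urls:
    print(url,
          "| either:", bool(either_scheme.match(url)),
          "| http only:", bool(http_only.match(url)))
```

The outer (...)? group makes the whole scheme optional, which is why the bare domain matches both patterns.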

 

Write the expressions

Similar referral domains will become obvious once the list is sorted alphabetically A to Z.  Identify those that differ only by a number(s) or a specific repeating pattern of characters such as the previous free-social-buttons.com example.  Replace the part of the domain that varies with a placeholder series of # signs, using as many # signs as the longest variation has characters.

Example 1

www1.free-social-buttons.com
www10.free-social-buttons.com
www2.free-social-buttons.com
www3.free-social-buttons.com
www4.free-social-buttons.com
www5.free-social-buttons.com
www6.free-social-buttons.com

 

will become

www##.free-social-buttons.com

 

Example 2

site17786171.snip.to
site29582971.snip.to
site40127129.snip.to

 

will become

############.snip.to

 

Delete the others because a single expression will be written to cover all variations.

Re-sort the column A to Z to push the empty cells to the bottom.

image020

Tip:  Use one of many sites such as RegExr (http://regexr.com/) to check and reference expressions.

 

Merging the list

The column with the finished expressions can be merged into one string

  1. Select an empty cell where the string is to be inserted.
  2. Type =TRANSPOSE(FIRST:LAST) in the formula bar (not the cell), replacing FIRST with the cell reference of the very first expression in the list and LAST with the cell reference of the last expression. Example: =TRANSPOSE(B2:B100)
  3. Without clicking away from the formula bar, press F9 (PC) or command+’=’ (Mac).
  4. Without clicking away from the formula bar, replace the curly brackets { and } at the start and end of the formula with parentheses ( and ).

Example

{"4webmasters\.org|","100dollars\-seo\.com|", etc.

becomes

("4webmasters\.org|","100dollars\-seo\.com|", etc.

  5. Replace TRANSPOSE with CONCATENATE, keeping the = sign in front, so that CONCATENATE sits directly before the opening parenthesis.

Example  =CONCATENATE("4webmasters\.org|","100dollars\-seo\.com|", etc.

  6. Press Enter.

The final result will be the full range of expressions in the removal code column (column B in this example) merged into one string.

image021

The reason for including the | pipe character now becomes apparent as each expression in the string is separated by the | character.
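
For readers comfortable with a little scripting, the same merge can be done outside Excel.  The following Python sketch is an alternative to the spreadsheet steps, not the method used in this tutorial; the function name merge_expressions is an assumption, and the sample list reuses expressions shown earlier.

```python
def merge_expressions(expressions):
    """Merge individual expressions into one pipe-separated string."""
    cleaned = []
    for expr in expressions:
        # Drop surrounding spaces and any pipe already typed onto the cell.
        expr = expr.strip().strip("|")
        if expr:  # skip empty cells
            cleaned.append(expr)
    # join places a pipe between expressions only, never at either end.
    return "|".join(cleaned)

# Sample column B values, with the trailing pipe added as in this tutorial.
column_b = [
    r"4webmasters\.org|",
    r"100dollars\-seo\.com|",
    "",
    r"(.*?)\.free\-social\-buttons\.com|",
]
print(merge_expressions(column_b))
```

Because the pipes are inserted by join, the merged string never starts or ends with a pipe, even when the cells already carry one.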

 

Creating the Referral Spam filters

Google Analytics limits filters to 255 characters including spaces and the merged expression string may be much greater than 255 characters.

The following method, although a bit clunky, will divide the entire string into separate expression filter code strings that are under 255 characters each.  There may be VBA code that can cut the string at the nearest marker character below a given character count, but it is beyond this author’s skillset.

It is important that each filter code string does not begin or end with the pipe | character.
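
The splitting can also be scripted.  Below is a minimal Python sketch rather than VBA, offered as one possible approach; the function name split_filter_string and the generated sample expressions are hypothetical.  It assumes that no single expression contains a | of its own, since every pipe is treated as a separator.

```python
def split_filter_string(merged, limit=255):
    """Split a pipe-separated string into chunks of at most `limit` characters."""
    chunks = []
    current = ""
    for expr in merged.strip("|").split("|"):
        if not expr:
            continue  # ignore empty pieces caused by doubled pipes
        if len(expr) > limit:
            raise ValueError("expression longer than the limit: " + expr)
        candidate = expr if not current else current + "|" + expr
        if len(candidate) <= limit:
            current = candidate      # the expression still fits in this chunk
        else:
            chunks.append(current)   # close the chunk at a pipe boundary
            current = expr           # start the next chunk with this expression
    if current:
        chunks.append(current)
    return chunks

# Hypothetical merged string of 60 made-up spam domains.
merged = "|".join("spam%d\\.example" % i for i in range(60))
for number, chunk in enumerate(split_filter_string(merged), start=1):
    print("Referral Spam Filter %02d (%d chars): %s" % (number, len(chunk), chunk))
```

Each chunk ends on a complete expression and never begins or ends with a pipe, so each one can be pasted into its own filter.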

Copy the merged expression string from the Excel spreadsheet to a blank Word document.

image022

Select the text to be trimmed, ending at a | character, so that roughly 255 or fewer characters remain once the selection is deleted.  Do NOT cut an expression in half.

Copy the selection for future use.

Delete the selection.

Remove the pipe | character that may be at the very beginning or very end of the string.

image024

Click on the word count link at the bottom of the document.  In the example it is the 2 of 2 words text just right of the page number indicator at the bottom right of the document.

image025

Make sure the Character (with spaces) count is less than 255.

Lastly you can copy the string back to another document and organize it for future use.

image026

To create and apply the Referral Spam filter log into Google Analytics and navigate to the preferred account.

  • A filter created at the account level can be applied to any view of any property within the account.
  • A filter created at view level can only be applied to that specific view.

Referral Spam mucks up most everything so filters can be created at account level for use in all properties.  Click ALL FILTERS under ACCOUNT and click the red ADD FILTER button.

image027

Type in a filter name (such as Referral Spam Filter 01) that corresponds to the expression string lists created and safely stored in another document for future use.

image028

Click CUSTOM under filter type.

image029

Make sure EXCLUDE is selected then select CAMPAIGN SOURCE from the filter field pull-down menu.

image030

Copy and paste the first Referral Spam filter string into the filter pattern field. Select the view from the AVAILABLE VIEWS LIST, add it and then save the filter.

image031

Data will be filtered from the save point moving forward on the view it was applied to. Filters do not change historical data.

 

Hurray! This sucks

The drawback to filtering out internal users, ghost spam and referral spam is that a site that “looks” like it has good traffic

image032

can be hugely disappointing when only the true human visitor data is left.

image033

But that’s what properly configuring Google Analytics will reveal…where the elbow grease needs to be applied.

 

In summary

Google Analytics data must be stripped of useless data that skews real traffic in order to provide meaningful results.  At the base are four steps to provide more meaningful data.

  • Four standard views for data tracking
  • Filter out internal users
  • Filter out ghost spam
  • Filter out referral spam

Fighting Referral Spam is an ongoing battle.  Revise lists as often as required.  Start by revisiting monthly and go from there.

By | 2017-02-08T09:39:37+00:00 March 2nd, 2016|Resources|0 Comments

About the Author:

Doug Kronlund
Doug Kronlund is a marketing and management professional with an extensive track record of strong leadership and project management skills leading multiple media and internet projects with overlapping timelines across separately managed accounts. Multidiscipline skill set includes writing for video and online content, directing, producing, editing, WordPress, communications, sales, creative concept development and execution and budget management within freelance, small and large business environments and corporate settings.
