Thanks again to everyone who came out to our Developers meetup on Thursday night. Despite the absence of Ben and Andy for this one, we managed to cover a couple of interesting topics that I hope you found helpful.
Google Analytics and filtering out Spammy Results.
When reading reports in Google Analytics you’ll probably notice that you’re getting a lot of traffic from websites like:
floating-share-buttons.com
get-free-social-traffic.com
buttons-for-website.com
darodar.com
chinese-amezon.com
and a whole lot more that you’ve never heard of.
Check out this screenshot of some of the traffic that is being reported on the website of a local commercial design firm.
At a quick glance you might say 1091 sessions isn’t that bad for a local commercial design firm. Unfortunately, all of this referral traffic (including the 633 sessions from floating-share-buttons.com) is spoofed: robots are simply submitting information packets to Google’s servers using your analytics ID.
Why the hell would they do that?!
I’ve found that there are different motivations for each spammer. Some of them do it to generate leads, some to drive affiliate traffic, and others to earn ad revenue by increasing traffic. Not clear on how they generate traffic? Well if you are looking at your report and see that floating-share-buttons.com is forwarding such a large amount of traffic to your site, do you think you might be motivated to visit the site and see who they are and how your business is being represented on their site? Of course you would (or at least you would have before this awesome blog post).
Ok so again, most of these are not real visits; the spammers are simply using your analytics code to submit packets to Google. A good telltale sign of this is incomplete or (not set) fields that show up in your report. Here’s an example.
If you set the secondary dimension on your referral report to display the Hostname, you’ll see that for most of the referrers the Hostname is not set.
This is actually good news for us, because we can use these telltale signs to filter out the junk and get our reports showing us realistic results. So let’s do that and get rid of these things.
There are two parts to this. First we will build filters to prevent these results from showing up in our future reports. However, filters do not work retroactively, so I’ll also show you a strategy for stripping the junk out of any reports covering dates before we made this change.
Step 1: Let’s build a couple of filters.
In Google Analytics go to your admin section and select filters.
Click on the button and enter the following information
Filter Name: Exclude common spam
Filter Type: Custom
In the Exclude section
Filter Field: Campaign source
Filter Pattern: (Copy and paste the following) **Last updated 8/22/15**
darodar\.|semalt\.|buttons-for.*?website|blackhatworth|ilovevitaly|prodvigator|cenokos\.|ranksonic\.|adcash\.|share.?buttons\.|social.?buttons\.|hulfingtonpost\.|free.*traffic|buy-cheap-online|-seo|seo-|videos-for
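If you want to sanity-check the pattern outside of Google Analytics before pasting it in, here is a minimal Python sketch. It assumes the filter does a partial, case-insensitive match (which is why it uses re.search with re.IGNORECASE); the function name is_spam_source is made up for this example and is not anything Google provides.

```python
import re

# Pattern copied verbatim from the filter above (last updated 8/22/15).
SPAM_SOURCE_PATTERN = (
    r"darodar\.|semalt\.|buttons-for.*?website|blackhatworth|ilovevitaly|"
    r"prodvigator|cenokos\.|ranksonic\.|adcash\.|share.?buttons\.|"
    r"social.?buttons\.|hulfingtonpost\.|free.*traffic|buy-cheap-online|"
    r"-seo|seo-|videos-for"
)

# Assumption: partial match anywhere in the source, ignoring case.
_SPAM_RE = re.compile(SPAM_SOURCE_PATTERN, re.IGNORECASE)


def is_spam_source(source):
    """Return True if the campaign source matches the exclude pattern."""
    return _SPAM_RE.search(source) is not None


# Referrers mentioned in this post, plus one ordinary domain for contrast.
for source in [
    "floating-share-buttons.com",
    "get-free-social-traffic.com",
    "buttons-for-website.com",
    "darodar.com",
    "chinese-amezon.com",
    "success-seo.com",
    "google.com",
]:
    print(source, is_spam_source(source))
```

Notice that chinese-amezon.com slips past this pattern. A blacklist like this only catches the names it knows about, which is why the Hostname filter is the one doing most of the heavy lifting.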
Next we can see how this would affect our current results by clicking the “Verify this filter” link under the Filter Verification section.
You should see something like this, showing you that if the filter was running today, it would have eliminated these referrers from the report.
That’s it for this one. Click save and check back here periodically for updates to the filter.
Next we’ll build a filter that only allows results that include our Hostname. This will eliminate a majority of the ghost referrals.
Once again create a new filter and fill it out as follows:
Filter Name: Only include hostname
Filter Type: Custom
*Go down and choose the “Include” radio button
Filter Field: Hostname
Filter Pattern: Insert your website here. For this example it will be: www\.zdesigninc\.com|zdesigninc\.com
**Notice that these filters must be written as regular expressions, which means that you have to escape special characters like the ‘.’ by placing a backslash in front of them. In this example I’m including two versions of the domain. I can add as many as I’d like as long as I separate them with a pipe character (|).
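To see why the backslash matters, here is a small Python sketch. It is only an illustration under a couple of assumptions: the helper name build_hostname_pattern is made up, the lookalike hostname is invented for the demo, and re.search stands in for a partial match like the one the filter is assumed to perform.

```python
import re


def build_hostname_pattern(hostnames):
    """Escape each hostname and join them with a pipe, as in the filter above."""
    return "|".join(re.escape(name) for name in hostnames)


pattern = build_hostname_pattern(["www.zdesigninc.com", "zdesigninc.com"])
print(pattern)


def hostname_is_ours(hostname):
    """Return True if the hit's Hostname matches one of our domains."""
    return re.search(pattern, hostname) is not None


# An unescaped '.' matches ANY character, so a lookalike host sneaks through.
lookalike = "zdesigninc-com.example"  # invented hostname for this demo
unescaped_match = re.search("zdesigninc.com", lookalike) is not None
escaped_match = re.search(r"zdesigninc\.com", lookalike) is not None
print(unescaped_match, escaped_match)
```

Ghost referrals that show up with a Hostname of (not set) fail hostname_is_ours, which is exactly the behavior we want from the Include filter.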
Once again verify the filter and you should see something like this:
WHOA!!! That’s more like it. Look at all that garbage that will no longer be skewing our results. Make sure you save the filter and from this point on, your analytics reports will be much cleaner and you will have actionable data that provides real insights into your business’ web traffic.
Step 2: Advanced filtering on existing reports
Remember, the new filters won’t affect our existing and historical reports; they will only apply to data collected from this point forward. Does this mean that you are out of luck when looking at past reports? Of course not! You’ll only need to apply some on-the-fly advanced filters that are similar to those that we just created. Let’s do that.
First make sure that you add a secondary dimension of Hostname (just like we did earlier) on the report you want to view
Next click on the advanced link on the upper right header (just below the graph).
Now we can add some filters. Let’s start with Hostname since it will have the greatest impact.
Make sure that “Include” is selected and in the Add Dimension box type Hostname.
Select “Containing” from the dropdown list (it should be the default) and then type your domain name into the box. (This should be in a normal format: do not escape any characters, and only add one domain per query.)
If you hit apply here you’ll see that a large number of bad results have now been filtered out of the report, but you’ll likely have some stuff that still needs to be removed. Unfortunately I am unaware of an easy way to do this without doing them individually. (If you know a better way, please share)
In my example I still have referrers like success-seo.com, buttons-for-website.com, etc. that I don’t want to see. Here’s how to get rid of them. I’ll demonstrate with success-seo, which seems to be the biggest culprit at the moment.
Go back up and click on the “edit” link that has replaced the “advanced” link in the top right header just beneath the graph. Your query should appear again, ready for more conditions.
Now click the “Add a dimension or metric” button. Then in the search box type “Source” and click the dimension to add it to the query.
In the text box next to “Containing”, start typing the name of the source you want to remove, and when it pops up in the results select it to add it to the query. In my example I started typing “success” and success-seo.com came up, so I added it. Next, select “Exclude” from the dropdown on the left side.
Now if you click apply you should see the result has been filtered out.
VOILA!!!!! Simply do this for each of the results that you don’t want to see and you’ll have a report that more accurately represents the traffic you’re getting and where it is actually coming from. The good news is that with the filters we set earlier, you won’t have to do this on your reports going forward.
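If you export a historical report and would rather clean it up in one pass, the same include/exclude logic can be sketched in a few lines of Python. Treat this as a rough sketch: the function name clean_report, the row layout, and the sample rows are all made up for illustration (only the 633 sessions from floating-share-buttons.com come from the report above), and the case-insensitive "containing" check is my assumption about how the advanced filter behaves.

```python
def clean_report(rows, domain, excluded_sources):
    """Keep rows whose hostname contains our domain and whose source
    does not contain any of the excluded names."""
    kept = []
    for row in rows:
        hostname_ok = domain.lower() in row["hostname"].lower()
        source_bad = any(
            name.lower() in row["source"].lower() for name in excluded_sources
        )
        if hostname_ok and not source_bad:
            kept.append(row)
    return kept


# Illustrative rows only; apart from the 633, the session counts are invented.
sample_rows = [
    {"source": "floating-share-buttons.com", "hostname": "(not set)", "sessions": 633},
    {"source": "success-seo.com", "hostname": "zdesigninc.com", "sessions": 40},
    {"source": "example-partner.com", "hostname": "www.zdesigninc.com", "sessions": 25},
]

cleaned = clean_report(sample_rows, "zdesigninc.com", ["success-seo.com"])
for row in cleaned:
    print(row["source"], row["sessions"])
```

The first row drops out because its Hostname is (not set), and the second because its source is on the exclude list, which mirrors the two conditions we added in the advanced filter.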
Again, I hope this was helpful. We also discussed different ways of adding the analytics script to your WordPress site, but this post was a little lengthy, so I’ll summarize what we discussed on that topic in another post. As always, if you have any questions or feedback, please leave them in the comments or reach out to me directly. You can find my contact info at about.me/ronbrennan. I look forward to hearing your thoughts.