Talk-Talk bot following customers' browsing

After reading a couple of articles on the internet about Talk-Talk's spider following their customers around the net, I decided to try an experiment to see for myself.

First off though, the background. Back in 2010 Talk-Talk introduced an anti-malware system within their network in conjunction with Huawei. The basis of this was that the platform would block subscribers from reaching any potentially dodgy site if malware was detected on it. In theory this looks like a great idea, but in reality let's look into how it gets its database of sites.

I created a brand new site, http://testsite.richardallen.co.uk, at 12:38 on 9th July 2013. I then posted the link to Facebook and asked a friend who is on Talk-Talk to click on it, while I tailed the access log the whole time.

The first hit was as expected, the user-agent Facebook's spider uses for any links posted to Facebook;

173.252.100.115 - - [09/Jul/2013:12:38:20 -0400] "GET / HTTP/1.1" 206 5650 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"

The second hit was my friend, a Talk-Talk customer;

92.28.223.118 - - [09/Jul/2013:14:27:31 -0400] "GET /assets/js/jquery.js HTTP/1.1" 200 247823 "http://testsite.richardallen.co.uk/" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Mobile/10B350 [FBAN/FBIOS;FBAV/6.2;FBBV/228172;FBDV/iPhone5,2;FBMD/iPhone;FBSN/iPhone OS;FBSV/6.1.4;FBSS/2; FBCR/EE;FBID/phone;FBLC/en_US;FBOP/1]"

 

IP Address: 92.28.223.118
Host: host-92-28-223-118.as13285.net
Location: GB, United Kingdom
City: Plymouth, K4
Organization: TalkTalk
ISP: TalkTalk
AS Number: AS13285 TalkTalk Communications Limited
Latitude: 50°39'64" North
Longitude: 4°13'86" West
Distance: 2251.93 km (1399.28 miles)

Within about 30 seconds of my friend accessing the site (and it could only have been them, as no one else knew the site existed and no one else had attempted to access it), I had two more hits;

62.24.222.131 - - [09/Jul/2013:14:28:01 -0400] "GET /robots.txt HTTP/1.0" 500 688 "http://testsite.richardallen.co.uk/robots.txt" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)"

and

62.24.222.132 - - [09/Jul/2013:14:28:03 -0400] "GET /robots.txt HTTP/1.0" 500 688 "http://testsite.richardallen.co.uk/robots.txt" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)"
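The gap between my friend's hit and the two follow-up hits can be checked straight from the log timestamps. This is only a minimal sketch: the helper name `seconds_between` is my own, and it assumes the timestamp layout shown in the entries above.

```python
from datetime import datetime

# Timestamp layout used in the access log entries quoted above.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def seconds_between(earlier: str, later: str) -> float:
    """Return the gap in seconds between two access-log timestamps."""
    start = datetime.strptime(earlier, LOG_TIME_FORMAT)
    end = datetime.strptime(later, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


# Timestamps copied from the log entries above.
visitor_hit = "09/Jul/2013:14:27:31 -0400"
follow_up_hits = ["09/Jul/2013:14:28:01 -0400", "09/Jul/2013:14:28:03 -0400"]

gaps = [seconds_between(visitor_hit, hit) for hit in follow_up_hits]
print(gaps)  # [30.0, 32.0]
```

So the two follow-up requests landed 30 and 32 seconds after the visit.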

I won't paste all of my log, but both of these IPs went on to GET every page under this subdomain, each starting with /robots.txt.
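For anyone wanting to repeat this on their own log, here is a rough sketch of how the "starts with /robots.txt" check could be automated. The function names and the regular expression are my own and assume the combined log format seen above. The sample lines are the entries from this post, trimmed after the response size to keep them short.

```python
import re

# Matches the start of a combined-format access log line:
# client IP, two ignored fields, [timestamp], "METHOD path protocol".
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*"')


def first_requests(lines):
    """Map each client IP to the first path it requested."""
    first = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ip, _timestamp, _method, path = match.groups()
        first.setdefault(ip, path)  # keep only the earliest request per IP
    return first


def robots_first_clients(lines):
    """Return the IPs whose very first request was for /robots.txt."""
    return sorted(
        ip for ip, path in first_requests(lines).items() if path == "/robots.txt"
    )


# Entries from this post, trimmed after the response size.
sample = [
    '173.252.100.115 - - [09/Jul/2013:12:38:20 -0400] "GET / HTTP/1.1" 206 5650',
    '92.28.223.118 - - [09/Jul/2013:14:27:31 -0400] "GET /assets/js/jquery.js HTTP/1.1" 200 247823',
    '62.24.222.131 - - [09/Jul/2013:14:28:01 -0400] "GET /robots.txt HTTP/1.0" 500 688',
    '62.24.222.132 - - [09/Jul/2013:14:28:03 -0400] "GET /robots.txt HTTP/1.0" 500 688',
]

print(robots_first_clients(sample))
```

Opening with robots.txt is only a hint that a client is a crawler, not proof, so I would treat the result as a shortlist to look up rather than a verdict.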

To start, we can tell these are bots because the first thing each one does is GET the robots.txt file. If we then look up those IP addresses, 62.24.222.131 and 62.24.222.132, we can see that they also fall into the Talk-Talk IP range. A quick Google search for them also throws up loads of discussions about this.
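Once a WHOIS lookup has given you the network block, checking which log entries fall inside it is easy with Python's standard `ipaddress` module. A minimal sketch follows. Note that the /24 below is purely an illustrative assumption picked because it contains both crawler addresses. It is not Talk-Talk's published allocation, so substitute whatever range the lookup actually reports.

```python
import ipaddress

# ASSUMPTION: illustrative /24 only, not the real Talk-Talk allocation.
# Replace it with the range reported by a WHOIS lookup.
ILLUSTRATIVE_NETWORK = ipaddress.ip_network("62.24.222.0/24")


def in_network(address: str, network=ILLUSTRATIVE_NETWORK) -> bool:
    """Return True if the address falls inside the given network block."""
    return ipaddress.ip_address(address) in network


# Addresses taken from the log entries in this post.
for ip in ["62.24.222.131", "62.24.222.132", "92.28.223.118", "173.252.100.115"]:
    print(ip, in_network(ip))
```

With the placeholder block, only the two crawler addresses match. My friend's address sits in a different block, which is why the lookup is still needed to tie both back to the same ISP.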

I then went back to my friend to ask if they had Talk-Talk's HomeSafe or Malware Protection turned on, and they confirmed that both were off.

With this in mind, I started to look around on the web and discovered that Talk-Talk were actually referred to the ICO back in 2010 over their malware trial. Within that document is a brief swimlane diagram of how their Anti-Malware system works;

[Image: Screen shot 2013-07-12 at 15.36.36]

 

 
