Residential Proxies and Web Scraping: Future of the Industry

Episode #005

26 min listen

“E-commerce is the biggest web scraping use case and you'll find in some public court documents that Amazon, Target, Walmart, etc. they're actively paying proxy companies to help scrape Facebook. So these large companies are doing the scraping themselves and for them to set a precedent in court to say it's illegal to scrape publicly available data means they'd be hurting themselves”
Neil Emeigh, LinkedIn
Description

The topic of this episode of Ethical Data, Explained is proxies, web scraping at scale, and data collection of publicly available data. Henry Ng is joined by Neil Emeigh to discuss Neil's journey from an SEO specialist to the CEO of an ethical proxy company, trends and the future of the proxy and web scraping industry, resonant data collection cases that make scraping look bad, and why it is important to be vocal about the positive applications of web scraping.

Learn more from Neil Emeigh on maintaining ethical data standards as a web scraping company.

Transcript

Henry - 00:00:00: Welcome to Ethical Data, Explained. Join us as we discuss data-related obstacles and opportunities with entrepreneurs, cybersecurity specialists, lawmakers, and even hackers to get a better understanding of how to handle data ethically and legally. Here to keep you informed in this data-saturated world is your host Henry Ng.

Henry - 00:00:21 Hello everyone and welcome back to Ethical Data, Explained. I'm your host Henry Ng and today we have a special guest, Neil Emeigh, who is not only a founder and CEO at Rayobyte but also once biked from Nebraska to San Francisco in 29 days, so an impressive man outside of the world of data as well. I actually listened to his talk back in 2022 at the Zyte Summit, I was there at that event, and we have him here today to talk not only about his career journey and his experience at Rayobyte but about his experience in the world of data as well. So yeah, welcome Neil. I'll hand over and you can do a little introduction for yourself.

Neil - 00:01:00 Thank you Henry, thanks for having me. Yeah, Zyte you just sparked, that was a fun conference, yeah, it was good. For those who haven't seen it, it's a great talk, I think very informative and engaging. But I talked about how to source IPs and that's kind of my, the last seven years of my life have been about - how to source and sell proxies in an ethical way, of course, and that's what's got me here today. And before that I was just a normal college guy, getting my computer science degree and the business took off and here I am.

Henry - 00:01:36 Brilliant, so what kind of sparked that? What was your degree, if you don't mind me asking, and what kind of sparked your route into Rayobyte and IPs and everything like that?

Neil - 00:01:50 Yeah, it was a computer science degree. I'd always been kind of a techy and into programming. I think I started when I made my first video game bot: I botted a game so it would do some actions that were tedious, when I was like 14, something like a macro, basically. So programming went back quite a ways, but not too intensively, and so I did that in college. And I'd always been doing SEO. I started SEO when I was maybe 16 in high school and did little odd-end SEO jobs, and back then that was a day when you could kind of spam some backlinks online and immediately get ranked number one on Google overnight, and have some cheap filler sites. I mean, I would go create a site that would just have some thin content. It was informative, it wasn't doing anything illegal or unethical in my opinion. It's just what Google wanted to see in order to rank, and it was very easy to rank back then. Times have changed. So that was when I was 16, and I kept doing SEO, SEO, SEO, and then through college I was still doing a lot of SEO services, and when I was doing those services I was using a lot of proxies and a lot of servers. And I kept running across the same problems: poor uptime, poor customer service, shady, sketchy providers. This was 2015, when I started the company that it is today. I was in my last semester of college and I was like, you know what, let me try it myself, and within one semester, three months, I had a hundred thousand dollars of sales from selling to people in my SEO communities. That was kind of my in, so that's what started the journey.

Henry - 00:03:24 Brilliant and would you say that's still what you're kind of passionate about and what Rayobyte is passionate about is that focus on SEO or have you kind of expanded beyond that kind of passion of just SEO?

Neil - 00:03:35 Yeah yeah up until about a year ago we rebranded actually from our former name. For the prior six years we were called Blazing SEO which is a horribly branded name by the way. It is a rebranding effort that isn't something to do overnight so we had to find the time to prioritize it and so uh Blazing SEO is because I sold SEO to people in the early days and we quickly got out of the SEO after maybe the first six months. We said, wow look at all these other use cases for proxies, and so Rayobyte was just rebranded a year ago and yeah we're much more general.

Henry - 00:04:06 Brilliant. And obviously, like you said, you founded it back in 2016, you kind of learned from your mistakes and you made your changes, and, I mean, up until this point, what are some of the things you've learned about building ethical data usage practices? Have you made your own mistakes or have you just kind of learned from market mistakes?

Neil - 00:04:27 I talked about it as well too, we've just had some interesting cases. You learn when the cases come up, like, oh, there's someone who used our IPs to do some abuse that we catch, and then we're like, how did that slip through the cracks? How did that slip through our vetting process, our KYC processes? And over a seven-year period now, you have so many cases that you can start adjusting to make sure our network's being used for the most legitimate cases possible and, yeah, over time you can kind of learn the tricks of the trade.

Henry - 00:05:00 Of course, of course and where do you kind of see the future of residential proxies moving to? Obviously we've covered SEO, you said obviously there's other practices that are on the market now, things like social media account management and things like that do you see the proxy market kind of developing from that or going any further than that?

Neil - 00:05:23 No, I think scraping will continue to be the greatest driver. As we all know, with the fad right now of ChatGPT, that is one giant scraper, or you could argue maybe they have access to the databases built by Reddit, Wikipedia, maybe they got access to systems, but that's where AI will be able to evolve, to where they're just scraping all the world's publicly available data, and doing that at scale. AI is just one example, and for that alone you need a lot of residential proxies to collect that publicly available data, so I think scraping will be the major driver, but its applications will continue to vary as we're seeing with the growth in AI in particular.

Henry - 00:06:01 Okay and from your past experience, over the past, you know, seven eight years, coming up to eight years, what would you say is the most creative way you've seen proxies being used by clients? Anything that really stands out to you? 

Neil - 00:06:16 Well, for those who've been in the industry it's not a surprise, but for those who haven't, the fun industry, much hotter seven years ago, was the sneaker industry. People were using proxies to cop, as the terminology goes, the latest sneakers. Yeezys were the ones that people were going for the most, they were the hottest, and people could buy the sneaker at retail for 300 and then resell it for two thousand dollars. The profit margin was just massive, so they would buy a lot of IPs from us to try getting a chance to buy just one shoe, because the profit margin on reselling it would pay for them, so that was fun. There was one client we worked with, this is a fun story, it was our largest sneaker client, and he was basically like a middleman. There's a whole supply chain: there are the consumers who just had their own little bots, but then there were kind of middlemen that people could pay like 500 for the 300 shoe, and those middlemen would come pay this client of ours. On one release, he had a very customized bot that he built himself. I asked him how he did on the release and he said: “This is one of my best ever, in that one hour period I made three million dollars”. I'm in the wrong business here. And he did that weekend after weekend, because Adidas would have drops almost every weekend, whether the Ultra Boost or Yeezys or otherwise.

Henry - 00:07:36 That's definitely the crazy thing though. Before I started in proxies I was in SaaS, so I had no idea about this market. And then as soon as I joined SOAX and became part of the team, I realized I hadn't known how big the sneaker market actually is. Obviously you see the upsell on sneakers all the time, but the inventiveness, and even some of the people doing the actual sneaker copping, they're not that old, they're like between the ages of 16 and 21. Yeah, crazy what they're doing.

Neil - 00:08:04 Exactly, yeah yeah fascinating yeah definitely.

Henry - 00:08:09 So one of the big things of Rayobyte and, obviously, your past is like being that ethical standard of residential proxies, the ethical acquisition of residential proxies. So in your opinion you've kind of stated that you are all that standard, do you see any kind of other ethical proxy providers on the market and, in your opinion, what essentially does it mean to be like an ethical proxy provider?

Neil - 00:08:32 Yeah yeah, well, the first part of my answer would be that we just started selling residential a little over a year ago, and we've been around for over seven years selling datacenter proxies. We knew for quite a long time that with residential there's money there, and people would ask us for it, but what we saw back then was people asking for sketchy use cases, unethical or illegal use cases, and we kept asking ourselves, how could you possibly acquire so many IPs ethically? And we kind of naively didn't pursue it because datacenter proxies were our bread and butter. Proxyway's report in 2022 said our results were the best in datacenter. We know datacenter, it's kind of who we are, and we stayed in that lane. Residential continued to grow in value, and we took a step back about two years ago, when we started development of residential, and said, you know, how can we get in here ethically? We won't enter into it if we have to go the route that so many providers do. There have been some that were shut down recently by the FBI that acquired IPs basically through botnet malware on people's computers, and we just kind of assumed, oh, that must be how companies are doing it. Or there are some of the active providers in the market right now who claim on their front page that they're the most ethical provider, yet there's research paper after research paper after research paper that says they can't be. These IPs are coming from IoT devices, which tells you that no one intentionally installed it on an IoT device. So the point being, we stepped back and asked, how can we enter this ethically? And at the point where we say that, you know, we're ethical, I'd be remiss not to acknowledge SOAX and this podcast, yes.

Henry - 00:10:08 This is not a plug, we're not paying him to say this.

Neil - 00:10:11 I think the key point would be that everyone says they're ethical, us included, SOAX included and, you know, top competitors included. The question is, are they really? And I can't answer, you know, whether SOAX is or not. I know we are, I feel confident we are, because of the things that we list on our site and the way I encourage customers to ask us and question that. From a customer perspective we kind of say: well, sure, try signing up and seeing if you can do something unethical with our IPs, we'll catch you. We are just so confident that we challenge that. And then on the acquisition side, our acquiring app is called Cash Raven. Any developer can come to Cash Raven and say, hey, I have my app, I want to monetize it, and they'll quickly find that for their app we make sure there's a pop-up that requires accepting the terms of service, the customer knows their IP is being used for proxy, et cetera, et cetera. So our boxes are checked if someone wanted to do the due diligence, whereas I can't say the same for some other competitors out there, of course.

Henry - 00:11:10 And then from our side at SOAX, as a provider as well, obviously we have those kinds of liabilities in place for when we get notified or flagged that one of our users is misusing any of the proxies, and that's how we tackle it. And I'm sure that Rayobyte has the same type of process: as soon as any illegal activity is there, we flag it and we cease the service right away. I think one of the big things about not just proxies but data collection and scraping as a whole is that legality side of things, and it's the big question. Ever since I joined SOAX and worked in the world of proxies I've called it kind of a gray area technology. In your opinion, especially after last year's 911.re proxy provider scandal and the HiQ versus LinkedIn case, do you feel web scraping will ever get rid of that kind of bad reputation, or do you feel like it's too far gone at this point?

Neil - 00:12:10 From a reputation standpoint, that's a good question, I think that would depend on who you ask. My answer comes down to scraping publicly available information, particularly social media: the Cambridge Analytica case with Facebook put scraping on the hot seat. Proxies are even further down, where proxies have a pretty bad reputation because of them being used for all kinds of illegal and unethical cases. That's a harder hill to fight, honestly, in my opinion, and we do what we can as an industry as a whole. Scraping I think is a lot easier, but then you still have these cases like Cambridge Analytica that make people say, oh dang it, that's bad. So on the reputation side, yeah, scraping is a little easier, I think, to get there and really get a bright light on it. Proxies are kind of behind scraping. I think as both of them get pushed, as companies like ourselves really put this in the forefront, and a podcast like this puts it in the forefront, it makes a stronger case than before. Just a couple of years ago there weren't podcasts like this. There weren't people talking about this as often, so that's good for all of us in the industry. On the legal side for scraping, you see the HiQ case, you see 911.re getting shut down. What I see coming in the future is that social media is going to get harder and harder, just based on government pressure. We've been around long enough to see that the social media companies didn't care that much before; it was more of a tolerable amount, like, don't spam us and we'll look the other way kind of thing. But now the governments are pressuring them because it's personal data, whereas I don't think much will evolve on the legal side for e-commerce or anything that's non-personal.
But e-commerce is the biggest use case, because you'll find in some public court documents that Amazon, Target, Walmart, they're actively paying proxy companies to help scrape Facebook. Actually, in the Meta and Bright Data lawsuit, Meta was paying Bright Data to scrape, so these large companies are doing the scraping themselves: Amazon, Target, Walmart, etc. So for them to set a precedent in court to say it's illegal to scrape publicly available data means they'd be hurting themselves. So it's just kind of, as you said, this gray area, where everyone's going to tolerate it and put in anti-scraping technology so that their competitors can't get their data, and that's where it gets really interesting.

Henry - 00:14:50 Of course and obviously Rayobyte and yourselves being founding members of the ethical web data collection initiative, I'm guessing that will kind of help towards how that's managed moving forward and what that looks like in future. But it'll be great to know a little bit more about how you know you became part of the initiative and the work that our members actually do um just so our listeners are fully aware. 

Neil - 00:15:12 Yeah, so the EWDCI, it's quite a mouthful. It was launched officially just recently, but it's been in the works for about a year now because of what I said earlier: if you go on the Wayback Machine and look up mentions of “ethical proxy provider” or “ethical proxies” from before about a year ago, you see so many fewer cases of it being talked about. And even before that, seven years ago when I first started, it was a wild, wild west. Everyone was just kind of figuring out what proxies are, what their use cases are, how we can monetize using proxies. So the EWDCI is our effort to get ahead of government regulations that latch on to issues like Cambridge Analytica. All the government officials are seeing right now is those bad cases, so they're saying, okay, scraping must be bad, let's put legislation around it. But it's not all bad, and so this initiative is going to help show all the useful cases that power all of our normal lives, like looking for a hotel, looking for a flight, looking for your product on Amazon. Those prices are the cheapest because they're scraping and finding what the prices are out there, so it's helping our senators and our congressmen get cheaper prices, and that's where we need to shine the light on the benefits of public scraping.

Henry - 00:16:38 Yeah, and I definitely think this is something that the public don't really understand about where proxies and web scraping come in. If you look at the industry as a whole it seems a very specialist and very niche thing, but if you bring up the fact that things like booking.com and hotels.com are doing the same thing, they're scraping that data in the background and they're contributing to what your day-to-day looks like, then when you put it like that the whole gray area starts to disappear. I booked a flight to Singapore for 600 pounds because I used booking.com, whereas if I Googled it, it was 800-something pounds. It just makes more sense in that overall view. And I think this ties in with the trends of what proxies and the scraping space look like. With both our companies being providers, what do you see as new opportunities that might be up and coming for the proxy world? In your opinion, what should proxy providers really pay attention to in order to be more competitive in the next couple of years?

Neil - 00:17:37 Well yeah, there are a lot of theories on this, and my personal opinion is that we're at a period right now where there are a lot of people swarming this industry, and it's only so big, it's not a huge industry by most market standards, and it's getting saturated. With saturation come varying degrees of quality being provided into the market, so as I look forward on a five-year timeline, a 10-year timeline, you have to keep asking, what's going to change? Well, I don't think public scraping is ever going to be made illegal. I think in 10 years scraping personal data could be illegal, but in terms of the other major uses, e-commerce, booking, etc., I don't think they will be. What will happen is that the anti-scraping technology that those companies put on their sites will continue to get more and more advanced. With how the current networks of IPs work today, from what we've seen in our competitors and what we know of even our own software, we're a bit of a hands-off infrastructure to a degree. You sign up, use the IPs in your software, and you kind of go off and try to be successful. But what's going to happen, as these trillion-dollar companies like Amazon get better and better at saying “nope, I want to ban you” and are banning IPs en masse, is that I think companies like ours won't be able to give free-rein access to proxies to the end user, because they're likely not going to be experts at having good user agents when they're scraping, good anti-fingerprint technology, rotating their IPs, good intervals, and all these things that help you evade anti-scraping measures. A lot of end users don't use those, so they'll get vast amounts of IPs banned if you give them that access. So companies like us will have to stay ahead of that as the anti-scraping technology improves, maybe by tightening access or by helping that end user as much as possible.

Henry - 00:19:36 Definitely, definitely. Those are all of the scraping-side and proxy-side questions that we had lined up, but there's one final question before we move on to the questions we ask every guest. I'm going to rephrase it, because to me you're a young founder: as a young founder, what advice would you give to other young founders looking to break into the proxy industry and the proxy market or scraping market?

Neil - 00:20:07 So we offer reselling of our proxies, and we see ambitious people come to us and say “yes, I have some people I want to resell to”, and there isn't a high success rate there because, as I said, the market is growing more and more saturated. The big brands that are recognizable, have some authority and have been around long enough are the ones who have sucked up most of the market, and for the newcomers it's so hard to compete against that brand power. So my advice to people entering the proxy space would be, one, if you have limited resources you need to make sure you have a very close-knit niche of customers who want to buy from you, because you're the expert and you can help them. It's more of a service type of approach, where you're helping them succeed, instead of just a “come buy from my website” kind of approach. And that's what I did when I started the company with SEO. I knew people in SEO, I approached them, and they were like, yeah, I'm not loyal to this company, but you know you're going to help me, sure, let me work with you, and that's how I grew rapidly. So the same still applies today. And then two, on the other side, it's kind of a common business cliche: find ways to differentiate. If you were to go create a site and offer the exact same products, the same pricing tiers, the same offering as all the major brands, you're likely not going to win out, and so you have to ask, how can I differentiate? Is it service based? Maybe build little tools that help people in their industry, you know, things like that help, just adding a little bit of extra differentiation that the big companies aren't necessarily worrying about.

Henry - 00:21:39 So any aspiring, you know, proxy aficionados out there listening to the podcast, definitely take that advice on board. Now our final three questions, which we ask every guest, and the first one is: who in the world of data would you most like to take out to lunch? I won't be offended if you don't say me, that's perfectly fine, but in the world of data, if you could pick anyone, who would your choice be?

Neil - 00:22:01 I admit how naive I may be, but when I think of a person in data a name doesn't jump out at me, and it probably tells me more about myself. The data is less interesting to me than the product and the use cases.

Henry - 00:22:20 Maybe a product person then?

Neil - 00:22:23 yeah yeah yeah yeah yeah well then I think my product person would be Jony Ive from Apple, yeah you know I mean that is the greatest. I think it's great to pick his brain for a half hour.

Henry - 00:22:34 It's just the way his mind works and how strategically placed it like he's, it's like precision, it's like watching a surgeon think when he lays out any product plan. So the second question is software, right, we work in the world of proxies but software wise we, I'm sure, we use apps and software on a daily basis what piece of software could you not live without on a daily basis 

Neil - 00:22:55 I feel like you'd shame me if I didn't say ChatGPT. The fad and everything, but no, it's an impressive software tool, you know. That one's hard too. I use a lot of software, I'm a tech guy, I'm a computer science guy. We're a Slack company, we're a Google Meet, Google Suite company, we use Asana for project management. Beyond that I wouldn't say there's much. I would miss my Garmin watch, the software that tracks my health and my steps and my runs and so forth.

Henry - 00:23:31 Fair enough, fair enough. It's important I feel like fitness is something that people are kind of focusing more and more on and tracking is important, so you can see your improvement yeah. So the final thing is the use of data and solving real world problems and we're both very much ingrained in helping people use data on a day-to-day basis but for your personal experience how have you used data to solve a real world problem? It could be work based, it could be personal life based. What's your kind of real world problem that you've always been able to solve with the right amount of data? 

Neil - 00:24:07 Yeah, I think that brings me back to my SEO days. I was using a tool called Scrapebox, which is an old, old-school Google scraping tool. You load in proxies and it scrapes Google search results for you, whether it's for lead gen, where you'd find thousands of URLs that had certain keywords, or for keyword tracking as well. I did that for SEO services, so I did a lot of problem solving on the SEO side in terms of scraping and proxies back then.

Henry - 00:24:35 Brilliant, so that was the last question. I definitely want to thank you for jumping on the podcast, it was great to get to know you, and you're actually the first other proxy provider we've brought on to the podcast. I think it's great that we share a lot of the same type of knowledge around ethical data and what we believe in when it comes to ethics. So thank you very much for sharing your view and for having us on. Guys, if you want to know more, definitely go follow Neil on LinkedIn. I'm sure there are plenty of links there to his talks at the Zyte Summit and other summits that he's been a part of, and hopefully we'll have you back on in future, maybe in a couple of years, to see how everything's developed and see how we stand as proxy providers and data scraper providers.

Neil - 00:25:23 Perfect, thanks for having me, Henry and thanks to SOAX for taking the initiative to be ahead of something like this, as important as it is.

Henry - 00:25:30 Brilliant, thank you very much and thank you very much listeners, and hopefully we'll see you next week.

Ethical Data, Explained is brought to you by SOAX, a reputable provider of premium residential and mobile proxies - the gateway to data worldwide at scale. Make sure to search for Ethical Data, Explained in Apple Podcasts, Spotify, and Google Podcasts or anywhere else podcasts are found and hit subscribe so you never miss an episode. On behalf of the team here at SOAX, thanks for listening.


Neil Emeigh

Founder and CEO, Rayobyte (formerly Blazing SEO), a provider of an ethical proxy network and a scraping tool called Scraping Robot.
