Data Excavator

How Selling Motor Oil Led Me To Create A $48K/Year Web Scraping Tool

Vitaly
Founder, Data Excavator
$4K
revenue/mo
1
Founders
2
Employees
Data Excavator
from Tbilisi, Georgia
started April 2019

Hello! Who are you and what business did you start?

Hi! My name is Vitaly, and I am a programmer. More precisely - a programmer and an entrepreneur. On one hand, I just love programming (15 years in PHP, JS, C#, C++, MySQL, and PostgreSQL). On the other hand, I hate doing it.

You know, it's not easy to combine the love of programming and the ability to build a business. This is probably because a good entrepreneur shouldn't care too much about what's "inside", under the hood. And a good programmer doesn't have to worry at all about sales and organizational issues. A good programmer stands for quality. And a good entrepreneur stands for big sales and getting results "right now". And you know what? I love prototyping on my own and getting involved in problem-solving on a private level. It's such an interesting conflict.

I've been building and launching startups, both in-house and external, for over 10 years. During that time, I've been involved in the creation of a wide variety of products. For example: a search engine for goods and services, a time management system with a built-in fingerprint scanner, huge x-ray units for airports, GPS vehicle monitoring, a QR-code advertising service, an interactive 3D city map, and much more. And now I'm going to tell you about how I started a data scraping app.

It's funny, but I started working on the scraper while I was working on a completely different startup: the creation of a chain of high-tech auto-part stores. Today that project is a separate business, with several successful stores, and I no longer manage it. But while I was working on it, I accidentally created another product, the Data Excavator application.

Perhaps here it is customary to talk about some kind of revenue figures. And in general, that is correct. But I will put it this way. What is more important to me is what benefit my product brings to society. And at what cost to the environment it is achieved.

So over the past 3 years with the help of our product, we have collected databases of about 500,000 goods. Today these databases are used by about 5 large companies and a huge number of small companies. Retail customers of these companies and ordinary visitors of the website every day see the data, which was processed by our applications. And every month, these companies sell more than $100 million worth of goods and services. That said, these databases have been compiled as energy-efficiently as possible, and have left a minimal carbon footprint compared to how it would have been done without us.

And while there are many opinions about the ethics of scraping, scraping applications remain a needed and sought-after product. I'm not saying that copying other people's content is fun or useful. But I can say that in many cases, scraping benefits everyone involved in the commodity-money relationship. Without detriment or problems for any of them. So, our story will be just about the bright side of data processing. Here we go!

What's your backstory and how did you come up with the idea?

The story of Data Excavator begins in 2016. Well, it's not the beautiful story where one guy had a million-dollar idea and went to his garage. It's more like the opposite. This is the story of a routine task that got boring for an entrepreneur. The guy had no choice but to automate it.

He solved the problem in the simplest, most trivial way, and suddenly it started working. A month passed, then another month, then another six months, and the strange solution to the problem proved useful. People started using it. They started making decent money from it. Do you want to know what it was like?

Well, back to the beginning. It's summer 2016, July. It's hot and dusty, and I want to go to the sea. We are inside the main office of an auto parts chain. Outside the window, workers are digging a trench, and the local squirrels from the nearby park are terrified of the rumble. Inside our office, on the wall hangs a large black chalkboard that reads "don't hurt the programmers, they work as they know how." And it's true: my team and I work as we can, and we work very well. Our task is to create a mega-catalog of automobile oils with over 40 thousand products. By this point, we've been sending emails to various oil manufacturers for two months. We ask them to send us information about the oils: names, article numbers, and photos. Sounds easy, right? But it is more complicated.

The very shelves in our warehouse that eagerly awaited oils from suppliers

Almost all of the manufacturers reject us. Usually, they respond in three scenarios. The first is that they don't have the data we want. And the information we want can be found anywhere - just not with them. For example, we can go to their distributors' website and ask for pictures that the distributors took. Or we can go to their warehouse and get a free photo shoot of the products. Or we can ask their customers for pictures. Or we can go anywhere but them to get the data we need.

The second category of manufacturers promises to send data after "a little conversation." Days go by, and the department head, senior manager, senior supervisor, supervisor of supervisors, and God knows who else join the correspondence. And so, this whole team goes on and on about whether or not they should provide the data. One day follows the next, and the discussion seems to turn into an endless game of words. But the fact is, we don't get any data, just more emails.

The third scenario is that the manufacturer answers with a flat "no." For any number of reasons. From a simple lack of time to a principled unwillingness to share data.

All three situations left us bewildered. After all, any company which produces something should be interested in the growth of sales. And then we come in and say: "Hey friends, right now we can increase your sales!" But they say to us: no, we don't play that way. Funny? In our opinion, not so much! But there were positive things that gave us hope. The common factor that united all the categories of manufacturers was having a website. And it already had some data and some pictures. What's interesting is that almost all of the manufacturers were in favor of us copying the data from their websites by hand. Well, that's where it all started.

Hey guys! We want to sell your oils, we even have a shelf for that purpose!

Like real programmers, we decided to automate this task. Well, when it comes to copying 40 thousand product cards, it's no fun. Imagine this. If one product is copied in about a minute, then 10 products take 10 minutes. 100 products take 100 minutes. And so on. And that's before we mention that working on such a task day after day makes a person go "a little crazy". We didn't want to traumatize our minds.
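To put the "and so on" into numbers, here is a small back-of-the-envelope sketch in Python. The one-minute-per-product figure and the 40 thousand products come from the story; the eight-hour workday and the function name `manual_copy_estimate` are assumptions made only for this illustration.

```python
def manual_copy_estimate(products, minutes_per_product=1, workday_hours=8):
    """Return (total hours, workdays) needed to copy product cards by hand.

    minutes_per_product=1 follows the estimate in the text;
    workday_hours=8 is an assumption for illustration.
    """
    total_hours = products * minutes_per_product / 60
    workdays = total_hours / workday_hours
    return total_hours, workdays


hours, workdays = manual_copy_estimate(40_000)
print(f"{hours:.0f} hours, roughly {workdays:.0f} eight-hour workdays")
# prints "667 hours, roughly 83 eight-hour workdays"
```

In other words, under these assumptions one person would spend about four months of full-time work doing nothing but copying.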

In other words, yes, Data Excavator was created to solve our internal problem. We just didn't want to copy huge amounts of information by hand. At the same time, we could not find this information anywhere else. And that is where the story of our product began. In terms of the financial situation, this startup was something natural. It was already part of the work that we were doing. It's probably more of a story about an internal product that became an external product.

Take us through the process of designing, prototyping, and manufacturing your first product.

Our team started working on a small prototype. And this prototype extracted information from sites based on CSS selectors. What, CSS? What did scraping have to do with it? The trivial answer is that at the time it seemed like a simple idea to scrape data using the document.querySelectorAll() command.

Our prototype was based on Chromium, in a C# + WPF + CefSharp stack. And of course, we didn't forget to add HtmlAgilityPack and a module to handle CSS selectors. Yes, it was a weird and crazy idea to make a scraper in a .NET environment. But what was there to do? As the saying goes, "I was young - I needed the money". And given the total lack of time for side development, we chose the easiest and fastest way of prototyping. It took about a week to produce this unusual product.

In the first version of the app, we just had to enter a list of CSS selectors, separated by commas. Like: #productName, #productImage, .productPrice, and so on. Our program embedded Chromium; to be more specific, it was Chromium packed in WPF via C#. So, the program went to the vendor's website. Then it downloaded all the content page by page and applied the specified CSS selectors to it. The results from each page were saved to a separate folder: a file with the text and a list of pictures next to it. That was it! On the one hand, this fantastically simple design made us laugh, and on the other, it gave us a fragile hope for a solution to our problems.
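The idea of applying a comma-separated selector list to a downloaded page can be sketched in a few lines. This is not the original C# + CefSharp code: it is a minimal Python stand-in built on the standard library, it understands only `#id` and `.class` selectors, and the names `SimpleSelectorExtractor`, `extract`, and `SAMPLE_PAGE` (including the sample product) are hypothetical.

```python
from html.parser import HTMLParser


class SimpleSelectorExtractor(HTMLParser):
    """Collects text (or image URLs) for elements matching '#id' or '.class'."""

    VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}

    def __init__(self, selector_list):
        super().__init__()
        # Split the comma-separated list entered by the user.
        self.selectors = [s.strip() for s in selector_list.split(",") if s.strip()]
        self.results = {s: [] for s in self.selectors}
        self._stack = []  # per open element: the selectors it matched

    def _matching(self, attrs):
        element_id = attrs.get("id")
        classes = (attrs.get("class") or "").split()
        matched = []
        for sel in self.selectors:
            if sel.startswith("#") and sel[1:] == element_id:
                matched.append(sel)
            elif sel.startswith(".") and sel[1:] in classes:
                matched.append(sel)
        return matched

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        matched = self._matching(attrs)
        if tag == "img" and attrs.get("src"):
            # For pictures we keep the URL instead of text.
            for sel in matched:
                self.results[sel].append(attrs["src"])
        if tag not in self.VOID_TAGS:
            self._stack.append(matched)

    def handle_endtag(self, tag):
        # Assumes well-formed markup; real pages need a more forgiving parser.
        if tag not in self.VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Text belongs to every matched element that is currently open.
        for matched in self._stack:
            for sel in matched:
                self.results[sel].append(text)


def extract(html, selector_list):
    parser = SimpleSelectorExtractor(selector_list)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical product page used only for this example.
SAMPLE_PAGE = """
<html><body>
  <h1 id="productName">Motor Oil 5W-30</h1>
  <img id="productImage" src="/img/oil-5w30.jpg">
  <span class="productPrice">19.99</span>
</body></html>
"""

data = extract(SAMPLE_PAGE, "#productName, #productImage, .productPrice")
for selector, values in data.items():
    print(selector, values)
```

A real scraper would render the page in a browser engine first, which is what the embedded Chromium was for, and would use a full CSS selector engine rather than this two-rule matcher.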

We chose 5 target sites from which we decided to scrape data. We launched the app and waited patiently. And oh wonder! In a matter of hours we were able to collect the database we had been requesting from vendors for over 2 months! We were sure that it would work very slowly, and that the target sites would block us on the first day. But no! Everything worked almost perfectly. No blocking or time delays. Just data coming into our application file by file.

Of course, it was still raw data. And of course, it was extracted in different formats from the sites of different companies. But it was already something, a small victory! We continued, and day by day our database grew in size and quality. After about 1.5 months, we already had quite a decent result for our management. That is, we got the database we needed to get. But at the same time, we got an application, which is a real data scraper.

Over the next year, we continued to scrape various sites, and increase our database. And of course, improving the application. At some point we realized that the application was already working too well - so we decided to separate it into a product. So the application got the name Data Excavator, a separate website, and a different team. Why Data Excavator? Perhaps because it extracts data as quickly and massively as an excavator digs the ground with its bucket. Or maybe because we made a prototype of it just at the time when an excavator was digging a trench near our office for several days in a row.

At this point the reader may ask: "lol guys, why didn't you use a ready-made scraper?" After all, there were already ready-made solutions on the market at the time. Good question, we would answer. First, because we were confident that it is much easier to write a mini-script than to study a third-party solution. Second, at that time we had not found any solutions that could handle images and dynamic sites. That is, we needed a non-standard solution and a custom approach. And that is why we made the product.

Describe the process of launching the business.

Well, right after we collected the database we needed, we just left the issue aside and went on with the auto business. Occasionally we would go back to our product to extract some more data, but in general, we did not use the scraper externally. Sure, we already had a separate website for the scraper project and tested some sales in lazy mode. But overall, we didn't have the necessary charge of motivation to reach the mass market.

That changed in 2019. At the beginning of the year, I wandered around Facebook looking for inspiration and new contracts. At that point, the auto parts business was already fairly stable and didn't require as much time. And so, I came across a group called "Web data scraping." In it, various people were sharing information about what they needed to do, and where to get the data they needed for their websites. And you know, I was surprised by what I saw.

So many entrepreneurs were there looking for a solution to their data scraping problems. Some needed items for their online stores. Others wanted to collect a database of email addresses or phone numbers for mailings. I spent a long time leafing through the list of posts and realized it was time to get back to business.

That same day, I wrote to several people in the group that we could help them. And one of those people responded. It was an entrepreneur from Texas who was in the furniture trade. She needed someone who could urgently copy over 40,000 items from supplier websites. Just as we had sent emails to oil suppliers in 2016, she had tried unsuccessfully to get furniture price lists and pictures of that furniture. And guess what? I took the job on the same day.

I was curious to dive back into the process and see what would come of it. The prepayment from Texas was received that same day via PayPal, and I went to the team and said: "Guys, it's time to revitalize our product."

We tried to give 101% to this order from the entrepreneur in the United States and gave her the best possible technical support. And guess what? Her business just exploded! Her revenue began to grow so fast that at some point she needed additional staff. As a result, in six months, her revenue was up 74%, and her traffic was up 68%. It was also good to know that the average ticket also went up. In other words, with just scraping, we were able to qualitatively increase the results of her business.

The funny thing about this story is that the suppliers never sent her the price lists. Half a year later, and then a year later, there was still no response from them. And speaking of moral issues and overall benefits. What would be better, to sit without sales and hope for a bright future, or to get thousands of products on your site today with scraping, and continue to grow your business?

Since launch, what has worked to attract and retain customers?

We use quite a few tools for marketing. Our company is probably at the stage of development right now where we are testing a large number of different hypotheses. But, like every startup, we move consistently, trying to look for tools that work for rapid growth.

The first thing we did after the launch was to post information about ourselves on about 50 different startup sites. There we talked about our product and offered case studies to solve typical problems. The second step was marketing through social networks. We found entrepreneurs with typical problems or a typical business and offered them our services. As a third step, we tried contextual advertising with Google Ads and a few other things. And finally, we came to writing articles with a focus on SEO on our website.

An essential tool that we also tried is placement on marketplaces CodeCanyon.com and Codester.com. Codester alone brought us over 5,000 visitors, some of whom became our customers. Even if the customer does not make a transaction through the marketplace, he remembers the information about your product and may become your customer in the future. That's the way it is. And here is a screenshot of our account with Codester. As you can see, not many sales came directly from the marketplace, but a huge amount of traffic.

data-excavator

In my opinion, the best marketing technique is providing quality service. An inspired client brings other clients, offers other projects, and is happy to work with you further. An unsatisfied client always finds an opportunity to refuse work and stop cooperation. That's why the best thing any startup can do is to love its customers and handle feedback from them.

How are you doing today and what does the future look like?

We are currently working as a small team. We continue to develop our product, and help customers with data scraping. Part of our time is spent on improving the software, and part of our time is spent on consulting clients. As I wrote above, when it comes to scraping, it's very important to provide quality technical support. After all, sometimes there are complex and interesting tasks that require special qualifications.

Our business is running at a small profit. This project is not mature enough to have a full-fledged team. But it's also not so simple as to develop it by itself. Therefore, it is safe to say that we are in an organic growth phase.

At the same time, I'm looking for my first investor. And I wouldn't say that I'm doing it with the proper level of diligence. Rather, I'm philosophically watching the market, and looking out for different business angels. I've spent most of my life in Eastern Europe, and I travel just as much now. You know, when the average salary in your home region is $500-$1,000 a month, it's not easy to mentally expand to work in today's venture capital market. Sure, I read stories of raising $500k in an A round with interest, and those stories are inspiring. But for now, I'm only aspiring to do so, and dreamily imagining an investor who would agree to share our journey with us. As long as no such investor has been found, I am funding the project with my own money, which I manage to earn on other existing projects.

What about the market size and customer base? According to various estimates, last year the size of the scraping market was over $10 billion. And you've probably at least once heard of such applications and such technology. Why do you need it, where do data scrapers apply? In a nutshell, you need it for tasks like the one that started our story. That is, for example, to create custom databases. Or for filling your stores with products from suppliers. Or for market research. Or for creating a database of potential customers (lead generation). Or for solving many other important and interesting tasks. Not only for solving applied problems, but also for multiplying the growth of the business.

What does our current activity look like? To begin with, it is worth saying who our client is. As a rule, they are representatives of small and medium-sized businesses. These are people who are just launching their projects and need the right data. Some need contacts from Google Maps, and some still need product cards from suppliers' websites.

Initially, the client consults with us and gently asks if we can solve his problem. After a free consultation, we provide a simple set of demo data to convince the client that it works. Once the client has received the file, the payment for the license key usually follows quickly. People who need to scrape data from some site want to get results very quickly.

It is also worth noting that we have expanded our product line. Originally we had one big "any purpose" scraper. Over time, we realized that this approach is not applicable everywhere. So we made several mini-scrapers. For example, separate scrapers for Google Maps, Amazon, Walmart, AliExpress, and so on. Such scrapers can work with only one site, but they are much cheaper. So, our work is growing not only upward, but also in breadth.

Through starting the business, have you learned anything particularly helpful or advantageous?

Oh yeah. My main problem as a startup founder was organizing my work time. You know, I'm a perfectionist and somewhat of a workaholic, and I used to always feel like I was underperforming somewhere. It's a strange feeling when you end the day like a lemon, and you go to sleep thinking "Gosh, I guess I'm not good enough, I don't work enough". I tried different services and approaches, but in general, they did not allow me to get rid of these annoying thoughts. So here's what I came up with.

I started a simple xlsx file, and called it "Personal Effectiveness". In it, I added some columns: "Date", "Started", "Finished", "Efficiency", "Project", "Total time", "Comment". And now every time I sit down to work, I mark the start time there. When I finish work, I fill in the "Efficiency" and "Finished" columns. With a simple macro, using the percent efficiency box, the file calculates the time I effectively worked. And yes, of course, I fill in the "Project" and "Comment" columns. And you know, it's very motivating. I look at my past days and think, "Boy, am I super efficient." It lifts my morale well and allows me to move forward. So yes, I advise every entrepreneur to have a file like this and use it. Here's an example of my file:

data-excavator
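For readers who prefer a script to a spreadsheet macro, the calculation can be sketched in Python. The macro itself is not shown in the interview, so the formula below (time between "Started" and "Finished" multiplied by the "Efficiency" percentage) is an assumption about what it does, and the function name `effective_minutes` and the sample row are made up for illustration.

```python
from datetime import datetime


def effective_minutes(started, finished, efficiency_percent):
    """Minutes of effective work for one row of the file.

    Assumed formula: (Finished - Started) * Efficiency / 100.
    Times are "HH:MM" strings within the same day.
    """
    fmt = "%H:%M"
    duration = datetime.strptime(finished, fmt) - datetime.strptime(started, fmt)
    return duration.total_seconds() / 60 * efficiency_percent / 100


# A made-up row using the column names from the text.
row = {
    "Date": "2021-03-01",
    "Started": "09:00",
    "Finished": "13:30",
    "Efficiency": 80,
    "Project": "Data Excavator",
    "Comment": "hypothetical example row",
}
row["Total time"] = effective_minutes(row["Started"], row["Finished"], row["Efficiency"])
print(row["Total time"])  # prints 216.0 (minutes)
```

Here a 4.5-hour session rated at 80% counts as 216 minutes, or 3 hours 36 minutes, of effective work.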

As for good and bad decisions. For me, there is another lesson in this project. Effective business is a story about hypothesis testing and the courage of the founder. It's about how one destroys the old version of oneself and creates a new version of oneself. It's about the ability to break down old ideas about the product and the audience and create new ideas about the product and the audience. While working on the app, we made some major pivots, changing the way we think about our customers and the way we think about our product. It's decisions like these that allow the company and the project to grow, and find new niches and new profits.

What platform/tools do you use for your business?

The first tool I will name is Trello. In my opinion, it is one of the best solutions for startups with up to 10 people. Of course, it can't be called the perfect CRM for organizing an entire business, but it's quite suitable as an entry point for new employees. And definitely as a repository of important information that the whole team needs to know.

The second important tool is messengers: WhatsApp and Telegram. Here it is important to use them not only for internal communication but also for communication with customers. You'll agree that when you need support or advice, and you can get it right on your phone, it's convenient. You get the feeling that you are being taken care of. That's very, very good for sales.

The third thing we use deserves a separate mention. This is the search for clients through Upwork and Freelancer. Some may say that this is not a serious approach, but in our case, it works. A lot of interesting orders and interesting people work through these sites. Once they have tried our product and evaluated the effectiveness of our team, they gladly continue to cooperate. And in general, data scraping is a rather sensitive issue. Customers need something more personal than the FAQ section on the website.

I also want to mention the CodeCanyon Marketplace. This is a place where you can upload your product and sell it. The guys from CodeCanyon make their product in Australia, but they are very good at promoting it around the world. Thanks to this site, we also get orders from different countries and help all kinds of people.

Working through this marketplace is very easy. You simply add your app, make a nice page with a description, and start selling. Surprisingly, many users find it convenient to solve their problems this way, through the marketplace of small applications. And here I can offer you a tip. If your product is too big or complicated for such placement - try to make inexpensive and simple mini-products on your topic. They will attract a lot of retail customers, who can then get to know your main app.

What have been the most influential books, podcasts, or other resources?

I was most influenced by Henry Ford's biography. His book, "My Life And Work" is a basic textbook on entrepreneurship, I believe. It was there that I picked up ideas for optimizing business processes and real efficiency in building systems. When a business leader counts the steps of his workers between the machines, it commands respect.

When, 100-plus years later, a programmer thinks about the extra clicks of his customers, it's just as respectful. As I create software products, I often think back to this book. And I think about how many steps a customer has to take to get effective results using my application.

The second book I'll call a classic in startup creation. Yes, it's The Lean Startup. Why isn't it in the first place? Because Henry Ford came up with it first. The Lean Startup is a modern interpretation of ideas that Ford applied 100 years earlier. But it is a very, very important, necessary and modern book. Its main message is to keep trying and not stop. Go through the hypotheses. Take risks. Talk to your customers and make adjustments to the product. Guys, it works!

Advice for other entrepreneurs who want to get started or are just starting out?

The most important thing we realized in working on the project was that people need results. But beyond that, people need quality technical support and human communication. Customers need to solve their problems efficiently, but at the same time, they need to be sure that they will be helped. Only by providing quality advice can you reach incredible heights. Because people will buy from you the product you sell.

Like every startup, we've made a lot of mistakes. Probably the biggest one was letting the product stay in MVP status for too long. At one point we got a few thousand visitors to our site. But the product wasn't good enough, and those people just left. It was painful, but at the same time, it gave us the impetus to keep going. So we realized that people needed the applications we made. But people also need great quality and simplicity in the apps, which determines whether or not the customer stays.

Some inspiration. Not so long ago we were approached by a client with a super-urgent task. It wasn't just a task, it was a burning, super-fiery super-urgent task. It had to be done the day before yesterday. And what do you think? All of our competitors turned him down! Just like that! They said, "Well, we'll have to have a better talk about it, and we have plans and a strict schedule of tasks." And then we showed up and said, "We're ready!" And just for the fact that we agreed to do the task the same day by providing our software, we earned $2,500! The client was insanely happy because he made a lot more money on it! The moral is that responsiveness and a human attitude mean a lot. Try to smile at people and go out of your way to meet them. And the world will respond to you!

Are you looking to hire for certain positions right now?

We are currently looking for talented sales professionals. Of course, this may sound trivial, but we really can offer a unique service to our clients. As a rule, the corporate data parsing market works in a pretty trivial way. Well, it's something like: buy our app and don't ask too many questions. We're changing that rule. For us, data is a valuable resource that we treat with care and respect. And if you share these values, it's time to work together!

A short description of a typical task boils down to learning our product well and coming up with crazy ideas to integrate our product with big companies. Yes, we're brazen enough to show up at a big company's office and say: guys, we can help you. We can do something new. And it's going to blow up your sales.

We are willing to offer our agents a percentage of sales, and ongoing monthly compensation if you are successful in your first sales. Since we are a real startup, we work like a startup. Only forward, and only with maximum efficiency.

Where can we go to learn more?

You can find most of the information on our website.

We are also starting to develop our communities, for example on Facebook.

And we are also adding video tutorials to YouTube.

If you want to contact us and discuss data scraping or anything else, feel free to contact us.