Using Third Parties to Backfill User Data

By Jacob Cohen Donnelly November 10, 2023

In media, being able to break down your audience into specific segments increases your ability to monetize. For example, if you are running a general marketing publication, the major advertisers will be companies that look to sell to the broad marketing community. The problem for smaller or more niche advertisers is that their budgets get wasted when they pay to reach the entire marketing community but can only serve a slice of it.

But if you can break down the marketing audience into specific segments, such as social media marketing, SEO, email marketing, and influencer marketing, you can start to find advertisers for each of those. Instead of 100,000 marketers, you now might have 25,000 readers in each of those four categories. Those smaller advertisers become much more likely to spend with you and it becomes more likely that their campaigns work.

If you’ve read AMO for some time now, you know the power of these audience segments. And you also know that the way to build these audience segments is with 1st-party data. However, one of the problems with attempting to capture 1st-party data is that it takes time. That’s why the best time to start capturing it was when you started your site, the second best time was yesterday, and the third best time is today.

One way to speed up the process is to use a third party to help backfill it. There are numerous data enrichment tools out there that have built massive databases of user data. The one that I’ve used in the past is ZoomInfo. It captures user data in a number of ways, but the big one is with a community of contributors. In essence, salespeople agree to give ZoomInfo access to their email inbox in exchange for access to ZoomInfo’s platform.

It’s very easy to see how both parties win here. The seller gets access to ZoomInfo’s deep database of information. And ZoomInfo gets to see the signatures of everyone who emails the seller. For example, if I had ZoomInfo running in my inbox and someone emailed me, that person’s signature information would be sent over to ZoomInfo. And so, ZoomInfo has an army (community, they say) of people that are effectively allowing their inboxes to be snooped on to capture this information about people.

Once ZoomInfo has that information, anyone who pays for the platform can then get the data. This is beneficial for a number of reasons, but the simplest is that so long as the community stays strong, the database should also stay strong. And the database should stay current. For example, if I work at one company and then change to a different company, theoretically, ZoomInfo will get that information at some point.

I should caveat here that ZoomInfo is not the only option. Apollo has this information. So does Clearbit. And there are many others out there as well. I don’t know which one is the best, but you have to imagine that the stronger the brand, the stronger the community of contributors, and, therefore, the stronger the data. You may want to test out different options.

So, we know how these tools work. How should we be looking to use them?

Let’s assume that you have a database of 100,000 people and you have neglected to capture information about them since you started. You now realize that you need it, but know that it’ll take a long time to get the data. You might partner with a ZoomInfo to get access to the data and use its enrichment tool.

You’d take those email addresses, put them into the platform, and then let the tool match the email addresses to people that it knows. And then you’d take that data, import it into your audience database, and you’d now be able to create segments. Now, I make it sound much easier than it actually is. There are a couple of things to consider.

First, these tools can get unbelievably pricy. I’ve heard of some ZoomInfo contracts where the cost per data point is $0.27. In other words, if you wanted to get the person’s name, company, job title, and company industry, you’d need to buy at least four data points. That’s $1.08 for that one subscriber. But what if you want even more data? You can start to see this really get expensive as you expand into hundreds or thousands of subscribers.
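To make that arithmetic concrete, here is a minimal sketch of a cost estimate. It assumes the $0.27 per data point figure quoted above and that every subscriber is billed for every requested data point; the function and field names are hypothetical, and real contracts will be priced differently.

```python
from decimal import Decimal

# Assumption: the $0.27 per data point figure quoted in the text.
COST_PER_DATA_POINT = Decimal("0.27")


def enrichment_cost(subscribers: int, data_points: int,
                    cost_per_point: Decimal = COST_PER_DATA_POINT) -> Decimal:
    """Total cost if every subscriber is enriched with every data point."""
    return subscribers * data_points * cost_per_point


# Hypothetical field names for the four data points mentioned in the text.
fields = ["name", "company", "job_title", "company_industry"]

for count in (1, 1_000, 100_000):
    total = enrichment_cost(count, len(fields))
    print(f"{count:>7,} subscribers: ${total:,.2f}")
```

Under these assumptions, one subscriber costs $1.08, and a 100,000-person database with the same four data points comes to $108,000.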

Second, many email addresses cannot be matched. A tool like ZoomInfo does a pretty good job with company domains, but it is almost useless when you’re inputting a Gmail, Hotmail, or Yahoo address. Some tools say that they have consumer databases and can match against these, but I find it to be much more hit or miss. And so, if you only have these free email addresses in your database, you’re going to struggle.
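Before paying for enrichment, it may be worth checking how much of your list sits on free email domains. The sketch below is one rough way to do that. The domain list is a short, illustrative assumption rather than a complete one, and the function names and sample addresses are made up.

```python
from collections import Counter

# Assumption: an illustrative, incomplete set of free webmail domains.
FREE_EMAIL_DOMAINS = {
    "gmail.com", "hotmail.com", "yahoo.com", "outlook.com", "aol.com",
}


def domain_type(email: str) -> str:
    """Label an address 'free' (webmail) or 'company' based on its domain."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return "free" if domain in FREE_EMAIL_DOMAINS else "company"


def summarize(emails: list[str]) -> Counter:
    """Count how many addresses fall into each domain type."""
    return Counter(domain_type(email) for email in emails)


# Made-up sample addresses.
sample = [
    "ana@examplecorp.com",
    "bo@gmail.com",
    "cy@Yahoo.com",
    "di@agency.example",
]
counts = summarize(sample)
print(f"company: {counts['company']}, free: {counts['free']}")
```

A list that skews heavily toward the "free" bucket is a sign that match rates may be low.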

There’s a good way around this, though. If you have some information on the user, it increases the likelihood that you might get a match even if it is a Gmail address. For example, if you have their first name, last name, job title, and company, you might be able to get a match and then get additional data that you want about that user. In essence, you come to the table with a partial profile and you can then get a full one from these tools.

And this is where these tools can become rather interesting. If you are progressively capturing data about your audience over time, you can use the tools to help backfill information that is pertinent to specific advertising campaigns. For example, if an advertiser comes looking for a specific sub-industry that you might not have originally captured data on, this third-party data can help you find it. You can start to become more opportunistic with your data capture. Just as importantly, you can use your advertiser’s money to help enrich your data.

But I will caution… you are depending on a third-party tool for your data and accuracy is not perfect. Perhaps ZoomInfo has a user’s old company and it has not updated. Or maybe the job title is just flat out wrong and the information in your database will not be as accurate as you might want.

One way to reduce the risk of bad data being introduced into your system is to set a flag for recency with your users. If, for example, a piece of data about a subscriber was updated less than six months ago, you may want to leave it alone even if the platform has different information. You might also want to never overwrite anything the user gives you and simply use the third party to enrich data that you already have.
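Those two rules can be written down as a small merge function. This is only a sketch under stated assumptions: the six-month window is approximated as 182 days, and the FieldValue structure, the "user" and "third_party" source labels, and the field names are all hypothetical rather than taken from any particular audience database.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "six months" approximated as 182 days.
RECENCY_WINDOW = timedelta(days=182)


@dataclass
class FieldValue:
    value: str
    source: str        # "user" (given by the subscriber) or "third_party"
    updated_on: date


def should_overwrite(existing: Optional[FieldValue], today: date) -> bool:
    """Decide whether an enrichment value may replace the stored one."""
    if existing is None:
        return True                      # nothing stored yet, so backfill
    if existing.source == "user":
        return False                     # never overwrite user-provided data
    # Leave data alone if it was updated within the recency window.
    return today - existing.updated_on >= RECENCY_WINDOW


def merge(profile: dict[str, FieldValue], enriched: dict[str, str],
          today: date) -> dict[str, FieldValue]:
    """Return a new profile with enrichment applied under the rules above."""
    merged = dict(profile)
    for name, value in enriched.items():
        if should_overwrite(merged.get(name), today):
            merged[name] = FieldValue(value, "third_party", today)
    return merged


# Made-up example profile and enrichment response.
today = date(2023, 11, 10)
profile = {
    "job_title": FieldValue("Editor", "user", date(2021, 1, 5)),
    "company": FieldValue("Old Co", "third_party", date(2023, 9, 1)),
    "industry": FieldValue("Media", "third_party", date(2022, 3, 1)),
}
enriched = {"job_title": "Publisher", "company": "New Co",
            "industry": "Publishing", "seniority": "Director"}
result = merge(profile, enriched, today)
for name, field in result.items():
    print(name, field.value, field.source)
```

In this example the user-provided job title and the recently updated company are left alone, while the stale industry is refreshed and the missing seniority is backfilled.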

Nevertheless, whatever rules you do come up with, you are introducing imperfect data into the system. And so, this is something you need to get comfortable with.

And this brings us right back to the beginning of the piece. The way to minimize the need for these tools, or to use them only for enhancement rather than origination, is to start collecting 1st-party data immediately. It takes time to get going, but if you start now, over the years, you’ll have a database of declarative data that you can then sell against.

