Capture The First-Party Data That Will Actually Help

By Jacob Cohen Donnelly

The media industry has been pivoting toward taking ownership of user information and collecting more first-party data. In numerous conversations, people have asked me which customer data platforms (CDPs) I think are right, what types of data to capture and how to go about capturing it. These are all the right questions, and it’s refreshing to see publishers recognizing their importance.

But before you jump in and just start capturing data blindly, it helps to take a step back and figure out what information might be important. What you don’t want to do is spend a year capturing information only to realize that it’s actually completely useless. Take it from me, that’s a disheartening exercise.

There are two broad buckets of information that are important to take into consideration: personal and consumption.

Personal information is data specific to that user: who they are, where they are, where they work, etc. When most people think about first-party data, this is what they have in mind.

Consumption information is data specific to what that user has done on the website. This could be anything from the stories they’ve read to the whitepapers they’ve opted into downloading.

One bucket of information without the other is suboptimal. At the core of it, you want to be able to pull people who fit a particular demographic and then segment further down to people who are interested in a specific topic. However, before we start collecting any information, let’s discuss a major prerequisite that’ll save you many headaches.

Structured data

At a previous company, I was tasked with coming up with a strategy to start qualifying our users. We had traffic, but we wanted to know more about who those readers were. It was the right question to ask, but the implementation was wrong.

We asked registrants to tell us their first name, last name, job title and company name. On the surface, that looked pretty good to me. The registration page presented users with a few text boxes where they could type in that information.

It wasn’t until I actually looked in the database that I realized how badly I had screwed up. Do you know how many variations of CEO there are? Let me list a few: C.E.O., CEO, Chief Executive, Chief Executive Officer, and then all sorts of misspellings because people are typing quickly. Now do that for the entire C-suite, VPs, directors, etc.

The data was useless. How could I ever use that for any sort of marketing campaign? What I had created was a database of unstructured data. As the name suggests, this data doesn’t follow a predetermined model. Typically, unstructured data comes from free-text fields where the user can type whatever they want.

The problem here is that, because of all the variations of the same word, it’s impossible to search the data reliably.

What you need instead is structured data, where all the possible answers for a specific data field are predetermined. Rather than allowing a user to type a variation of CEO, you would have them select it from a dropdown. There would be only a single CEO option, so everyone who fit that title would select it.

Now when you go into the database and segment by CEO, you find everyone without having to create additional versions of the segment.
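To make the dropdown idea concrete, here is a minimal Python sketch. The `JobTitle` values, the sample records and the `parse_title` and `segment_by_title` helpers are all hypothetical, chosen only for illustration. The point is that the field can hold exactly one value per title, anything outside the list is rejected, and a single query finds every CEO.

```python
from enum import Enum


# Hypothetical controlled vocabulary for the job title field; the
# registration form would render one dropdown option per member.
class JobTitle(Enum):
    CEO = "CEO"
    CFO = "CFO"
    VP = "VP"
    DIRECTOR = "Director"


def parse_title(submitted: str) -> JobTitle:
    """Accept only values that exist in the dropdown; raise ValueError otherwise."""
    return JobTitle(submitted)


def segment_by_title(records, title):
    """Return every record whose structured job title is exactly `title`."""
    return [r for r in records if r["job_title"] is title]


# Sample records store the enum member, never free text.
registrants = [
    {"email": "a@example.com", "job_title": parse_title("CEO")},
    {"email": "b@example.com", "job_title": parse_title("Director")},
    {"email": "c@example.com", "job_title": parse_title("CEO")},
]

dropdown_options = [title.value for title in JobTitle]
ceos = segment_by_title(registrants, JobTitle.CEO)
print(dropdown_options)            # ['CEO', 'CFO', 'VP', 'Director']
print([r["email"] for r in ceos])  # ['a@example.com', 'c@example.com']
```

A submission like “C.E.O” never reaches the database, because `parse_title` raises `ValueError` for anything that isn’t one of the options.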

If you’ve started down the path of capturing unstructured data, do yourself a favor and fix it. You can convert it all into structured data manually (which does suck), but once you do, your database becomes much cleaner and easier to manipulate.
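Part of that cleanup can be scripted. The sketch below assumes a hand-built lookup table; `CANONICAL_TITLES`, `clean` and `normalize_titles` are hypothetical names. It strips case, periods and extra spaces, maps known variants to one canonical value and sets everything else aside for a person to review, since misspellings rarely match a rule.

```python
import re

# Hypothetical, hand-built table mapping cleaned variants to one canonical
# title. Filling in this table is the manual part of the cleanup.
CANONICAL_TITLES = {
    "ceo": "CEO",
    "chief executive": "CEO",
    "chief executive officer": "CEO",
    "cfo": "CFO",
    "chief financial officer": "CFO",
}


def clean(raw: str) -> str:
    """Lowercase, drop periods and collapse whitespace, so 'C.E.O' becomes 'ceo'."""
    text = raw.lower().replace(".", "")
    return re.sub(r"\s+", " ", text).strip()


def normalize_titles(raw_titles):
    """Split raw titles into canonical matches and leftovers that need a human."""
    mapped, needs_review = {}, []
    for raw in raw_titles:
        canonical = CANONICAL_TITLES.get(clean(raw))
        if canonical is None:
            needs_review.append(raw)  # misspellings and unknown titles land here
        else:
            mapped[raw] = canonical
    return mapped, needs_review


raw = ["C.E.O", "CEO", "Chief Executive", "chief executive officer ", "Cheif Exec"]
mapped, needs_review = normalize_titles(raw)
print(mapped)
print(needs_review)  # ['Cheif Exec']
```

Every item in `needs_review` is a decision you make once and then add to the table, so the manual work shrinks as the table grows.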

Now that we’ve set this prerequisite, let’s dive into the two broad data buckets…

Personal data

This data is pretty straightforward and is what most people think about when they are collecting first-party data. If you recall from above, I was collecting first name, last name, job title and company. That’s personal information.

But not all personal data is created equal. Put yourself in the mind of an advertiser and try to determine what they are going to need for their campaign. Or put yourself in the mind of your marketer. What do they need?

Every media company is different, but for the sake of example, I’ll look at b2b media companies. Three data points matter above all others: company type, job level and job function. First name, last name, company name, etc. are fine to collect, and marketers like having them, but if I could only have a few data points, it’d be the three I just listed.

This data gives you a pretty clear window into what that person does. You could pull up a record, see those data points, and say, “this person is a [job level] that does [job function] at a [company type].”

For example, someone could be a director of audience development at a b2b media company. Their job level is director, their function is audience development, and they work at a b2b media company.

When an advertiser comes along looking to target those types of people, it’s much easier to find them with these three fields than if each person had typed “Director of Audience Development” into a text field in their own way.
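Here is one way the three fields might look in code, as a minimal Python sketch. The vocabularies in `JobLevel`, `JobFunction` and `CompanyType` are hypothetical placeholders, as are `PersonalProfile`, `describe` and `matches`; each publisher would define its own lists. `describe` fills in the sentence template above, and the final list comprehension is the kind of segment an advertiser would ask for.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical vocabularies; each publisher would define its own options.
class JobLevel(Enum):
    C_SUITE = "C-suite executive"
    VP = "VP"
    DIRECTOR = "director"
    MANAGER = "manager"


class JobFunction(Enum):
    AUDIENCE_DEVELOPMENT = "audience development"
    EDITORIAL = "editorial"
    SALES = "sales"


class CompanyType(Enum):
    B2B_MEDIA = "b2b media company"
    CONSUMER_MEDIA = "consumer media company"
    B2B_SOFTWARE = "b2b software company"


@dataclass(frozen=True)
class PersonalProfile:
    job_level: JobLevel
    job_function: JobFunction
    company_type: CompanyType

    def describe(self) -> str:
        # Fills the "[job level] that does [job function] at a [company type]" template.
        return (f"this person is a {self.job_level.value} that does "
                f"{self.job_function.value} at a {self.company_type.value}")

    def matches(self, level: JobLevel, function: JobFunction, company: CompanyType) -> bool:
        """True when all three structured fields equal the advertiser's criteria."""
        return (self.job_level is level
                and self.job_function is function
                and self.company_type is company)


profiles = [
    PersonalProfile(JobLevel.DIRECTOR, JobFunction.AUDIENCE_DEVELOPMENT, CompanyType.B2B_MEDIA),
    PersonalProfile(JobLevel.MANAGER, JobFunction.SALES, CompanyType.B2B_SOFTWARE),
]

# The segment an advertiser might request: audience development directors at b2b media companies.
target = [p for p in profiles
          if p.matches(JobLevel.DIRECTOR, JobFunction.AUDIENCE_DEVELOPMENT, CompanyType.B2B_MEDIA)]
print(target[0].describe())
# this person is a director that does audience development at a b2b media company
```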

There are a variety of other data points that you could strive to capture and, like I said, each media company is different, but if you have these, you’re in a good place.

Consumption data

It’s one thing to know who people are, but if you are also able to identify what those people are actually consuming, you’re in a very good position. When you get pitched on CDPs, this is a big part of what they are offering you: the ability to track what users are doing.

It’s important to have your content set up correctly to benefit from this. Your tagging system is at the core. How do you classify content? If you’re haphazard with it, using variations of the same word for tags, then it’s not going to be helpful. Like I said above, structured data matters.

So, first things first, audit your tags and make sure you’ve brought order to the mayhem. To take it one step further, it helps to create a taxonomy in which tags are nested under others. These exercises are always hard, but the more work you do up front, the better the data will be for you in the future.
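A nested tag structure can be as simple as a child-to-parent map. In this minimal Python sketch, `PARENT` and `TAG_ALIASES` are hypothetical stand-ins for the output of your own tag audit: the aliases collapse variants like “search engine optimization” into one canonical tag, and `ancestors` walks up the taxonomy so a story tagged “SEO” also counts toward its parent category.

```python
# Hypothetical taxonomy from a tag audit: each tag points to its parent
# category, and top-level categories point to None.
PARENT = {
    "audience development": None,
    "advertising": None,
    "SEO": "audience development",
    "email marketing": "audience development",
    "programmatic": "advertising",
}

# Hypothetical alias table collapsing variants (keyed in lowercase) into
# the canonical tag names used in PARENT.
TAG_ALIASES = {
    "seo": "SEO",
    "search engine optimization": "SEO",
    "email": "email marketing",
    "e-mail marketing": "email marketing",
}


def canonical_tag(raw: str) -> str:
    """Map a raw tag to its canonical name; unaliased tags are just lowercased."""
    key = raw.strip().lower()
    return TAG_ALIASES.get(key, key)


def ancestors(tag: str) -> list[str]:
    """Return the tag plus every category above it.

    Raises KeyError for tags missing from the taxonomy, which is exactly
    what the audit should surface.
    """
    chain = []
    current = tag
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


print(ancestors(canonical_tag("Search Engine Optimization")))
# ['SEO', 'audience development']
```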

Let’s build on the director of audience development at a b2b media company example from before. What we’re now looking to do is see what that person reads. Looking in the database, we see that she read a story from the category “audience development.” To take it deeper, her profile now has the tags “SEO” and “email marketing” assigned to it since she’s read stories about those topics.

Now you can say that you have a director of audience development at a b2b media company that reads about SEO and email marketing.

That’s a lot of information to know about someone. If you were holding a webinar about SEO for audience development professionals, you could easily identify those people in your database.
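The director example can be sketched in a few lines of Python. Everything here is hypothetical (`ARTICLES`, `ReaderProfile`, `record_read`, `webinar_audience`), and in practice a CDP would do the event collection for you. The sketch only shows how consumption tags accumulate on a profile and combine with personal fields to build the webinar list; plain strings stand in for the structured personal fields to keep it short.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical article metadata: one category plus a few tags per story.
ARTICLES = {
    "seo-guide": {"category": "audience development", "tags": ["SEO"]},
    "newsletter-tips": {"category": "audience development", "tags": ["email marketing"]},
    "ad-tech-101": {"category": "advertising", "tags": ["programmatic"]},
}


@dataclass
class ReaderProfile:
    # Personal fields (plain strings here; structured values in practice).
    job_level: str
    job_function: str
    company_type: str
    # Consumption data: how many reads each category or tag has received.
    interests: Counter = field(default_factory=Counter)

    def record_read(self, article_id: str) -> None:
        """Credit the article's category and each of its tags to this profile."""
        article = ARTICLES[article_id]
        self.interests[article["category"]] += 1
        for tag in article["tags"]:
            self.interests[tag] += 1


def webinar_audience(profiles, job_function: str, topic: str):
    """Readers in `job_function` who have read at least one story on `topic`."""
    return [p for p in profiles
            if p.job_function == job_function and p.interests[topic] > 0]


director = ReaderProfile("director", "audience development", "b2b media company")
director.record_read("seo-guide")
director.record_read("newsletter-tips")

seller = ReaderProfile("manager", "sales", "b2b software company")
seller.record_read("ad-tech-101")

audience = webinar_audience([director, seller], "audience development", "SEO")
print(dict(director.interests))
# {'audience development': 2, 'SEO': 1, 'email marketing': 1}
```

As the reader keeps consuming content, `record_read` keeps updating the same counter, which is how the profile evolves over time.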

Over time, that user’s profile will evolve. Perhaps they start consuming other content on the site. You should be tracking all of it at all times so it becomes easily usable for your team.

It starts today

The problem with capturing first-party data is that the best time to have started was yesterday. Since we can’t go back in time, the next best time to start is today. Why? Every user who comes to your site before you start capturing data remains anonymous, and you can’t target them or sell smarter advertising against them.

But it can also be a daunting exercise trying to capture all of this information all at once, especially when the team is lean. Where do you begin?

In my opinion, consumption data comes first if you’re using a CDP, because you can start collecting information even before the user is known. This doesn’t help you much right away, since your means of communicating with an anonymous visitor are limited, but once you do convert them into a known user, you’ve already got a nice amount of consumption data.

Imagine taking the opposite approach where you don’t start collecting consumption information until they actually become known. You might have users on the site, but you know nothing about them. That doesn’t help anyone.
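Here is a toy version of that hand-off in Python, assuming reads are keyed by an anonymous identifier, such as a first-party cookie value, until the visitor registers. The names (`track_read`, `register` and the in-memory dictionaries) are hypothetical; a CDP would handle this identity stitching with its own storage and APIs.

```python
from collections import defaultdict

# Hypothetical in-memory stores. Reads are keyed by an anonymous ID (for
# example a first-party cookie value) until the visitor registers.
anonymous_reads = defaultdict(list)
known_users = {}


def track_read(anon_id: str, article_id: str) -> None:
    """Record a read for a visitor we can't identify yet."""
    anonymous_reads[anon_id].append(article_id)


def register(anon_id: str, email: str, job_level: str) -> dict:
    """Create (or fetch) the known profile and move the anonymous history onto it."""
    user = known_users.setdefault(email, {"job_level": job_level, "reads": []})
    user["reads"].extend(anonymous_reads.pop(anon_id, []))
    return user


# Consumption is captured before the visitor is known...
track_read("cookie-123", "seo-guide")
track_read("cookie-123", "newsletter-tips")

# ...and becomes part of the profile the moment they register.
user = register("cookie-123", "reader@example.com", "director")
print(user["reads"])  # ['seo-guide', 'newsletter-tips']
```

Because the anonymous history is attached at registration, the new profile starts with everything the visitor read beforehand rather than from zero.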

The most important thing you can do is start. Even if you’re not 100% sure what you’re going to do with the data in the next 1, 3, or 5 years, it’s still important to put a strategy together. You’ll never be able to capture information on yesterday’s users and building up a scaled first-party database takes time.

But if you do it right, you’ll know who your readers are and what they are interested in. When we talk about first-party data, that’s what we care about.