GDC 2014: Content Experiments for Mobile Apps with Google Tag Manager

RAUL: So I think that should
be our slogan for Google
Analytics. “It’s so easy to
use even a product manager can
use it.”
What do you think?
MALE SPEAKER: We should go
all the way and say,
so easy to use even
Raul can use it.
RAUL: And that’ll just be out
there for the entire internet.
MALE SPEAKER: Companies around
the world will all look at Raul
and they’ll be like,
there you go, dude.
RAUL: Yeah.
our new GA guy.
So I think it’s
clear to everybody
that in this modern ecosystem
of game development,
one of the most important
things you can do
is ensure that your game runs
across tons of different form factors.
Different screen resolutions.
Different types of
memory profiles.
GPU characteristics and whatnot.
And this is a problem
for all of us, right?
I mean, you have
to figure out how
to write your code once,
but yet distribute it
to hundreds or thousands
of different configurations
of devices.
And that gets really gnarly.
Anyone had that problem before?
Or is it just me?
One dude.
Someone find him a t-shirt.
The rest of you wish
you’d raised your hands.
Anyhow, up next to
describe a little bit more
how to solve this
problem at its core level
with some amazing technology
that you’re going to love
is Neil and Lukas.
Big round of applause.

LUKAS BERGSTROM: Hi everybody.
Great to see so many
people here today.
I’m Lukas Bergstrom.
I’m the product manager
for Google Tag Manager.
And we got Neil Rhodes here,
as well, who’s the tech lead.
And I think we’ve talked a
lot about measurement so far.
We talked some
about advertising.
We’re going to talk
about something
a little bit different now.
So we’re going to talk
about not just measuring
what’s happening.
Or trying to advertise to get
people to incentivize behavior.
Or tell people about things
you’re interested in.
But we’re going to talk about
the core behavior of your app.
How is your app configured?
And how can you control that?
So I think we’re all familiar
with the app store model, right?
You build an app.
You hopefully get
it through review.
You get it shipped
out to people.
And with any luck, some
reasonable number of people
will upgrade to
your newest version
at some reasonable time.
In the web world, we’re used
to something a little bit
different, right?
We’re used to rapid iteration.
We’re used to being
able to ship something.
And then if we notice that
something isn’t quite right,
we can update things live.
And people who are
visiting our site
will instantly see
those new changes.
And so that’s why
people get addicted
to things like
analytics on the web.
Because they can tweak knobs
and dials to improve outcomes.
And instantly see how
that gets reflected.
And that has a lot of really
positive outcomes in the web world.
And we’re big believers
in using real time data
to allow you to have a
really tight feedback loop.
That’s harder in the app world.
Once you ship something,
you’re stuck with that.
Until you can convince people
to install a new binary.
So pretty much, once something
is shipped out to the store,
it’s set in stone.
And what you’re measuring in
the app, the configuration
of different values,
so maybe what
the text of an in-app promotion
is, what the text of a button
is, or the frequency
of in-app promotions.
All that stuff is
locked in the code
and shipped along with
the static binary.
So how does this
work in practice?
You’ll notice that this leaves
out a very important part,
which is the actual
development process of getting
a new binary out the door.
Let’s assume that
your guys’ development
process is pretty much perfect.
You’re agile,
you always ship
stuff right on time.
Despite that, however, in order
for you to get changes out,
you need to get
the binary built.
Get it approved
in the App Store.
And then you need
to actually get
people to install
that new version.
And even if everything
goes perfectly,
that means that your
feedback loop, your ability
to make changes, is a lot longer
than you would like it to be.
And this affects
not just developers,
but anybody whose work
ends up in the app.
So let’s say that you’re
a marketer, a business
analyst, who’s putting
together promotions.
And you’re trying
to figure out what’s
the most effective frequency
of showing in-app promotions?
What kind of wording is going to
get people to convert the best?
You need to, basically, ship
that along with the new binary.
And so your ability
to kind of tune stuff.
Measure the effect of that.
And then go back for another
iteration is severely limited.
So the solution is
what we’re calling
Google Tag Manager
for Mobile Apps.
And Google Tag Manager, the
name comes from the web version.
The web and the mobile version
have one principle in common.
Which is, separate out your
configuration from your code.
So the code has to stay
static for app review,
but the configuration
can be dynamic.
That can be controlled
by Google Tag Manager.
And that gives you
a couple benefits.
It means by decoupling
configuration from code,
you can make changes to things
like what am I measuring?
What do I consider
most important?
Or what is the placement
of my in-app promotions?
Where do they appear
in my user flow?
Where do they appear on the screen?
How often do they appear?
What’s the wording?
All of that stuff
can be controlled
without shipping
a new app version.
Which means you can push
changes as often as you like.
Just by clicking publish
in Google Tag Manager.
And this affects not
just developers but,
as I was saying, anybody
whose work ends up in the app.
So your marketers
and business analysts
who are trying to optimize stuff
that happens inside your app,
using Google Tag Manager
and decoupling configuration
from code lets those
marketers do all that stuff
without bugging you.
So suddenly they’ve got a
much tighter feedback loop.
A much tighter and faster
iteration and improvement loop.
And you’re freed up to
actually work on feature work
because they can get
stuff done without you.
So I’m going to ask Neil to come
up right now, and explain to you
how this actually works.
NEIL RHODES: Thank you
very much, Lukas.
So as Lukas described,
what we really want to do
is separate the
configuration from the code.
And one thing I
want to make clear
is, Google Tag Manager
for mobile apps
has been out since
mid-last year.
Although, I am pleased to say
we have a brand new feature
that we’ll be talking
about shortly.
So you can really configure
virtually anything.
What you basically have
are key value pairs.
What sorts of things
can you configure?
Anything, really, that you can code.
You can configure ad settings.
How often am I
going to show ads?
How often am I going
to show purchases?
Where do those ads appear?
And so on.
Any network communication
that you’re doing.
You might want to
configure that.
What host names
are you talking to?
What are their timeouts?
Game-play values can actually
be configured, if you’d like.
You can configure
content of any URLs
you might use for
help, or so on.
Any UI settings.
Locations of
particular UI elements.
Their colors.
Even your localized text can
be done via configuration.
And any new sort
of announcements
you want to make to users.
Here’s the basic way I think of it.
Currently in your
application, you probably
have a lot of constants.
That hard-code how
your application works.
So as software developers,
we don’t actually
hard-code values.
And have twos, and threes,
and magic numbers in our code.
We do have constants.
The problem is to
change those values,
you need to recompile
your app and redeploy.
By moving from using
constants, to instead,
using more dynamic key value
pairs, you get this dynamism.
What we call a highly
configurable application.
So you use a Google
Tag Manager container.
And once you’ve
initialized that container,
you can get the current
value of a particular key.
As an example, and this
is just a simple example.
We have here, on the left,
a configuration where
we’re setting, using
JSON, three values–
the text for the
Register button,
the text for the
try button, and also
whether the try button is
on the left or the right.
And we can see as it’s
configured on the left,
we have a try on
the left, and it
says “register” on the right.
If we change that
configuration, it
changes the values
of those buttons.
And also the order
of those buttons.
Now, did you have to
write code for this?
You did actually have to write code
to go get those values.
And apply them to the buttons.
So change the text of
the buttons, or reorder.
But once you’ve done that,
you have a highly configurable
application.
You can imagine, most
parts of your application,
you can suddenly make
dynamically configurable.
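
The button example can be sketched in a few lines. This is a minimal illustration, not the Tag Manager SDK: the key names (regText, tryText, tryOnLeft) and the layout_buttons helper are assumptions made up for this sketch, and the JSON simply mirrors the three values described above.

```python
import json

# Hypothetical configuration with the three values from the example:
# register text, try text, and whether the try button is on the left.
CONFIG_JSON = '{"regText": "Register", "tryText": "Try", "tryOnLeft": true}'


def layout_buttons(config):
    """Return the button labels in left-to-right order for a configuration."""
    try_label = config.get("tryText", "Try")
    reg_label = config.get("regText", "Register")
    if config.get("tryOnLeft", True):
        return [try_label, reg_label]
    return [reg_label, try_label]


config = json.loads(CONFIG_JSON)
buttons = layout_buttons(config)  # labels for the left and right buttons
```

Changing the JSON changes both the labels and their order, with no change to the code that reads them.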
How does this actually
work in practice?
What you do is,
first, of course,
you make that application
highly configurable.
And then you publish
that application.
Then, you go to Google Tag Manager,
with your web browser, and
you edit the container.
You basically provide new key
value pairs with this JSON.
And that gets stored
within GTM in the cloud.
Your application, as it’s
deployed to the user,
has been built using the
Google Tag Manager SDK.
And it has many of these
calls to load the container.
And it has calls to
get container values.
So to get those key value pairs.
What happens, is that
the SDK will periodically
poll Google
Tag Manager, to get
the latest version of the container.
That’s, by default,
every 12 hours.
But it’s up to you.
You can also make explicit
calls to refresh more often.
That container, that
gets downloaded,
gets then saved with
your application.
And what’s important
to realize, is
that when you make a call to get
a value for a particular key,
it’s not doing something
weird, like going actually
up to the network to
try and find that value.
Instead, it’s using
a cached version
of this latest container.
And then evaluating that value.
Giving you that latest value.
What happens when you first
start your application now?
Let’s say there’s no
network connection when
the user uses it.
How do we get this container?
The way we solve that,
is that you actually
provide a default container that
ships with your application.
We can think of that as the
container of last resort.
So that gives them the default
values for each of your keys.
And then those can be overridden
by editing the container
and publishing a new version.
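
The caching behavior described here can be sketched as follows. This is an illustration only, not the real SDK: the CachedContainer class and its method names are hypothetical, and treating published values as an overlay on top of the bundled defaults is an assumption of this sketch.

```python
REFRESH_INTERVAL_SECONDS = 12 * 60 * 60  # the 12-hour default mentioned in the talk


class CachedContainer:
    """Hypothetical local cache: reads never touch the network."""

    def __init__(self, default_values):
        self._defaults = dict(default_values)  # container shipped with the app
        self._published = {}                   # last container downloaded, if any
        self._last_refresh = None              # None until the first download

    def needs_refresh(self, now):
        """True if no container was downloaded yet or the interval has elapsed."""
        if self._last_refresh is None:
            return True
        return now - self._last_refresh >= REFRESH_INTERVAL_SECONDS

    def apply_download(self, downloaded_values, now):
        """Store a freshly downloaded container and remember when it arrived."""
        self._published = dict(downloaded_values)
        self._last_refresh = now

    def get(self, key):
        """Look up a key in the cached container, falling back to the default."""
        if key in self._published:
            return self._published[key]
        return self._defaults.get(key)
```

With no network on first launch, get returns the bundled default; after a download, the published value wins.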
So what do you need
to use to get started?
You go to Google Tag Manager
and create an account.
You create a container.
You download the SDKs.
So this works for
both iOS and Android.
Currently, there’s a Google
Analytics services SDK,
that combines both the Analytics
SDK and the Tag Manager SDK.
And coming very soon, both
Analytics and Tag Manager
are part of the Google
Play Services SDK.
Which contains the
APIs for almost all
of the Google products.
Then you modify your
app to use the SDK.
Let’s look at what
that looks like.
You initialize your container.
When you initialize
a container, you
provide two important
pieces of information.
One is the container ID.
And that’s provided when
you create the container.
And the second is the location
of your default container
within your application.
And that’s used, again,
when we have not yet
downloaded a container
from the network.
This provides a
container holder.
The container holder
is notified every time
a new container gets
downloaded from the network.
And then, in your code,
you call, get container.
And that will return the
latest version of the container.
Once you have a container,
you can then call, get string,
get Boolean, get
integer, get long,
to get values
given a key string.
And then, of course, you go
to work and actually utilize
that value.
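
As a rough sketch of that flow: the classes below are hypothetical stand-ins written for illustration, not the actual SDK classes, and the container ID "GTM-XXXX" is a placeholder. The getter names loosely follow the calls mentioned above.

```python
class Container:
    """Hypothetical container exposing typed getters over key-value pairs."""

    def __init__(self, values):
        self._values = dict(values)

    def get_string(self, key):
        return str(self._values.get(key, ""))

    def get_boolean(self, key):
        # Assumes values are already JSON-typed (True/False, not "true"/"false").
        return bool(self._values.get(key, False))

    def get_long(self, key):
        return int(self._values.get(key, 0))


class ContainerHolder:
    """Hypothetical holder that always hands out the latest container."""

    def __init__(self, container_id, default_values):
        self.container_id = container_id
        self._container = Container(default_values)  # default container
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def on_container_downloaded(self, values):
        """Swap in the new container and notify every listener."""
        self._container = Container(values)
        for callback in self._listeners:
            callback(self._container)

    def get_container(self):
        return self._container


holder = ContainerHolder("GTM-XXXX", {"regText": "Register", "timeoutMs": 3000})
label = holder.get_container().get_string("regText")
```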

There’s more to it
than that, though.
Although Tag Manager
can be used just
to provide global
key value pairs,
it can be also used, more
specifically, for targeting.
You’ve heard discussions
about segments.
We can identify segments, for
instance, of high spenders.
Or potential spenders.
Or people in
different languages.
People using a particular
version of the OS.
People using a particular
version of the application,
and so on.
And we can apply
custom key values
to those using rules
from Google Tag Manager.
Here, for instance,
I’m giving an example
of doing translation using GTM.
So what we’ve set up is,
again, these reg text
and try text, which specify
what the button text should be.
And I’ve set up here,
Chinese strings.
And they’re probably wrong.
It’s whatever Google Translate
said register translates to.
So I can’t verify
their accuracy.
But in any case,
we’re saying, we’re
basically going to use for
strings.regTxt, this two
character value.
If Chinese is true,
and if we look down
at the rule for
Chinese, Chinese says
that the language
starts with a zh.
And that I can
attest to is Chinese.
So we have a set of
predefined macros,
including the language, the
OS version, the SDK version,
and which platform–
iOS or Android.
You can use those, as
well as other custom
macros where you provide values.
So for instance, if you want
to be targeting big spenders,
you know what spend is.
And so what you do,
is write the user
spend to the data
layer, which is
like an in-memory white
board of information.
Meta information about the user.
You write in to that.
And then you can write
rules based on that value.
So if you write
interesting information
to the data layer about the
user– how much have they
spent, how many times have
they run the application,
what level are they
on, and so on– you
can then write powerful
rules for targeting.
We have German set
up similarly.
And then finally,
we have default,
which, of course, is English.
And here, instead of explicitly
saying we’re targeting English,
we’re basically
carving out everyone
but Chinese and German.
And the reason for
that, if the user
has set up their
system to be in Spanish,
we want to have some values.
And we’re going to go ahead and
revert to the English values.
But the nice thing
is, if we want
to start supporting Spanish,
we don’t redeploy our app.
We just go to our container,
add a new rule for Spanish.
Within the next 12 hours, all
the users will be updated.
And away we go.
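
The rule-based targeting above can be sketched like this. Everything here is illustrative: the rule functions, the context dictionary, the spend threshold, and the strings are assumptions for this sketch, not the Tag Manager rule syntax.

```python
def is_chinese(ctx):
    return ctx["language"].startswith("zh")


def is_german(ctx):
    return ctx["language"].startswith("de")


def is_default(ctx):
    # Carve out everyone who is neither Chinese nor German.
    return not is_chinese(ctx) and not is_german(ctx)


def is_big_spender(ctx):
    # Custom value written by the app to the data layer (hypothetical key).
    return ctx["data_layer"].get("userSpend", 0) >= 100


# Each entry pairs a rule with the key-value pairs it enables.
VALUE_COLLECTIONS = [
    (is_chinese, {"regText": "注册", "tryText": "试用"}),
    (is_german, {"regText": "Registrieren", "tryText": "Testen"}),
    (is_default, {"regText": "Register", "tryText": "Try"}),
    (is_big_spender, {"offerText": "VIP bundle"}),
]


def resolve(ctx):
    """Merge the value collections whose rules match this user."""
    values = {}
    for rule, collection in VALUE_COLLECTIONS:
        if rule(ctx):
            values.update(collection)
    return values


spanish_user = resolve({"language": "es-ES", "data_layer": {}})
```

Supporting Spanish later would mean adding one more rule and collection to the list and excluding Spanish from the default rule, with no change to resolve.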

So we have this ability now.
We can set values dynamically.
So go up to the
website and change them
if you want to change them.
That’s great.
And we could do that
on a targeted basis.
The question is, what’s
the best setting to use?
What’s the best setting to
use for a particular register
button in order to increase
the number of registrations?
What’s the best setting, as far
as how often you should present
offers to users in order
to maximize in-app revenue?
What’s the best wording
for those offers,
to increase the revenue?
How often can you show ads
without affecting engagement?
If you show them too
often, people may give up.
If you don’t show
them often enough,
you’re leaving
money on the table.
And the best way to do this?
Really, one option is just to
try one value and then
later try another value.
That has two problems.
One problem is, there might
be exogenous factors that
are affecting things.
So it might be
that this week, you
have a higher in-app
revenue than last week.
For some outside reason.
And the fact you changed
the text of your offer
didn’t have anything
to do with that.
Also, it’s slow to be
making these serial changes.
So what I’m really
excited to announce today,
is experimentation for
mobile applications.
For both Android and iOS.

Just a little bit
of a background.
So I came from
experimentation on the web
for Google Analytics.
And when we first started
this project of Google Tag
Manager for mobile, the question
was, how can we actually get
experimentation for
mobile applications?
What can we provide?
There’s nothing like
redirecting, for instance,
to another URL in apps.
And so Google Tag
Manager for mobile apps,
really, in my mind, was a
way to get experimentation.
It turns out, it allows
a lot more along the way.
But this gets us to a
goal I’ve always wanted.
You can run experiments.
On either all of your users,
or a targeted subset of users,
again, using these rules.
When an experiment ends, you
can lock in a winning variation.
You don’t need to
change your application.
So as long as you’re
using Google Tag Manager,
and your application
is highly configurable.
In the way we’ve discussed.
And also supports
Google Analytics.
Then, experimentation
just works.

Within analytics, once you
have run an experiment,
or as you’re
running experiments,
you can actually
segment your users
in the way Mike was
describing in the last talk.
Except, another axis of
segmentation you can now do,
is by experiment variation.
How do people who got exposed to
one variation in an experiment,
how did they differ
on whatever metrics
you’re interested in looking at?
Versus users who are exposed
to a different variation.

And what I’d like to do
now, is show you a demo.
So I have a game here.
In this game– well, it’s
really not much of a game.
Right now it just has the
try text and the register
text in the way I showed before.
But imagine it’s a
really good game.
Now what I’m
interested in doing,
is modifying the register text.
So we can see here there’s
a value collection.
A value collection, again, is a
collection of key value pairs.
And I want to be
modifying the register
text to try some
different values.
So I can now create
an experiment macro
from this collection.
And let’s look at
what I’m going to do.
I’ll give it a name.
It’ll be the
register experiment.

In order for
experiments to work,
you need to be using
Google Analytics.
This completes a
round trip, where
we provide a different
variation to each user.
But somehow we have
to capture that.
And then see how the behavior
is different in the future.
And so we capture
that by reporting
into Google Analytics.
So you need to link a
particular container
with a particular Google
Analytics property.
We’ve already done that and set
up on the Acme game property.
Properties in analytics
have one or more views.
I’m selecting a view.
And then here’s the key here.
When you’re
experimenting, you’re
not just randomly
experimenting, like,
what are all the things
that might change
if I expose a user to
one version of something.
Versus another
version of something.
But really, you have a
particular objective in mind.
And that, you specify
for the experiment.

Neil talked about reporting your
transactions in in-app revenue.
And here’s one important
reason, so that you can actually
optimize in-app revenue
using experimentation.
So you can choose that you’re
optimizing app revenue.
And then we’ll determine
which variation is better,
depending on which is giving a
higher average in-app revenue.
Another thing to experiment on is
some form of engagement.
How many screens
are they going to?
How long are their durations?
Crashes and exceptions–
so part of what GA provides
is the ability to report
back crashes and exceptions.
You may have a
hypothesis, for example,
that on this particular
device, you’re
seeing a lot of exceptions.
Let’s say an OS version.
Maybe the problem is that you
have a timeout for a network
call you’re making
that’s too short.
So your hypothesis is that
increasing that timeout
will reduce the
number of crashes.
You make an experiment.
You make its
objective be crashes.
You’ll be happy to know that
when your objective is crashes
or exceptions, we actually
look to minimize that metric.
Rather than the maximization
we do in the other metrics.

As well as the
pre-existing metrics
on which you can
optimize, we also
allow you to optimize
on your own metrics.
So these, in Google
Analytics, are called goals.
For a particular view
in Google Analytics,
you can set up a goal.
And that goal can be one
of a variety of things.
As was discussed in
the last session,
you can provide events, which
include a category, an action,
a label, and a value,
to Google Analytics.
And then set up a
goal, based on any
of the settings of those
categories, actions, labels,
or values.
I have already set up, for my
Google Analytics view, a goal,
which is registered.
So when the user registers
in my application,
I go ahead and send an
event via analytics.
That is what I’m
trying to optimize.
So I’ll clearly be
able to see, when
I change the text of
the Register button,
how does that affect how
often the users register.

So I’ve got my original.
And this is just the
name of the variations.
I’m going to name
the original register
and the variation register now.
So then when I’m
looking at reports,
I can more easily remember
what that actually was.
And my original is going
to provide the register
text that just says “register.”
And then my variation,
I’m going to modify that.

This is my variation.
This is my original.
So let’s make the
original say “register.”
And the variation
say “register now.”
We’re not limited
to one variation.
We can add up to 10
variations that we want.
I’m just going to limit it
to this for the moment.

For an experiment, there
are some advanced options
worth looking at.
You can control what
percentage of users
get included in the experiment.
By default it’s 100 percent.
But you can reduce that
to just some small subset,
if you’d like.
A confidence threshold for
the statistical significance,
when we declare a winner.
How long the experiment runs.
Often, users exhibit
some sort of behavior
that differs from day
to day in the week.
So for instance, weekend days
may be different from weekdays.
We wouldn’t run an experiment
for just two days if that
happened to be over
the weekend, and then
generalize that
across an entire week.
So by default, we
run an experiment
for at least two weeks.
You can reduce that if you want.
When the experiment
ends, we’ll automatically
serve the winning variation.
So we’ll lock in that
winning variation.
Although if you want to just
revert back to the original,
even if that’s not as good
as your winning variation,
you certainly can.
And then finally, we’ll
add an enabling rule.
Remember, this was just for
non-Chinese and non-German.
So we want to run always,
except in Chinese and German.

So we created this experiment.
What do we need to do to
actually make it work?
We create a version.
And then we publish
that version.
When you publish a version, it’s
going to start an experiment,
and so we go on and publish it.
It starts.
What’s going to happen now?
So once this publishes,
as devices check in
to receive the latest
version of the container,
they’ll now receive
this container
that has experiment
information in it.
When the user sees the
buttons, your application
will be retrieving the
register text.
And as it retrieves
that, it’ll actually
do a coin flip
to decide which
variation they are going to see.
They’ll see that variation.
We’ll report that information
back to Google Analytics.
Anything from then on that gets
reported to Google Analytics,
that information will be tagged.
That this user
saw this variation
of this particular experiment.

And it’s also important to know,
every time the user comes back,
we’re not going to
re-flip that coin.
Once they’re in an
experiment, and have been exposed
to a particular variation,
they stay with that variation.
So they have a
consistent experience.
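
A sticky coin flip of this kind can be sketched as below. This is only an illustration of the idea: the Experiment class, the stored dictionary standing in for on-device persistence, and the participation parameter are assumptions of this sketch, not how the SDK implements it.

```python
import random


class Experiment:
    """Hypothetical experiment that flips a coin once per user and remembers it."""

    def __init__(self, name, variations, participation=1.0, rng=None):
        self.name = name
        self.variations = list(variations)  # index 0 is the original
        self.participation = participation  # fraction of users included
        self._rng = rng or random.Random()

    def assign(self, stored):
        """Return this user's variation; `stored` is state persisted on the device."""
        if self.name not in stored:
            if self._rng.random() < self.participation:
                stored[self.name] = self._rng.randrange(len(self.variations))
            else:
                stored[self.name] = None  # excluded users keep the original
        index = stored[self.name]
        return self.variations[0] if index is None else self.variations[index]


experiment = Experiment("register experiment", ["Register", "Register now"])
device_state = {}
first_seen = experiment.assign(device_state)
```

Every later call with the same device_state returns first_seen, so the user keeps a consistent experience.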

All right.
So let us look at this
experiment in Google Analytics.

So the fact is, since we
just started it running,
we actually don’t
have any users that
have been providing
information yet.
So it’s a boring
report at this point.
But this is where
we would see how
users are doing, what
their conversion rates are,
and so on.
Let me show you a report
from another application.
So here, you can
see our original,
our various variations.
We see how many visits and
conversions for each one.
The conversion rate.
And then a comparison
to the original.
Here, variation
one is doing great,
compared to the original.
So 234% better.
Almost two and a
half times as good.
Its probability
of out-performing
the original, 100%.
Why haven’t we ended the
experiment yet, though?
And declared that a winner?
Because we’ve only actually
got two days of data.
And we still have a chance
for, perhaps, another variation
to do better, or if
during the week variation
one does really poorly,
maybe even the original
will catch up.
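
The numbers in a report like this can be approximated with a short sketch. The talk does not say how the probability of out-performing is computed, so the Beta-posterior Monte Carlo below is an assumed method chosen for illustration, and the visit and conversion counts are made-up data.

```python
import random


def conversion_rate(conversions, visits):
    return conversions / visits if visits else 0.0


def relative_improvement(original_rate, variation_rate):
    """Improvement of the variation over the original, as a fraction."""
    return (variation_rate - original_rate) / original_rate


def probability_of_beating(original, variation, draws=5000, seed=0):
    """Estimate P(variation rate > original rate).

    Each argument is a (conversions, visits) pair. Rates are sampled from
    Beta(1 + conversions, 1 + non-conversions) posteriors (an assumption).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        o = rng.betavariate(1 + original[0], 1 + original[1] - original[0])
        v = rng.betavariate(1 + variation[0], 1 + variation[1] - variation[0])
        if v > o:
            wins += 1
    return wins / draws


# Hypothetical counts: (conversions, visits)
original = (30, 1000)
variation_one = (100, 1000)
improvement = relative_improvement(
    conversion_rate(*original), conversion_rate(*variation_one)
)
chance = probability_of_beating(original, variation_one)
```

Even when chance is close to 1 after two days, the point made above still applies: the sample may not cover a full week of user behavior.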

At this point I’d like to
turn back over to Lukas.


LUKAS BERGSTROM: The
teams did a great job.
And it’s a really powerful tool.
If there are just three things
I could ask you to walk away
remembering, though,
they are, well,
not necessarily the bullet
points on this slide,
but more or less.
Separating configuration
from code is really powerful.
So being able to push
new configuration to all
of your users at anytime without
needing to ship a new binary,
is a really powerful tool.
And it can change
the way you think
about how you can optimize
your app, and when.
And what should be
static, versus what
should be dynamically
configurable.
And we give you a really
powerful rules engine
to allow you to control
that configuration as well.
Beyond that, we know we didn’t
show a lot about the Google Tag
Manager UI, but
we give you a lot
of tools to help teams
collaborate safely.
So that you could have
multiple people editing rules
or configuration in your app.
And then have control.
Say, maybe this person
can edit things,
but only I am allowed to
actually push changes out
to my app.
And what that does, is
it unlocks the team.
It lets multiple
people collaborate.
It lets people like marketers
and business analysts
get their work done without
bothering developers.
But it still does it
in a really safe way.
That makes sure
that you can control
who can make changes
within your app.
And finally, being able to
create these experiments.
Push them.
Get results.
And then lock in
a winning variation.
All without shipping
a new binary,
is just an insanely
powerful tool.
It’s really kind of
dizzying to think about.
If you’ve done a good job
of instrumenting in your app
about the stuff
that you care about.
The stuff that you think
should be configurable.
To think about, what
are the outcomes
I want to optimize for?
What are the variables
that might contribute
to success or failure
against those outcomes?
I’m just going to plug those
into the experimentation
engine, and just let it figure
out what the right outcome is.
So being able to just push
a new experiment like that
at any time is just an
insanely powerful tool.
And we’re really excited
to see what people do with it
in a mobile app context.
So as no surprise, we’re Google.
We think more data is better.
And testing beats guessing.
And I think what we’ve
shown you is a tool that
unlocks your ability to do that.
Your ability to conduct
measurement configuration
changes and experimentation.
So we’re super excited about it.
Neil and I will be
outside, if anybody
wants to talk to us
right after this talk.
And we’ll be on the
expo floor at our booth
on Wednesday and Thursday.
So thanks. And one more
thing: it is Google.
It’s free.

Thanks, everybody.

One Reply to “GDC 2014: Content Experiments for Mobile Apps with Google Tag Manager”

  1. I have a question – if I have a Value Collection macro with say {'buttonText':'Order now'} which is set to run always.

    How does a Google Analytics Content Experiment based on the same string 'buttonText' work? I only want the experiment to run on 10% of my users.  Do content experiment macros override Value Collection macros?

    Or do I need some sort of exclude rule that says, don't use the Value Collection macro if the user is included in the experiment?  If so, how do I go about that?

