A Data Story by Erik Driessen

Google's view on (H|h)eaven

Technologies are all around us. Their inner workings are often hidden from users. The search engine is one of those technologies.

This page explores a subsystem of Google's language model. It's the system that determines the sentiment in a piece of text. The subject of this page is Google's view on the words heaven and Heaven.

Setting the stage

The visualisation below sets the stage for the exploration. It shows you how far apart the sentiments of heaven and Heaven are according to Google.

The horizontal position represents the full range of possible sentiment scores. These scores range from -1.0 to +1.0. Anything that appears left of the central line is considered negative by Google. Anything on the right is considered positive. The vertical position reflects the difference in magnitude of the sentiment score.
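To make the horizontal axis concrete, here is a minimal sketch of how a score between -1.0 and +1.0 could be turned into a position on screen. The function names and the chart width are assumptions made for illustration; the actual chart on this page was built with d3.js and may work differently.

```python
def score_to_x(score: float, width: float) -> float:
    """Map a sentiment score in [-1.0, +1.0] to a horizontal position in [0, width]."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment score must lie between -1.0 and +1.0")
    # Shift the range to [0, 2], scale it to [0, 1], then stretch it to the chart width.
    return (score + 1.0) / 2.0 * width


def side_of_centre(score: float) -> str:
    """Describe on which side of the central line a score lands."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"
```

With a hypothetical chart width of 800 pixels, a score of 0 lands exactly on the central line at 400.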

Let's have a look:

Wow?! That is quite a big difference, right? Both words appear on the right side of the area, which tells us that both are considered positive. But as heaven is further to the right, its sentiment is more positive than that of Heaven (with a capital H).

How did we get here?

I teach a course on data art. The course shows students how a creative combination of coding, data analysis and visualisation can lead to interesting results. In this course, I use the Google Cloud Natural Language API to transform text into sentiment data.
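For readers who want to see what "transforming text into sentiment data" looks like in practice, here is a rough sketch. It assumes the public REST method documents:analyzeSentiment of the Cloud Natural Language API, which takes a document and returns a documentSentiment object with a score and a magnitude. The helper names are mine, no request is actually sent, and the sample response at the bottom contains placeholder numbers, not real API output.

```python
from typing import Dict, Tuple

# REST endpoint of the sentiment method (an API key or OAuth token is needed to call it).
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_request_body(text: str) -> Dict:
    """Build the JSON body for one plain-text sentiment request."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def extract_sentiment(response: Dict) -> Tuple[float, float]:
    """Pull (score, magnitude) out of a parsed JSON response."""
    sentiment = response["documentSentiment"]
    return float(sentiment["score"]), float(sentiment["magnitude"])


# Placeholder response with made-up numbers, only to show the shape of the data.
placeholder_response = {"documentSentiment": {"score": -0.5, "magnitude": 0.5}}
body = build_request_body("I hate this!")
score, magnitude = extract_sentiment(placeholder_response)
```

The score says how negative or positive the text is, while the magnitude says how much emotion it contains overall.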

I explain this technique to the students by showing the results of this API for several lines of text. For some texts, the task is easy.

Take the next line:

I hate this!

You can easily tell that this line has a negative sentiment. But it gets harder to judge the sentiment of more complex sentences, for example lines from song lyrics.

Have a look at this line from Avicii's song Heaven:

I think I just died and went to heaven.

It is not as easy, right?

When I was preparing the content for my course, a question popped up in my head:

Would capitalising heaven have an impact on the results?

I thought it would be a fun experiment. One that could give me some insight into how Google's language model works.
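The experiment itself is simple: score the same text twice, once per spelling, and compare. Below is a small sketch of that idea. It expands the (H|h)eaven notation used on this page into both variants and passes each one to a scoring function. The scorer is a parameter on purpose: any function that returns a number for a piece of text will do, and all names here are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Matches one two-way alternation such as "(H|h)".
_ALTERNATION = re.compile(r"\(([^()|]+)\|([^()|]+)\)")


def expand_variants(template: str) -> List[str]:
    """Turn 'I went to (H|h)eaven' into both spellings; plain text is returned unchanged."""
    match = _ALTERNATION.search(template)
    if match is None:
        return [template]
    head, tail = template[: match.start()], template[match.end():]
    return [head + option + tail for option in match.groups()]


def score_variants(template: str, scorer: Callable[[str], float]) -> Dict[str, float]:
    """Score every variant of a template with the given scoring function."""
    return {variant: scorer(variant) for variant in expand_variants(template)}
```

Calling expand_variants("I went to (H|h)eaven") gives a list with "I went to Heaven" and "I went to heaven", in that order.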

What is Google's view on heaven/Heaven?

If you continue to scroll down, you'll explore Google's view on heaven. It is based on various pieces of text that include the word heaven or Heaven.

After showing the results, I'll share my thoughts on what they mean.

"I think I just died and went to (H|h)eaven."

Alright. If you think you died and went to Heaven, it's bad news according to Google. But if you went to heaven, the sentiment is good?!

"I think I just died."

Analysing a line like this helps me gain a bit of trust in the API. At first glance, this line should convey a negative sentiment. And the API reflects this, even though you could argue that being able to think is an indication that you are still alive in some form.

"I went to (H|h)eaven"

Okay. Apparently, Google thinks the concept of going to heaven is quite a bit more positive than going to Heaven. For the latter, it actually returns a neutral sentiment value (0).

"Death"

Just a small check to see what Google thinks of Death. It has a negative sentiment alright.

"(H|h)eaven"

This gets us back to the main two words: Heaven and heaven. Both have positive associations. But heaven more so than Heaven.

So, what is Google's view on heaven?

Well, that is hard to tell. The analysis shared here shows us that something as simple as capitalising a word can have an impact on the results. You might wonder why that is the case.

It is not that Google employees, on average, think so differently about heaven and Heaven. But it is good to know that Googlers decide how to train their language models. They select the labelled pieces of text used for that.

It works like this. Let's say we have this line of text:

I love my job.

It is fair to say that this is a positive line.

A language model is trained using a huge collection of texts that includes labels for, among other things, sentiment. These trained models allow us to estimate the sentiment of a new piece of text: is it negative, neutral or positive?
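To see how labelled text alone can make a model treat heaven and Heaven differently, consider the toy example below. It is emphatically not how Google's model works. It is a deliberately naive, case-sensitive word averager, and the four labelled lines are invented for this illustration. The point is only that the selection of training text decides the outcome.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Invented training lines with hand-assigned labels (-1.0 negative, +1.0 positive).
TOY_CORPUS = [
    ("I love my job", 1.0),
    ("this cake is heaven", 1.0),
    ("Heaven awaits the dead", -1.0),
    ("Heaven is beautiful", 1.0),
]


def train(labelled_texts: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Give every token the average label of the lines it appears in (case-sensitive)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, label in labelled_texts:
        for token in text.split():
            totals[token] += label
            counts[token] += 1
    return {token: totals[token] / counts[token] for token in totals}


def predict(model: Dict[str, float], text: str) -> float:
    """Average the scores of the known tokens; unknown text counts as neutral (0.0)."""
    scores = [model[token] for token in text.split() if token in model]
    return sum(scores) / len(scores) if scores else 0.0


toy_model = train(TOY_CORPUS)
lowercase_score = predict(toy_model, "heaven")  # seen once, in a positive line
uppercase_score = predict(toy_model, "Heaven")  # seen in one negative and one positive line
```

In this toy corpus, lowercase heaven only shows up in a positive line, while Heaven shows up in one negative and one positive line. So the toy model ends up more positive about heaven than about Heaven, purely because of which lines were picked.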

The example on this page shows us that lowercase heaven has more positive associations than uppercase Heaven in Google's language model. When running a search query, this might lead to more positive results if your query contains heaven instead of Heaven, regardless of whether you chose the capitalisation on purpose.

So it's not that Google employees think this way about heaven. But they do select the pieces of text they use to train their models, and these could, unintentionally, be more positive about heaven than Heaven.

This is just an example of how a simple capitalisation may impact how Google's model works for you. Always keep in mind that when you use an algorithm, the beliefs that influence it are not always visible on the surface.

Regardless of their visibility, they are there.

Data analysis was done using the Google Cloud Natural Language API. Results were combined into web-friendly JSON data by hand. Data visualisations were made using d3.js (v5). Storytelling interactions use the scrollama.js library.

Want to say hi? You can do so here.