18 questions about the future of voice, answered
Nic Newman, senior research associate at the Reuters Institute for the Study of Journalism, answers pertinent questions from industry professionals about the implications of voice for the future of news.
As part of the first major study of news consumption on voice-activated devices, the Reuters Institute found that around half of smart speaker users say they use the device for news, yet only around one in five (21% UK, 18% US) use the news briefing functionality daily. And whilst some news publishers have yet to be convinced about the need to invest heavily today, most believe that voice will significantly affect their business over the next decade.
To help inform your strategy around voice, we’ve compiled a selection of burning questions from a variety of senior editorial and commercial professionals in news publishing for Nic Newman from the Reuters Institute for the Study of Journalism to answer:
Q1. If you were working at a news publisher today, what would be your strategy for voice?
It absolutely depends on what kind of publisher you are. If you’re a broadcaster, I think you have to get involved right now, and you have to make sure your audio streams are findable – and that’s not obvious because you have to do quite a lot of work. At the moment you’ve got a whole lot of aggregators on the platform (such as iHeartRadio or TuneIn), so you’re almost being doubly disaggregated in that process. Ultimately, Google and others will develop their own version of that, such as Google Podcasts, which may make things a bit easier. But if you’re a big broadcaster with brand power, you can negotiate directly to make sure all your content is pre-installed. If you’re a small broadcaster it’s obviously harder to get that traction. And if you’re a newspaper organization, then I’d take time to think about where the opportunity is for you and focus there.
Q2. Will there be a means for independent publishers to reach audiences through voice-activated speakers directly?
I think for ‘news’ news, it’s going to be the big organizations that people are aware of. But of course news publishing is such a huge area, and like everything else on the internet, what you’re really doing is reducing the cost of production and distribution. The cost to create audio content is very low, unlike video content. For instance, something I use the most is a two-minute Flash Briefing called ‘Smart Speakers’ created by Peter Stewart. It’s basically a load of links read out in an engaging way, covering everything I need to know about voice for the day, and it’s fun! Similarly, local and hyper-local sites can create content locally; the tricky part, I think, is making sure that content is discovered.
Q3. What is differentiated audio content?
It’s content that is not the same as what everyone else is doing! The challenge in this area is getting people’s attention. If you’re offering something unique, different and valuable, and it is discoverable, then it will be used.
I spoke with the New York Times and this approach is at the heart of their strategy. They spent a lot of time thinking and doing research to understand the core user needs of voice tech. They’re looking to find two or three areas where they can add value and build on their journalistic values and core mission to produce something different. If everyone just does the same, we’ll create excess supply for a limited amount of attention. There are so many opportunities we are yet to uncover.
Q4. Can you elaborate on the idea of metadata in a speakable form? How is Google encouraging publishers and are they rewarding content that includes search optimized metadata?
For the most part, these platforms are living off common standards that are defined by groups of people, including platform representatives. The speakable schema is one metadata field that anybody can put in their HTML webpages. In this case, it gives signals to the platform companies to respond to those particular questions. The metadata may be interpreted slightly differently by each platform, but the point of a speakable schema is that you enter information relevant to a page that gives people a valuable answer, and then the platforms can create services on top of it.
Google has released its schema in beta and is encouraging publishers to start populating it. Since voice search will be a huge part of Google’s business, it will be interesting to see: Are they going to pay a small number of publishers to create answers? Will they just expect people to do it? And, if they don’t pay, will they get the quality they need?
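To make the idea concrete, here is a minimal sketch of what speakable markup might look like. The property names follow the schema.org Speakable specification, but the URL, headline and CSS selectors are hypothetical examples, not taken from any real publisher:

```python
import json

# A minimal sketch of schema.org "speakable" markup for a news article.
# The URL, headline and CSS selectors below are invented for illustration.
article = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "url": "https://example-news.com/story",  # hypothetical URL
    "headline": "Example headline for voice search",
    "speakable": {
        "@type": "SpeakableSpecification",
        # Point assistants at the parts of the page worth reading aloud,
        # typically the headline and a short summary paragraph.
        "cssSelector": [".headline", ".article-summary"],
    },
}

# This JSON-LD object would be embedded in the page inside a
# <script type="application/ld+json"> tag.
json_ld = json.dumps(article, indent=2)
print(json_ld)
```

The point is simply that the publisher annotates which parts of a page are suitable for reading aloud, and the platforms build answer services on top of that signal.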
Q5. What is the monetization potential of news-driven content on voice-activated products?
As smart speakers and voice tech enhance the linear broadcast model (live or podcast), the amount of audio consumed will increase. Therefore, anyone producing anything over five minutes and catching audience attention will be appealing to advertisers, particularly if there’s data that shows who those people are and that they’re loyal users.
Amazon provides a basic dashboard and Google is planning to improve its tools for publishers to provide a range of data points. Some aggregators like Acast and TuneIn provide data for podcast (and live radio) usage across mobile, desktop and smart speakers, so it is possible to get some data that way. Most publishers complain that they aren’t getting the data they need on frequency of usage or on demographics. The data piece is very important, and the standards around that will improve.
For the audio market to grow, it’s partly about awareness, partly about data, partly about distribution, and partly about the advertising industry. So I think some of the advertising is quite good, but it won’t necessarily carry the premiums that high-value drama or video do. Platforms are trying to make sure there is a path to premium. It’s hard to see how short search answers can be monetized without some sort of platform payment.
Q6. What effect is voice likely to have on news consumers’ habits of ‘living in a bubble’?
I think the risk is generally overblown because people are likely to use more than just voice for search. Firstly, we might get fewer choices through voice interfaces but they’re not the only interfaces that’ll be used. Platforms are thinking about this. For example, if I ask Google or Alexa a question, I might get one answer back through voice, but when I go to my Google assistant on my smartphone I might see a full range of results.
Secondly, there are different types of queries. ‘What’s the weather tonight?’ is not really a bubble issue. ‘What’s the latest on Brexit?’ raises a legitimate question about the perspectives you might receive, and that’s where platforms are thinking hard about how to ensure choice. The hope is to create more conversational experiences, so if I say ‘What’s the latest on Brexit?’ it’ll give a response from your favourite brand, and then you can say ‘OK, tell me what other people are thinking’.
Q7. Do you think there will be a time when the big three (Amazon, Google and Apple) will share metrics, so we can work out what works when?
This is a problem all publishers face: trying to get a single view of how their content is performing through comparable data.
It’s not just metrics, it’s also the calls to action that are a challenge. When you ask for things, they’re subtly different on all three platforms, which makes it hard to execute off-site discovery. You can’t then say on your TV channel, for instance, ‘ask for the news in x way’, as you’d need to create six or so variations (if you include Cortana, Bixby and so on). In some ways we do need more standardization on how to ask for things, particularly around news.
Q8. How do you see voice search affecting the structure of news stories, if at all?
Writing for radio is different from writing for online or print. So, most attempts to read out your text article do not work (there are exceptions such as the Economist, 10% of whose readers like having their articles read to them!). The informality of podcasts and the way radio works is just fundamentally different so you have to create content in a suitable way.
Q9. What do you suggest for improving SEO for news content?
Optimize the findability of podcasts and live radio streams. That means ensuring you have all the variants or synonyms mapped if you have your own skill or action (which is like an app). Bigger providers can also negotiate with the platforms to ensure their skills, with the synonyms, are preinstalled, which provides more control.
Other areas that will become important include generating news answers using the speakable schema or other metadata that the platforms use to generate answers, and specific areas like recipes, where it is important to follow metadata standards so that speakers can find the different elements (ingredients, method, steps, etc.).
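As an illustration of the synonym mapping mentioned above, here is a hedged sketch in the style of an Alexa interaction model. The invocation name, intent name and sample phrases are all invented for this example, not drawn from any real skill:

```python
import json

# Hypothetical sketch of an Alexa-style interaction model showing how a
# publisher might map the different phrases listeners could use to reach
# the same news briefing. All names and phrases below are invented.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "example daily news",  # hypothetical skill name
            "intents": [
                {
                    "name": "LatestBriefingIntent",
                    # Variants and synonyms a listener might actually say:
                    "samples": [
                        "give me the latest news",
                        "play today's briefing",
                        "what are the headlines",
                    ],
                }
            ],
        }
    }
}

print(json.dumps(interaction_model, indent=2))
```

The more of these natural variants a publisher maps, the less a listener has to remember one exact phrase, which is the discoverability problem the answer describes.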
Q10. Can you tell us a bit more about the different times of day when people use these devices, and what they seem to need in those different moments?
Q11. Any intelligence on time spent with voice platforms?
Not really – just figures that individual publishers have (and have been prepared to share).
19% of NPR live online listening time is now via smart speakers. For podcasts the figures were much lower – perhaps 1 or 2% of podcast listening comes this way, though time spent is notoriously difficult to measure for podcasts.
Q12. What are the most successful media projects you’ve seen on voice-controlled devices? And from a business perspective?
It’s a great question, but I was pretty underwhelmed during my research! So were many of the users we interviewed. In terms of voice experiences, we’re only just starting to see some. The platforms are starting to work on this; for instance, Google will be paying for the Guardian Lab voice experience project, which is an open project, so they’ll be blogging regularly about their experiments and getting feedback. This is really for the benefit of the industry.
From a business perspective, organizations such as National Public Radio and Swedish Radio are leaders in thinking about how atomised content should work in the future, and how you can personalize and re-aggregate. That makes sense, as both organizations think purely about audio. If you’ve ever seen the NPR One app, they’re taking that seamless approach and transferring it across.
Q13. The report shows that user demographics skew older than you might expect. What about gender?
Voice devices in general are pretty well balanced in terms of gender in the US and UK, with slightly more use by men. News on these devices is much more heavily used by men, especially ‘lean forward’ activities like asking for the news. Live radio and podcast listening via smart speakers is pretty much equally split between men and women in the UK, though men tend to listen more to both in the US.
Q14. Do you think short opinion pieces would be suitable for such devices when a user asks “give me an analysis of this topic”?
It’s an interesting question and we don’t have much evidence on which to base an answer. People said in focus groups that they might like to get deeper information on topical issues, but they often say things like this in focus groups. It didn’t come up spontaneously as a topic. My sense is that this is unlikely without some kind of prompt or as part of a wider flow. Some publishers are exploring the idea of offering ‘more’ after a news story – so you can jump from the news to a more detailed analysis. That might work.
Q15. Besides music, news, and weather, are there specific content areas in which people are more likely to run voice searches?
For now, most searches are ones that elicit simple responses: ‘What’s the weather?’, ‘How old is …?’, ‘How tall is …?’.
There are very few complex queries, such as those around health or news. This, I think, is partly because these kinds of responses (for example a health diagnosis) might require a longer audio dialogue or a screen that you can navigate to look at options. Health could be a good prospect for conversational diagnosis, certainly for someone who is disabled or partially sighted.
Q16. How far do you think we are from having a single voice assistant that could be compatible with Alexa, Google Home, Cortana, etc?
I have three voice assistants in my home (Alexa, Google and Siri) and I get very confused because the wake word is different and the way of asking for things is often different!
I suspect we will each default to one assistant who we know and who knows us in different contexts (the car, the home, the headphones). It should be possible in many environments to select your ‘usual assistant’, for example in a car or via Bose headphones. I use Google Assistant on my Apple iPhone, even though it is harder to access than Siri, because I can’t manage two different assistants.
Q17. Did you find that publishers are supporting all smart speaker platforms (Google, Amazon and Apple), or are they making bets on one or two with the hope that they pick the winner?
Great question. In the UK and US, some publishers are just looking at Amazon Alexa because it has 74% and 63% of the market respectively, but Google is growing strongly, so big players would certainly look at those two.
Screen-based devices are marginal at the moment, and so is the Apple HomePod, though Siri is huge on phones. In other countries, though, like Australia, Google is by far the biggest platform (very few people have Amazon devices), so you need to know what the stats are and how they are moving in your country.
Q18. How do you see home devices shaping the way people interact and follow news?
The main change will be access to content. The technology in question solves a real problem because it’s just better at getting to things that you already know you want. It will impact anything that currently has a discovery problem, such as radio, podcasts, TV programs, etc. These devices are currently not very good at helping you discover what you don’t know. Take, for instance, the idea of hammocking, or surfacing content people didn’t know they wanted: I don’t know how good voice will be at that, as the functionality is not yet developed.