Voicy: An App Expanding Voice Streaming Culture in Japan

Voicy, an early player in Japan’s voice platform scene, redesigned its home screen in November 2023. What journey has it taken to establish an audio culture in Japan?

Miho Kyoya | Voicy Inc. VUX/VUI Designer

After starting her career as an in-house designer at a major corporation, Miho joined Voicy in 2018 as its first employee and first designer. She has taken on a wide range of work, from product development in the early days to building the organization. She currently leads UX/UI design across the entire product.

Voicy’s mission to create voice streaming culture in Japan

── First, could you tell us about Voicy?

Miho: Voicy is an audio platform where users can listen to curated content while doing other things. Launched in 2016, it has surpassed 2 million registered members. It hosts over 2,000 channels run by personalities who passed a rigorous screening with an approval rate of just 5%, and companies also use Voicy as owned media.

Voice platform “Voicy”

── It’s been seven years since the service launched. What changes have occurred during this time?

Miho: In the early days, we heard many comments saying “Wouldn’t a video service be better?” Over the last seven years, the rise of smart speakers like Google Home and the emergence of Clubhouse (a voice-based social networking app developed by the US company Alpha Exploration) have increased interest in audio content. We’ve noticed a gradual shift as more people recognize the appeal of voice content.

However, in Japan, audio is still spreading mainly among early adopters. That’s why we put strong emphasis on creating that culture ourselves.

── Are there differences in audio culture between Japan and overseas?

Miho: In countries like the US, there has long been a culture of listening to audio content, and significant budgets go into producing it. Because the country is so vast, people in the US are more likely to listen to audio storytelling on tape during long drives. I’ve heard that many people now listen to suspense dramas through audio streaming.

The characteristics and habits of each country or region shape product design and content creation. At Voicy, for example, we cap chapters at 10 minutes, a length we consider suitable for listening during train commutes in urban Japan.

Respecting listeners’ lifestyles and quietly adapting to every situation

── In terms of content experience, where do you believe the unique features of audio lie?

Miho: One aspect is its high engagement level. In visual media, attention is often divided among elements such as facial expressions, clothing, and backgrounds, but with audio, only the speaker’s words come through, directly. This simplicity lets the speaker’s individuality shine through more vividly, which leads to higher engagement. Audio is not simply video with the picture taken away.

Additionally, because people can listen while doing other things, audio fits easily into daily routines, and our average listening retention rate is about 80%. The key is turning idle time into meaningful moments without interrupting activities such as household chores or walks, which gives listeners the sense that they are gaining valuable time.

This is what we value as “life-fit” media: we aim to support listeners’ lives by weaving information and entertainment seamlessly into their daily routines. We do expect some behavioral changes, such as listeners discovering new interests or deciding to change careers after listening, but we are not trying to drastically alter their lifestyles through our content.

── It seems like Voicy is envisioned to complement listeners’ current lifestyles without disrupting them. What initiatives do you take to understand listeners’ lives and their changes?

Miho: In addition to analyzing data, we conduct regular interviews to capture subtle changes in listeners’ daily lives. By designing open-ended questions, without a specific goal in mind, that let us sense these small shifts, we can quickly spot any gaps between our assumptions about how listeners live and the reality.

Understanding listeners’ lifestyles is crucial, and when developing new features or improvements, we make sure to delve into the specific situations in which these features are needed. We often engage in discussions detailing scenarios like “improvements for listening while driving,” considering how and where the product will be used in daily life.

If we receive feedback that something is difficult to use, we investigate why. We prioritize thinking in terms of real-life situations, such as where buttons should be placed so the app is easier to operate while driving, so that we can address user needs effectively.

Sharing interview feedback with the team

Is the lack of audio virality a benefit or a drawback?

── You mentioned earlier that platforms like Clubhouse gained popularity in 2021. How did Voicy view these audio boom trends?

Miho: At present, people choose content largely based on who the creator is, so I don’t think there is widespread interest in audio itself yet, and I hesitate to call it a boom. I think it becomes a boom when people say, “Let’s look for it in audio first.”

However, Clubhouse had a significant impact. Its rise made more people realize that listening to other people’s conversations can be intriguing, and we drew inspiration from its unique live experience. Although Voicy differs from Clubhouse as a stock-type platform, where broadcasts remain available to play later, we were inspired by the appeal of experiences that make listeners think, “I must listen now,” and this led us to develop our live broadcasting feature.

In the live broadcasting feature, listeners can tune in to broadcasts streamed in real-time by personalities.

── Rather than viewing it as competition, you saw it as an opportunity for market expansion, correct?

Miho: Honestly, unlike illustrations or videos, audio content doesn’t elicit immediate reactions, so it won’t go viral just by being posted. This has both positive and negative aspects. On the positive side, content is less likely to be distorted, taken out of context, and spread widely. Because audio requires attentive listening, it creates a kinder environment where such distortions are less likely to occur. The downside is that it’s hard to attract the attention needed to reach a larger audience.

We have been discussing how to resolve this dilemma through our product, but we are still exploring solutions. We experimented with letting users listen to 30-second summaries, but we have yet to find the best approach. The popularity of Clubhouse has offered an important hint here, and we aim to learn from it and apply those lessons in the future.

The necessity of an approval system

── What kind of platform image do you idealize for the future?

Miho: Our goal is to become the voice version of YouTube. YouTube offers a wide range of videos across various genres, from educational to entertaining content. When you want to watch something, YouTube is usually the first platform that comes to mind. Similarly, Voicy aims to become the go-to place for all genres of audio content, ensuring that whenever someone wants to listen to something, they think of us first.

── Why is there an approval system for personalities, even as you aim to become the audio version of YouTube?

Miho: In Voicy’s early days, when Japan lacked a strong culture of listening to audio content, we were concerned that a flood of content of varying kinds could give users a negative first experience. Building the recommendation and matching mechanisms needed for an ideal user experience was also difficult, because the technology was still immature.

We therefore believed that, to build a culture in the long run, we should first curate the platform and guarantee its quality ourselves. That’s why we introduced the approval system.

── You see personalities as crucial to growing the platform in a positive direction. How do you work with them?

Miho: Voicy has taken a personality-first approach since the beginning. At a time when audio streaming was not yet common, asking someone to speak alone into a smartphone was quite a challenge. We started by helping personalities come up with talk themes and plan their broadcasts, and now we also share expertise and success stories. We have teams dedicated to helping personalities succeed, and we often meet with them in person for consultations.

The Personality Success team values communication with personalities, including at the Voicy Festival held in 2023.

Miho: We are currently building our support for personalities into the product as features, drawing on their content. We are exploring ways to translate successful strategies into technology, working closely with personalities to create the product.

── What innovations have you implemented for personalities in existing features?

Miho: We recently revamped the analytics for personalities, incorporating the advice the Personality Success team gives. Rather than simply presenting the data as numbers, we aimed to explain how to interpret it and offer advice on what to adjust based on it.

Steering towards change in pursuit of the ideal vision

── What do you prioritize in building voice streaming culture in Japan as you aim to become the voice version of YouTube?

Miho: While we respect listeners’ current lifestyles, it is also essential for us, as the operator, to make proposals based on the insights we gain from listeners. Because Japan’s audio market is still immature, we sometimes need to take the initiative and try new things ourselves.

── Can you share any recent initiatives that embody these principles?

Miho: One example would be the introduction of voice dramas in the summer of 2023. Voicy was previously seen as a place for learning and information, so the need for content like light novels didn’t naturally arise from listeners.

The voice dramas launched in 2023 feature a varied lineup of works.

Miho: Interviews with people who had stopped using Voicy gave us hints. Some said they couldn’t listen to Voicy when they were mentally exhausted because so much of the content was informational. This led us to consider the potential of entertainment-oriented content, such as voice dramas, for these listeners.

── How was the response to this initiative?

Miho: We received mixed reactions. Some felt it departed from the traditional Voicy service, while others found it intriguing as a new endeavor.

In our pursuit of becoming the audio version of YouTube, it is crucial to create a platform where people who want to listen to something can easily find content they enjoy, while those who aren’t interested in it are less likely to encounter it. We are updating our product design to realize this vision, incorporating the feedback and lessons from the voice dramas.

One of these updates, released in November 2023, was the home screen, where we finally strengthened the matching and recommendation features. We use a large language model (a language model trained on a massive dataset with deep learning) to analyze audio data and identify the key elements of each piece of content, then combine them with each listener’s playback history to surface content that matches their interests. This gives us a system that creates encounters with suitable content.

Updates of the home screen in November 2023.
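To make the matching idea above more concrete, here is a minimal Python sketch of one simple way to combine per-content key elements with a listener’s playback history. It is not Voicy’s implementation: the content IDs, tags, and the functions `build_profile` and `recommend` are hypothetical, and the sketch assumes the key elements have already been extracted from each piece of audio (for example, by a language model run over transcripts), a step it does not show.

```python
from collections import Counter

# Hypothetical key elements per piece of content. In the setup described in
# the interview these would come from language-model analysis of the audio;
# here they are hard-coded for illustration.
CONTENT_TAGS: dict[str, set[str]] = {
    "career-talk": {"career", "business"},
    "side-jobs": {"career", "money"},
    "voice-drama-1": {"drama", "fiction"},
    "investing-101": {"money", "business"},
    "cooking": {"food"},
}


def build_profile(history: list[str], content_tags: dict[str, set[str]]) -> Counter:
    """Count how often each key element appears in the listener's playback history."""
    profile: Counter = Counter()
    for content_id in history:
        profile.update(content_tags.get(content_id, set()))
    return profile


def recommend(history: list[str], content_tags: dict[str, set[str]], k: int = 3) -> list[str]:
    """Rank unplayed content by how strongly its key elements match the listener's profile."""
    profile = build_profile(history, content_tags)
    played = set(history)
    scored: list[tuple[int, str]] = []
    for content_id, tags in content_tags.items():
        if content_id in played:
            continue  # skip content the listener has already played
        score = sum(profile[tag] for tag in tags)  # missing tags count as 0
        if score > 0:
            scored.append((score, content_id))
    # Highest score first; ties broken alphabetically so results are deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [content_id for _, content_id in scored[:k]]


if __name__ == "__main__":
    print(recommend(["career-talk"], CONTENT_TAGS))  # ['investing-101', 'side-jobs']
```

Summing tag counts is about the simplest form of content-based filtering; a real system would more likely compare embeddings and weight recent plays, but even this sketch shows how playback history decides which unplayed content surfaces, and why content that shares no elements with a listener’s history (here, the voice drama) stays out of their recommendations.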

Miho: Leveraging the unique characteristic of conveying one’s personality through voice, we have been focusing on enhancing the product design to cater to a diverse range of content. By utilizing AI, we aim to make it easier for listeners to discover content based on their interests.

── There are many challenging aspects, such as breaking away from the existing image and adjusting the product design. What are your thoughts on this?

Miho: I agree. While it may not be in the immediate future, we believe rebranding will be necessary. As we continue to accept a variety of content, we also recognize the need to tone down Voicy’s branding to make the content stand out. We are gradually adjusting the overall UI design, starting with color adjustments.

── I feel that Voicy’s signature orange color used to be more prominent in the past. Do you have any thoughts on this?

Miho: You are correct. At that time, Voicy was mainly used by business influencers as a “voice blog.” Some people perceived those who communicated through text as cold, so hearing their voices on Voicy gave a sense of warmth and familiarity, which we considered essential. The color orange symbolized this warmth and the value of hearing, on Voicy, voices one had never heard before.

While this still holds, more people now start listening to Voicy without first encountering a personality’s text-based content. They are beginning to recognize the diverse values of audio beyond warmth. As Voicy evolves and emphasizes content diversity, we believe its brand will become more neutral. We aim to keep evolving toward our ideal vision, always adapting to create and nurture audio culture.


Written By

Shiho Nagashima

Shiho is an editor at Spectrum Tokyo. After working at a movie company, an advertising agency, and a startup, she has been a freelancer since 2022. She helps creators make the most of their strengths and is herself involved in a wide range of content creation.

Nanako Tsukamoto

Nanako is an editor for the English version of Spectrum Tokyo. After spending ten years in the US and graduating from Sophia University, she worked in finance for six years. She loves planning train trips with her 4-year-old son, an avid train enthusiast.
