Mati Staniszewski, co-founder and CEO of ElevenLabs, has asserted that voice is emerging as the next significant interface for artificial intelligence, enabling more natural interaction between people and machines as models evolve beyond text and screens.
Speaking with TechCrunch at the Web Summit in Doha, Staniszewski said that voice models, such as those built by ElevenLabs, have progressed from merely imitating human speech, including its emotion and intonation, to integrating seamlessly with the reasoning capabilities of large language models. This advancement, he noted, marks a pivotal shift in how people engage with technology.
Looking ahead, Staniszewski expressed hope that “all our phones will return to our pockets, allowing us to fully engage with the real world around us, with voice serving as the means to control technology.”
This vision has fueled ElevenLabs’ recent $500 million funding round, raising its valuation to $11 billion, and is increasingly resonating throughout the AI sector. Notably, both OpenAI and Google are placing a strong emphasis on voice technology in their next-generation models, while Apple is quietly enhancing voice-related capabilities through acquisitions, including Q.ai. As AI continues to permeate wearables, automobiles, and other emerging technologies, the interaction paradigm is shifting from screen taps to vocal commands, positioning voice as a crucial arena for the next evolution of AI.
Seth Pierrepont, a general partner at Iconiq Capital, reinforced this perspective at the Web Summit, asserting that although screens will remain relevant for gaming and entertainment, conventional input methods like keyboards are starting to feel “outdated.”
Pierrepont added that as AI systems become more autonomous, the nature of user interactions will change. Models will arrive better equipped with contextual understanding and built-in frameworks, enabling them to respond with less explicit prompting from users.
Staniszewski identified this shift towards agency as one of the most significant transformations currently in progress. He suggested that future voice systems will increasingly depend on persistent memory and contextual knowledge developed over time, making interactions feel more intuitive and requiring less effort from users.
This evolution, he noted, will shape the deployment of voice models. While high-quality audio models have predominantly resided in the cloud, Staniszewski indicated that ElevenLabs is pursuing a hybrid strategy that merges cloud functionality with on-device processing. This approach aims to accommodate new hardware, including headphones and other wearables, transforming voice into a continuous companion rather than a feature users must consciously activate.
ElevenLabs is already collaborating with Meta to incorporate its voice technology into various products, such as Instagram and the Horizon Worlds virtual-reality platform. Staniszewski also expressed interest in partnering with Meta on its Ray-Ban smart glasses as voice-driven interfaces evolve into new formats.
However, as voice technology becomes more pervasive and integrated into everyday devices, significant concerns about privacy, surveillance, and the management of personal data arise—issues that companies like Google have already faced scrutiny over.
