JBB

Verbiverse: Building an AI-Powered Text-to-Speech Platform

Created and launched Verbiverse, a SaaS platform providing AI-powered text-to-speech (TTS) conversion. Verbiverse leverages multiple leading cloud TTS engines (AWS Polly, Google Cloud, Azure, IBM Watson, OpenAI) to offer over 840 voices in 135+ languages. The platform features full SSML support for fine-grained control, flexible storage options, and a user-friendly interface. Custom prompts were also developed to aid TriadPrint customers with content creation, demonstrating a practical application for the technology.

Client: Verbiverse
Greensboro, NC

Project Overview:

This project involved the creation and deployment of Verbiverse (verbiverse.com), a Software as a Service (SaaS) platform that provides AI-powered text-to-speech (TTS) conversion. Leveraging multiple leading cloud-based TTS engines, including Amazon Polly (AWS), Google Cloud Text-to-Speech, Azure Cognitive Services, IBM Watson Text to Speech, and OpenAI, Verbiverse offers a wide range of voices, languages, and customization options. The platform was designed as a versatile tool for voiceover work, audiobook creation, podcasting, marketing content generation, and educational materials. I also created all branding and logos for the platform.

Challenge:

The challenge was to create a user-friendly and powerful text-to-speech platform that offered:

  • High-Quality Voices: Natural-sounding, expressive voices across multiple languages, surpassing the quality of basic, robotic-sounding TTS solutions.
  • Flexibility and Control: Options for customizing voice output (speed, pitch, emphasis, pronunciation) using SSML (Speech Synthesis Markup Language) to achieve professional-grade results.
  • Scalability: The ability to handle a large volume of text and generate audio files quickly and reliably.
  • Multiple Storage Options: Flexibility in where generated audio files are stored (local server, Amazon S3, Wasabi).
  • Affordable Pricing: A sustainable business model for a SaaS platform, making it accessible to a wide range of users.
  • Ease of Use: An intuitive user interface that requires no technical expertise.

https://verbiverse.com/

Solution:

The solution was Verbiverse, a web-based platform built to provide high-quality, customizable, and scalable text-to-speech conversion, with a focus on user experience and versatility.

The platform offers AI-powered text generation, allowing users to create a wide variety of written content. This includes marketing copy for websites, social media posts, email newsletters, and print materials such as flyers and brochures. Users can also generate blog posts, articles, product descriptions, and outlines for business plans. This versatility comes from the integration of several OpenAI models, including GPT-3 and GPT-4.

Beyond text, the platform incorporates AI-powered image generation. Users can create unique visuals by simply providing text descriptions, leveraging the capabilities of DALL-E 2, DALL-E 3, and Stable Diffusion.

For users working with audio or video, the platform offers speech-to-text transcription using OpenAI’s Whisper model. This allows for quick and easy conversion of audio and video recordings into written text. Conversely, text-to-speech functionality, powered by AWS, enables users to create audio files from written text.

An AI Chat feature provides an interactive way for users to brainstorm ideas, refine content, or get answers to their questions. This feature utilizes multiple AI chatbots, each with different strengths and “personalities,” allowing for a more dynamic and engaging experience.

Crucially, the platform was not just a generic AI tool; it was customized with specific prompts and templates designed to address the common needs of TriadPrint customers. Pre-built prompts were developed for creating business model canvases, outlining phased website development plans, and generating marketing materials specifically tailored to print applications. This strategic prompt engineering ensured that the AI-generated content was relevant, useful, and aligned with the types of projects TriadPrint customers were likely to undertake.

Core Technology and Integrations:

Verbiverse leverages the power of multiple leading cloud-based text-to-speech engines:

  • Amazon Polly (AWS): Providing access to a wide range of standard and neural voices, known for their naturalness and expressiveness.
  • Google Cloud Text-to-Speech: Offering a diverse selection of voices and languages, with advanced customization options.
  • Azure Cognitive Services: Providing high-quality, customizable voices with fine-grained control over speech attributes.
  • IBM Watson Text to Speech: Offering a robust and reliable TTS engine with a focus on business applications.
  • OpenAI: Adding the text-to-speech voices available through the OpenAI API.

This multi-engine approach provides a broad selection of voices (over 840), languages (over 135), and styles to meet diverse user needs. It also adds redundancy: if one provider experiences an outage, the platform can continue operating on the others.
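
The redundancy described above can be illustrated with a minimal sketch. It is not Verbiverse's actual code: the function name synthesize_with_fallback, the Synthesizer signature, and the two stand-in providers are all hypothetical, and a real integration would call each vendor's SDK and map the requested voice to an equivalent voice on the backup engine.

```python
from typing import Callable, Sequence

# Hypothetical signature: a provider takes (text, voice) and returns audio bytes.
Synthesizer = Callable[[str, str], bytes]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider fails to synthesize the text."""


def synthesize_with_fallback(
    text: str,
    voice: str,
    providers: Sequence[tuple[str, Synthesizer]],
) -> tuple[str, bytes]:
    """Try each provider in order; return (provider_name, audio) from the first success."""
    errors: list[str] = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text, voice)
        except Exception as exc:  # a real system would catch provider-specific errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration only (assumptions, not real SDK calls).
def failing_provider(text: str, voice: str) -> bytes:
    raise ConnectionError("simulated outage")


def working_provider(text: str, voice: str) -> bytes:
    return f"[{voice}] {text}".encode("utf-8")
```

Passing an ordered list such as [("primary", failing_provider), ("backup", working_provider)] returns the backup's audio along with its name, so the caller can record which engine actually produced the file.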

Key Features:

  • Extensive Voice Library: Access to over 840 voices across more than 135 languages and dialects.
  • SSML Support: Full support for Speech Synthesis Markup Language (SSML), allowing users to fine-tune voice output with precise control over pronunciation, intonation, emphasis, pauses, and other speech characteristics. This caters to professional users who demand a high degree of control.
  • Multiple Storage Options: Users can choose to store generated audio files locally on the server or utilize cloud storage solutions like Amazon S3 or Wasabi for scalability, accessibility, and data security.
  • User-Friendly Interface: The platform features an intuitive interface that makes it easy to enter text, select voices, adjust settings, and generate audio files, even for users with no technical expertise.
  • Subscription-Based Model: Verbiverse operates as a SaaS platform, offering various subscription plans to cater to different usage levels and budgets.
  • Free Tier: A free tier allows users to experience the platform’s capabilities before committing to a paid subscription.
  • Affiliate Program: An affiliate program incentivizes users to promote Verbiverse and expand its reach.
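  • Customizable Prompts: Pre-built prompts and templates that address the common needs of TriadPrint customers, such as business model canvases, phased website development plans, and print-focused marketing materials.
  • Prompt Engineering: Prompts designed so that AI-generated content is relevant, useful, and aligned with the projects TriadPrint customers are likely to undertake.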
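
To make the SSML support listed above more concrete, here is a minimal sketch of how plain text might be wrapped in SSML before being sent to an engine. The helper build_ssml and its parameters are hypothetical and are not taken from the Verbiverse codebase; speak, prosody, and break are standard SSML elements, though support for individual tags and attributes varies between engines and voice types.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, rate: str = "medium", pitch: str = "medium", pause_ms: int = 0) -> str:
    """Wrap plain text in SSML with a prosody element and an optional trailing pause."""
    body = escape(text)  # escape &, <, > so user text cannot break the markup
    if pause_ms > 0:
        body += f'<break time="{pause_ms}ms"/>'
    return (
        "<speak>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{body}</prosody>"
        "</speak>"
    )


print(build_ssml("Fish & chips", rate="slow", pitch="low", pause_ms=500))
# <speak><prosody rate="slow" pitch="low">Fish &amp; chips<break time="500ms"/></prosody></speak>
```

Escaping the user's text before embedding it matters because a stray ampersand or angle bracket would otherwise produce invalid markup that an engine may reject.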

Results and Reflection:

Verbiverse demonstrates the ability to create a functional, user-friendly, and commercially viable SaaS platform built on current AI technology. The project highlights expertise in web development, API integration with multiple cloud providers, and user interface design, along with a deep understanding of the text-to-speech market and its applications. The platform’s flexibility, extensive voice library, SSML support, and multiple storage options make it a powerful tool for a wide range of users, from individual content creators to large businesses. The ongoing development and refinement of custom prompts and integrations for TriadPrint’s customers demonstrates a commitment to providing tailored solutions for specific user needs.