PalAI

Voice AI Assistant

Meet PalAI

We've trained a model called PalAI which interacts in a conversational way. The dialogue format makes it possible for PalAI to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.

Natural Voice Interaction

Speak naturally with PalAI using advanced speech recognition and synthesis.

Contextual Memory

PalAI remembers previous parts of your conversation for more meaningful interactions.

Emotion Detection

Understands the emotional tone of your voice to respond with appropriate empathy.

Methods

PalAI is built using a modular architecture that combines state-of-the-art open source models. We use OpenAI Whisper for robust speech recognition, capable of understanding multiple languages and accents. For audio processing, we utilize librosa to analyze spectral features and detect voice activity.

The core intelligence is powered by Ollama running Llama 3, fine-tuned for conversational dynamics. Text-to-speech synthesis is handled by KittenTTS (and optionally EmotiVoice), providing low-latency, natural-sounding voice responses.

Real-time communication is managed via Socket.IO, ensuring instant feedback loops between the user and the AI agent.

Limitations

  • Background Noise: While we use noise suppression, heavy background noise may still impact transcription accuracy.
  • Language Support: Currently optimized for English. Other languages may have variable performance.
  • Emotion Accuracy: Emotion detection is experimental and may occasionally misinterpret tone, especially with short utterances.
  • Latency: Response times depend on server load and network connection. We aim for sub-second latency but it can vary.
  • Nuance: The system may occasionally produce "hallucinations" or incorrect information, similar to other LLMs.
  • Emotional Range: The TTS engine has a limited range of emotional expressiveness, defaulting to "neutral" when the emotional signal is weak. We're working on making the system more sensitive to nuanced emotional expressions.

Say something to start the conversation

😐
Neutral
Confidence: 0%