Sign-Speak AI

Real-Time SASL Translation

An AI system that translates South African Sign Language (SASL) gestures into text and speech in real time, enabling easier communication between Deaf and hard-of-hearing people and hearing people.

Our AI Models

Learn about the two machine learning models powering Sign-Speak AI.

Hand Landmark Recognition Model (Random Forest)

Goal: Recognize static or simple gestures using 3D hand landmark coordinates.

How It Works:

  • Feature Extraction: Uses MediaPipe Hands to detect 21 landmarks per hand (x, y, z → 63 features per hand).
  • Data Representation: Landmarks are stored in a CSV file; each row represents one gesture sample with its label (e.g., "A", "B", "1").
  • Model Training: Trains a Random Forest classifier (200 trees) on the landmark data to learn geometric patterns.
  • Recognition: Captures real-time landmarks, scales the features, and predicts the gesture with the trained model, applying temporal smoothing for stability (see the sketch below).

Key Strengths:

  • Lightweight & Fast: Runs on CPU.
  • High Accuracy: 90–97% for static gestures.
  • Efficient: Low storage & quick retraining.
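
A minimal sketch of this pipeline is shown below. It assumes a training CSV with 63 landmark feature columns plus a label column; the file name, the StandardScaler, and the 10-frame majority vote are illustrative stand-ins, while the 21-landmark extraction and the 200-tree Random Forest come from the description above.

```python
# Illustrative sketch of the landmark pipeline described above. The CSV name,
# StandardScaler, and 10-frame majority vote are assumptions.
from collections import Counter, deque

import cv2
import mediapipe as mp
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# --- Training: learn geometric patterns from stored landmark samples ---
data = pd.read_csv("gesture_landmarks.csv")        # hypothetical training file
X = data.drop(columns=["label"]).values            # 63 features per sample
y = data["label"].values                           # e.g. "A", "B", "1"
scaler = StandardScaler().fit(X)
clf = RandomForestClassifier(n_estimators=200).fit(scaler.transform(X), y)

# --- Recognition: extract 21 (x, y, z) landmarks per frame and classify ---
hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)
recent = deque(maxlen=10)                          # sliding window of labels

def predict_gesture(frame_bgr):
    """Return the smoothed gesture label for one video frame, or None."""
    result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        features = np.array([[p.x, p.y, p.z] for p in lm]).flatten()  # 63 values
        recent.append(clf.predict(scaler.transform([features]))[0])
    # Temporal smoothing: report the majority label over the last few frames.
    return Counter(recent).most_common(1)[0][0] if recent else None
```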

How Sign-Speak AI Works

Our architecture combines real-time computer vision with our AI models to translate South African Sign Language into both text and speech output.

1. Camera Capture & WebSocket Transmission

The camera captures real-time video frames, which are sent via WebSocket to the Flask server for gesture detection and analysis.

Technologies: WebRTC, WebSocket
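
In the deployed application this client is the browser, which streams WebRTC video and forwards frames over a WebSocket. The Python sketch below only illustrates the frame-over-WebSocket idea; the endpoint URL, JPEG encoding, and frame rate are assumptions.

```python
# Illustration only: the real client runs in the browser. This shows the same
# "send each frame over a WebSocket" idea with OpenCV and the websockets
# package; the server URL and ~15 fps throttle are assumed values.
import asyncio

import cv2
import websockets

SERVER_URL = "ws://localhost:5000/frames"    # hypothetical Flask endpoint

async def stream_frames():
    cap = cv2.VideoCapture(0)                # default webcam
    async with websockets.connect(SERVER_URL) as ws:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # JPEG-encode the frame so it travels as one binary message.
            _, buf = cv2.imencode(".jpg", frame)
            await ws.send(buf.tobytes())
            await asyncio.sleep(1 / 15)      # throttle to roughly 15 fps

asyncio.run(stream_frames())
```
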
2. Flask AI Model Detection

Image frames are received by the Flask server, where one of our AI models (depending on the user's chosen option) processes the frame sequence to detect South African Sign Language.

Technologies: Flask, PyTorch, CNN-LSTM
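
A minimal sketch of the receiving side is shown below, assuming Flask-SocketIO as the WebSocket layer; the event names, the per-message model option, and the run_landmark_model / run_cnn_lstm_model helpers are hypothetical placeholders, not the project's actual API.

```python
# Sketch only: assumes Flask-SocketIO; event names, the "model" option, and
# the model-wrapper helpers below are hypothetical placeholders.
import cv2
import numpy as np
from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

def run_landmark_model(frame):
    """Hypothetical helper wrapping the Random Forest landmark model."""
    ...

def run_cnn_lstm_model(frame):
    """Hypothetical helper wrapping the PyTorch CNN-LSTM model."""
    ...

@socketio.on("frame")
def handle_frame(payload):
    # Decode the JPEG bytes sent by the client back into an image.
    frame = cv2.imdecode(np.frombuffer(payload["jpeg"], np.uint8), cv2.IMREAD_COLOR)
    # Route the frame to whichever model the user selected in the UI.
    if payload.get("model") == "cnn_lstm":
        label = run_cnn_lstm_model(frame)
    else:
        label = run_landmark_model(frame)
    if label is not None:
        emit("prediction", {"label": label})

if __name__ == "__main__":
    socketio.run(app, port=5000)
```
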
3. Grammar Correction & JSON Response

The Flask server applies grammar correction to the detected phrase list and sends structured JSON data back through the WebSocket to the Express.js server.

Technologies: Grammar correction, JSON, WebSocket
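
The grammar utility itself is custom to this project, so the sketch below stands in a trivial corrector and only shows the shape of a JSON payload going back to the Express.js server; the field names are assumptions.

```python
# Illustration only: the project's grammar utility is custom. This stand-in
# shows turning a detected phrase list into corrected text and packaging it
# as JSON; the payload field names ("raw", "text") are assumptions.
import json

def correct_phrase(words):
    """Hypothetical stand-in for the grammar utility: join the detected
    signs, capitalise the sentence, and add terminal punctuation."""
    sentence = " ".join(words).strip()
    return (sentence[:1].upper() + sentence[1:] + ".") if sentence else ""

detected = ["hello", "how", "you"]            # example detected phrase list
payload = json.dumps({
    "raw": detected,
    "text": correct_phrase(detected),         # "Hello how you."
})
# `payload` is what travels back over the WebSocket to the Express.js server.
```
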
4. Text Display & Audio Playback

The Express.js server displays the detected text in the user interface. When the user clicks play, pyttsx3 generates and streams audio playback of the phrase.

Technologies: Express.js, pyttsx3, User Control
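
Below is a minimal pyttsx3 sketch for this step. Writing the synthesised speech to a file the front end can play is shown as one plausible delivery path; the exact streaming mechanism isn't specified here.

```python
# Minimal pyttsx3 sketch: synthesise the corrected phrase when the user
# clicks play. Saving to a WAV file for playback is an assumption.
import pyttsx3

def synthesise(text, out_path="phrase.wav"):
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)          # speaking speed (words per minute)
    engine.save_to_file(text, out_path)      # render audio instead of playing it
    engine.runAndWait()                      # block until synthesis completes
    return out_path

synthesise("Hello, how are you?")
```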

Data Flow Architecture

Camera Capture (WebRTC video stream)
  → WebSocket transmission
  → AI Model Processing (using one of our AI models)
  → Grammar Check (text correction)
  → JSON Response
  → Web Server (Express.js backend)
  → Text Displayed (on the user interface)
  → User Clicks Play
  → Audio Output (using pyttsx3 TTS)

Ready to try our application in real time?

Access our live demo hosted on Belgium Campus servers. Experience AI-powered sign language recognition with real-time speech synthesis.

Technology Stack

WebRTC, WebSocket, Flask, MediaPipe, Random Forest, PyTorch (CNN-LSTM), Express.js, JWT, pyttsx3

Contributors

AI Model

Vaughn du Preez

Created, designed and managed AI model

Joshua Clinton

Captured video data for model training

Zac Myburgh

Created grammar utility used for grammar correction

Back End / Hosting

Zoë Janse van Rensburg

Created and managed the back-end infrastructure.

Willem Booysen

Server hosting and management. Implemented back-end logic for JWT tokens and cookies.

Front End

Waldo Blom

Integrated back-end components with front-end interfaces (WebSocket creation, TTS, and camera). Designed the UI of the application and landing page. Created the camera page.

Willem Abraham Jacobus Kruger

Created settings page.

Zanthus Van deventer

Created the user's stored phrases page.

Research

Joshua Clinton

Advisor for all matters regarding South African Sign Language

Vunene Khoza

Project research

Yandile Ngubane

Project research

Yanga Mazibuko

Project research

Zoë Janse van Rensburg

Project research

Zanthus Van deventer

Project research and project manager

* Please Note: This is a summary of key contributions, not a comprehensive list. Different departments collaborated on various aspects and communicated throughout the entire development of the application.