Question 1

Which browsers support this?

Accepted Answer

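Chrome, Edge and most other Chromium-based browsers (such as Brave and Opera). Safari supports it on macOS, with limitations. Firefox supports speech synthesis but does not currently expose speech recognition, which is the part of the Web Speech API that speech-to-text depends on.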
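A page can check for the API before offering dictation. Below is a minimal TypeScript sketch of that check. It assumes the constructor is looked up on the global object under either the standard `SpeechRecognition` name or the prefixed `webkitSpeechRecognition` name used by Chromium-based browsers and Safari. `getSpeechRecognitionCtor` is a hypothetical helper, not part of the API. The global scope is passed in as a parameter so the check can be exercised outside a browser.

```typescript
// Minimal shape we need: something that can be called with `new`.
type RecognitionCtor = new () => unknown;

/**
 * Hypothetical helper: returns the speech recognition constructor exposed by
 * the given global scope, or null if there is none. The unprefixed name is
 * checked first, then the prefixed `webkitSpeechRecognition`.
 */
function getSpeechRecognitionCtor(
  scope: Record<string, unknown>,
): RecognitionCtor | null {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  // Only accept a callable constructor; anything else counts as "unsupported".
  return typeof ctor === "function" ? (ctor as RecognitionCtor) : null;
}

// In a browser you would pass the real global object, e.g.:
// const Ctor = getSpeechRecognitionCtor(window as unknown as Record<string, unknown>);
// if (!Ctor) { /* suggest switching to Chrome or Edge */ }
```

The constructor being present only means the API is exposed. Recognition can still fail at runtime, so a real page should also listen for the recognizer's `error` event.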

Question 2

Where does recognition happen?

Accepted Answer

Browser implementations typically send audio to a cloud service (e.g. Google's) for transcription. The transcript comes back to your browser; we never see it. If you need fully on-device recognition, look for tools built on whisper.cpp.

Question 3

Can it transcribe multiple speakers or noisy audio?

Accepted Answer

The Web Speech API is optimised for a single speaker in a quiet environment. For multi-speaker diarisation (labelling who said what) or noisy audio, dedicated transcription tools work much better.

Question 4

Does the audio go to a server?

Accepted Answer

Browser implementations of the Web Speech API typically send audio to the vendor's cloud service (Google for Chrome, Microsoft for Edge) for recognition. We never see the audio or the transcript — but the browser maker does. For fully on-device recognition, look at whisper.cpp-based tools.

Question 5

Why is recognition so much worse on Safari than on Chrome?

Accepted Answer

Safari implements an older subset of the Speech Recognition spec and uses Apple's on-device model, which is optimised for Siri commands rather than open-vocabulary dictation. Try Chrome or Edge for noticeably better accuracy.

Question 6

Can I add custom vocabulary like product names?

Accepted Answer

The Web Speech API does not expose a custom vocabulary hook. Common workarounds: spell unusual names out the first time, or do a find-and-replace on the transcript afterwards.
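The find-and-replace workaround can be sketched as a small TypeScript function. `applyVocabulary` and the correction pairs are hypothetical examples, not part of the Web Speech API or this tool. The sketch matches case-insensitively on word boundaries so that a short name does not rewrite the middle of a longer word.

```typescript
/**
 * Hypothetical post-processing step: applies [heard, wanted] corrections to a
 * transcript in order. Matching is case-insensitive and anchored on word
 * boundaries, so "react" does not match inside "reaction".
 */
function applyVocabulary(
  transcript: string,
  corrections: ReadonlyArray<[heard: string, wanted: string]>,
): string {
  let result = transcript;
  for (const [heard, wanted] of corrections) {
    // Escape regex metacharacters so the misheard phrase is matched literally.
    const escaped = heard.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const pattern = new RegExp(`\\b${escaped}\\b`, "gi");
    // A replacer function stops "$" in the replacement being read as a pattern.
    result = result.replace(pattern, () => wanted);
  }
  return result;
}
```

For example, `applyVocabulary("I built it with next js", [["next js", "Next.js"]])` returns `"I built it with Next.js"`. Because corrections run in order, list longer phrases before shorter ones that overlap them.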
