How it works
Speech to Text: transcribe speech to text live using your microphone. The tool runs in your browser with no install, no signup and no email required, though the browser itself may send audio to its vendor's speech service for recognition. Free forever.
About Speech to Text
Speech to Text transcribes whatever you say into your microphone as text on the page, live. It uses the SpeechRecognition interface of the browser's Web Speech API, which makes it a zero-install dictation tool for quick notes, drafting messages or capturing thoughts during a meeting.
Journalists capture interview snippets, students dictate study notes, and developers prototype voice features without wiring up a backend. The recognition runs continuously until you click Stop, building up a transcript you can copy or edit.
Browser support is uneven. The recognition works best on Chrome and Edge; most Chromium browsers (Brave, Opera) work too. Safari has partial support on macOS, and Firefox does not currently expose the API.
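For developers prototyping voice features, the uneven support described above is usually handled with feature detection. The following is a minimal sketch, not this tool's actual code: the `SpeechGlobals` type and the `findRecognitionCtor` helper are hypothetical names introduced for illustration, and the types are hand-written rather than taken from official DOM typings. It assumes a browser exposes the constructor either as `SpeechRecognition` or under the `webkit` prefix.

```typescript
// Hand-written minimal types for this sketch; not the official DOM typings.
type RecognitionCtor = new () => unknown;

interface SpeechGlobals {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Returns the recognition constructor if the given scope exposes one, else null.
// The unprefixed name is preferred; the webkit-prefixed name is the fallback.
function findRecognitionCtor(scope: SpeechGlobals): RecognitionCtor | null {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
}

// In a page the scope is the global object (window).
const recognitionSupported =
  findRecognitionCtor(globalThis as unknown as SpeechGlobals) !== null;
console.log(
  recognitionSupported
    ? "Speech recognition is available"
    : "Speech recognition is not exposed in this environment"
);
```

Passing the scope in as a parameter keeps the check easy to exercise outside a browser, where neither name exists and the helper returns null.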
How to use Speech to Text
- Pick a Language from the dropdown so the recogniser uses the correct acoustic model.
- Click Start listening. Your browser will prompt for microphone permission — accept it once.
- Speak naturally. The Listening… status appears while the mic is active and the transcript fills in as words are recognised.
- Click Stop when you're finished. The transcript stays on screen so you can edit or copy it.
- If recognition pauses unexpectedly, click Start listening again to resume.
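The steps above map fairly directly onto the recognition object's properties and events. The sketch below shows one way a page could wire them up, under stated assumptions: the interfaces are hand-written minimal shapes (not official typings), and `applyResults`, `startDictation` and the `Transcript` type are hypothetical names used only for illustration. It sets the language, enables continuous mode and interim results, and folds each result event into a transcript.

```typescript
// Hand-written minimal shapes for this sketch; not the official DOM typings.
interface Alternative { transcript: string }
interface Result { isFinal: boolean; readonly length: number; [index: number]: Alternative }
interface ResultList { readonly length: number; [index: number]: Result }
interface ResultEvent { resultIndex: number; results: ResultList }

interface Recognizer {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: ResultEvent) => void) | null;
  start(): void;
  stop(): void;
}

// finalText holds settled words; interimText holds the current unsettled guess.
interface Transcript { finalText: string; interimText: string }

// Folds one result event into the transcript. Only results from
// event.resultIndex onward are new or changed, so earlier ones are skipped.
function applyResults(prev: Transcript, event: ResultEvent): Transcript {
  let finalText = prev.finalText;
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (!result) continue;
    const text = result[0]?.transcript ?? "";
    if (result.isFinal) finalText += text;
    else interimText += text;
  }
  return { finalText, interimText };
}

// Configures the recogniser, starts it, and returns a function that stops it.
function startDictation(
  rec: Recognizer,
  lang: string,
  onUpdate: (t: Transcript) => void
): () => void {
  let transcript: Transcript = { finalText: "", interimText: "" };
  rec.lang = lang;            // e.g. "en-US" or "en-GB", from the Language dropdown
  rec.continuous = true;      // keep listening across pauses until stopped
  rec.interimResults = true;  // deliver partial guesses while the user is speaking
  rec.onresult = (event) => {
    transcript = applyResults(transcript, event);
    onUpdate(transcript);
  };
  rec.start();                // the browser asks for microphone permission here
  return () => rec.stop();
}
```

Keeping `applyResults` free of browser state means the transcript logic can be checked with plain objects, while `startDictation` stays a thin layer over whatever recognition object the browser provides.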
Common use cases
- Capturing voice memos during a walk and turning them into editable text.
- Drafting blog posts or emails by talking instead of typing, which is often faster for first drafts.
- Creating rough transcripts of interviews or podcasts for searchable archives.
- Accessibility: hands-free input for users who find typing difficult or painful.
- Practising a foreign language and verifying that your pronunciation is recognised correctly.
Tips & common mistakes
- Use a quiet environment and a decent microphone. Background noise tanks accuracy.
- Speak in short, complete sentences with clear pauses. Automatic punctuation varies by browser and language, so expect to add some by hand.
- If the API stops listening after a few seconds of silence, just click Start listening again. New text is added to the transcript already on screen.
- Pick the right language variant (e.g. en-US vs en-GB). The model is tuned per locale and accents matter.
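Developers building on the same API sometimes automate the "click Start again" step from the tips above. The sketch below is one possible approach and not necessarily what this tool does: `keepListening` and `RestartableRecognizer` are hypothetical names, and the interface is a hand-written minimal shape. It restarts recognition whenever the browser ends the session while the user still wants to listen, and gives up if the error event reports that microphone access or the speech service was refused.

```typescript
// Hand-written minimal shape for this sketch; not the official DOM typings.
interface RestartableRecognizer {
  onend: (() => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
  stop(): void;
}

// Wraps a recogniser so it restarts itself when the browser ends the session
// (for example after a stretch of silence) until the user explicitly stops it.
function keepListening(rec: RestartableRecognizer): { start(): void; stop(): void } {
  let wanted = false; // true between the user's Start and Stop clicks

  rec.onend = () => {
    if (wanted) rec.start();
  };

  rec.onerror = (event) => {
    // Restarting cannot help if access was refused, so stop trying.
    if (event.error === "not-allowed" || event.error === "service-not-allowed") {
      wanted = false;
    }
  };

  return {
    start() {
      wanted = true;
      rec.start();
    },
    stop() {
      wanted = false;
      rec.stop();
    },
  };
}
```

The `wanted` flag is what separates a browser-initiated end from a user-initiated one; without it, clicking Stop would immediately trigger a restart.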
Frequently asked questions
Which browsers support this?
Chrome, Edge and most Chromium-based browsers (Brave, Opera). Safari supports it on macOS but with limitations. Firefox does not currently expose the Web Speech API.
Where does recognition happen?
Browser implementations typically send audio to a cloud service (e.g. Google's) for transcription. The transcript comes back to your browser; we never see it. If you need fully on-device recognition, look for tools built on whisper.cpp.
Can it transcribe multiple speakers or noisy audio?
The Web Speech API is optimised for a single speaker in a quiet environment. For multi-speaker diarisation or messy audio, dedicated tools work much better.
Does the audio go to a server?
Browser implementations of the Web Speech API typically send audio to the vendor's cloud service (Google for Chrome, Microsoft for Edge) for recognition. We never see the audio or the transcript — but the browser maker does. For fully on-device recognition, look at whisper.cpp-based tools.
Why is recognition so much worse on Safari than on Chrome?
Safari's implementation is less complete than Chrome's and relies on Apple's own speech recognition rather than Google's or Microsoft's cloud services, so results for open-ended dictation can differ noticeably. Try Chrome or Edge if accuracy matters.
Can I add custom vocabulary like product names?
The Web Speech API has no widely supported custom vocabulary hook. Common workarounds: spell unusual names out the first time, or do a find-and-replace on the transcript afterwards.
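The find-and-replace workaround can be scripted for anyone processing transcripts in code. This is a minimal sketch with hypothetical names (`applyVocabulary`, and the sample phrases in the usage note); it assumes each misheard phrase starts and ends with a letter or digit, so that word-boundary matching behaves as expected.

```typescript
// Replaces misheard phrases with the intended spelling.
// Each entry is [whatTheRecogniserHeard, whatYouMeant].
// Matching is case-insensitive and limited to whole words or phrases.
function applyVocabulary(
  transcript: string,
  fixes: ReadonlyArray<[string, string]>
): string {
  let out = transcript;
  for (const [heard, wanted] of fixes) {
    // Escape regex metacharacters so the phrase is matched literally.
    const escaped = heard.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const pattern = new RegExp(`\\b${escaped}\\b`, "gi");
    // A replacer function keeps "$" in the replacement from being
    // interpreted as a special substitution pattern.
    out = out.replace(pattern, () => wanted);
  }
  return out;
}
```

As a usage example with made-up entries, `applyVocabulary("we ran cube control today", [["cube control", "kubectl"]])` returns `"we ran kubectl today"`. Fixes are applied in order, so list longer phrases before shorter ones that overlap with them.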