audio-inbox
Turn a raw voice memo into a clean, concise inbox note with provenance preserved. Orchestrates transcribe-memo for the speech-to-text step, then cleans the result into a structured note.
TEMPORARILY SUSPENDED (2026-04-22)
Audio transcription via transcribe-memo requires ffmpeg and network access to OpenRouter. Claude Code cloud runners currently lack Git LFS support, so audio files in 0_Inbox/Memos/ may be LFS pointers rather than real audio data. Skip all files in 0_Inbox/Memos/ until this is resolved. A workaround (e.g. providing transcripts directly, local transcription, or LFS support) is pending.
When this blocker is lifted, remove this section and the skill works as documented below.
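While the suspension is in effect, a file in 0_Inbox/Memos/ may be a Git LFS pointer rather than audio. Below is a minimal Python sketch of telling the two apart. It assumes the standard LFS pointer format, a short text file whose first line is "version https://git-lfs.github.com/spec/v1". The helper names are hypothetical and not part of transcribe-memo.

```python
from pathlib import Path

# Current Git LFS pointer files begin with this line (LFS spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(head: bytes) -> bool:
    """True if the first bytes of a file match the LFS pointer header."""
    return head.startswith(LFS_POINTER_PREFIX)


def is_lfs_pointer(path: Path) -> bool:
    """Read only enough of the file to compare against the pointer header."""
    with path.open("rb") as f:
        return looks_like_lfs_pointer(f.read(len(LFS_POINTER_PREFIX)))


def skippable_memos(memos_dir: Path) -> list[Path]:
    """Hypothetical helper: memo files that are LFS pointers, not real audio."""
    return sorted(p for p in memos_dir.iterdir() if p.is_file() and is_lfs_pointer(p))
```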
When to use
- A new audio file appears in 0_Inbox/Memos/ (any format: M4A, AIFF, MP3, WAV, etc.).
- The user asks to “clean up this voice memo” or “process this memo”.
- The user runs /audio-inbox explicitly.
- Invoked by process-inbox in its Phase 0.
Workflow
Phase 0 — Transcribe
- For each audio file in 0_Inbox/Memos/ (any format: M4A, AIFF, MP3, WAV, etc.), invoke the transcribe-memo skill. transcribe-memo converts the file to WAV, sends it to OpenRouter (google/gemini-2.5-flash), saves the raw transcript to 6_Private/Memo/transcripts/<filename>.md, and deletes the audio file.
- If transcribe-memo fails (no API key, API error), it appends a notification to 7_Agent/notifications.md and preserves the audio file. Stop processing that file and move to the next one, or stop if there are no more.
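The per-file loop above can be sketched in Python as follows. This is illustrative only: the transcribe callable stands in for invoking the transcribe-memo skill and is assumed to return the transcript path on success or None on failure. The extension set is likewise an assumption, since the skill accepts "any format".

```python
from pathlib import Path
from typing import Callable, Iterable, Optional

# Extensions treated as audio; the skill says "any format", so this set is an assumption.
AUDIO_EXTENSIONS = {".m4a", ".aiff", ".aif", ".mp3", ".wav"}


def transcribe_all(
    files: Iterable[Path],
    transcribe: Callable[[Path], Optional[Path]],
) -> list[Path]:
    """Run transcribe on each audio file and collect transcript paths for Phase 1.

    transcribe is a hypothetical stand-in for the transcribe-memo skill. None
    means it failed, already wrote its notification, and kept the audio file.
    """
    transcripts: list[Path] = []
    for audio in files:
        if audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # not an audio memo; leave it alone
        result = transcribe(audio)
        if result is None:
            continue  # failure already reported; move on to the next file
        transcripts.append(result)
    return transcripts
```

Only the returned paths move on to Phase 1. A file whose transcription failed stays in 0_Inbox/Memos/ for a later run.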
Phase 1 — Clean and structure
For each successfully transcribed file (i.e., a transcript now exists at 6_Private/Memo/transcripts/<filename>.md):
- Read the raw transcript from 6_Private/Memo/transcripts/<filename>.md.
- Detect the language: de (German), en (English), ja (Japanese), or mixed. If mixed, use the dominant language for the concise version; keep the original in the callout.
- Produce three outputs (all in the source language):
- Cleaned transcript — remove filler words, fix spelling / transcription mistakes (e.g. wrong words). Preserve sentence structure and voice.
- Concise version — the note as if written by hand instead of rambled into a voice memo. First-person throughout — never third-person (“The user discussed…”). No “Summary:” or “Overview:” prefixes.
- Filename — content-bearing, no extension. No type words (“summary”, “note”, “transcript”, “memo”) — they’re unnecessary.
Example (given a raw transcript):
Today I went to the park and I was like, um, thinking about how smartphones are like, bad for my brain; they really seem ADHD inducing! Well, from my perspective I should stop using mine so much I guess?
Cleaned transcript:
Today I went to the park and thought about how smartphones are bad for my brain; they really seem ADHD inducing! Well, from my perspective I should stop using mine so much I guess?
Concise version:
I went to the park and thought about how smartphones are bad for the brain and ADHD inducing. I should probably stop using mine as much.
Filename:
Park walk - Smartphones and ADHD
- Write the note to 0_Inbox/<filename>.md (the root of the inbox, not Memos/). Structure:

    ---
    created: YYYY-MM-DD
    lang: en
    tags: [voice-memo]
    source: voice-memo
    transcript: 6_Private/Memo/transcripts/<original-filename>.md
    ---
    <concise version as the body>

    > [!note]- Full transcript
    > <cleaned transcript, preserving paragraph breaks>

- The audio file has already been deleted by transcribe-memo. The raw transcript persists in 6_Private/Memo/transcripts/ as the permanent record.
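The note structure above can be assembled mechanically once the three outputs exist. Below is a minimal Python sketch that assumes the fields and order shown in the template; the function name is hypothetical. It does not quote YAML values, so a transcript path containing YAML-special characters might need quoting.

```python
from datetime import date


def render_note(
    concise: str,
    cleaned: str,
    lang: str,
    transcript_path: str,
    created: date,
) -> str:
    """Assemble the inbox note: frontmatter, concise body, collapsed transcript callout."""
    frontmatter = "\n".join(
        [
            "---",
            f"created: {created.isoformat()}",
            f"lang: {lang}",
            "tags: [voice-memo]",
            "source: voice-memo",
            f"transcript: {transcript_path}",
            "---",
        ]
    )
    # Prefix every transcript line with "> " so it stays inside the callout;
    # blank lines become a bare ">" so paragraph breaks stay within the callout.
    callout = ["> [!note]- Full transcript"]
    callout += [f"> {line}" if line else ">" for line in cleaned.split("\n")]
    return f"{frontmatter}\n\n{concise.strip()}\n\n" + "\n".join(callout) + "\n"
```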
Rules
- Produce all three outputs in the source language. If the transcript is German, the concise version and the filename are German too.
- First person throughout. “I went to the park” — not “The user went to the park” or “The speaker discussed”.
- No “summary”, “overview”, “notes on”, or “memo” in the filename. The content itself names the note.
- Do not route. Write to the 0_Inbox/ root; process-inbox handles PARA placement.
- Do not touch other pages. This skill is narrow; cross-references happen in process-inbox.
process-inbox. - Preserve furigana syntax if Japanese content already has it (see obsidian-markdown).
- The transcript: frontmatter field links back to the raw transcript in 6_Private/. This is the provenance chain: inbox note → raw transcript (in 6_Private/) → original audio (deleted, referenced in the transcript's frontmatter).
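The filename rules lend themselves to a mechanical pre-check before the note is written. This Python sketch rejects the type words listed in the Rules and in the Phase 1 filename guidance, an extension, or a path. The word list is English only, and German or Japanese equivalents would have to be added as an assumption. Whether a name is actually content-bearing still needs judgment.

```python
import re

# Type words banned by the Rules and the Phase 1 filename guidance.
# English only: German or Japanese equivalents would have to be added by hand.
TYPE_WORDS = ["notes on", "summary", "overview", "transcript", "memo", "note"]
_TYPE_WORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in TYPE_WORDS) + r")\b", re.IGNORECASE
)


def filename_problems(name: str) -> list[str]:
    """Return rule violations for a proposed note filename (empty list means OK)."""
    problems = []
    if name.lower().endswith(".md"):
        problems.append("filename must not include an extension")
    if "/" in name:
        problems.append("filename must not contain a path; notes go in the 0_Inbox/ root")
    for match in _TYPE_WORD_RE.finditer(name):
        problems.append(f"type word not allowed: {match.group(0)!r}")
    return problems
```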
Blocked cases
- Transcription failed: transcribe-memo handles the notification. Skip the file.
- Transcript is empty or unintelligible: leave the raw transcript in 6_Private/Memo/transcripts/ and create 0_Inbox/_unprocessable-<timestamp>.md with a one-line note so the user can decide.
- Transcript references a specific page that you can’t identify: still produce the note; cross-referencing is process-inbox’s job.
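For the empty or unintelligible case, the placeholder file can be produced as in the Python sketch below. The skill does not fix a timestamp format, so the sortable one used here is an assumption, as are the helper names and the wording of the one-line note.

```python
from datetime import datetime
from pathlib import Path


def unprocessable_name(now: datetime) -> str:
    # The skill only says <timestamp>; this sortable format is an assumption.
    return f"_unprocessable-{now.strftime('%Y-%m-%d-%H%M%S')}.md"


def unprocessable_body(transcript_path: str, reason: str) -> str:
    # One line: why no note was produced, and where the raw transcript lives.
    return f"{reason} Raw transcript: {transcript_path}\n"


def write_unprocessable(inbox: Path, transcript_path: str, reason: str, now: datetime) -> Path:
    """Create the placeholder in the inbox root; the raw transcript is left untouched."""
    target = inbox / unprocessable_name(now)
    target.write_text(unprocessable_body(transcript_path, reason), encoding="utf-8")
    return target
```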
Verification
After running:
- 0_Inbox/<new-filename>.md exists with schema-conformant frontmatter and the callout.
- 0_Inbox/Memos/ has no audio files that were successfully transcribed (transcribe-memo deletes them).
- 6_Private/Memo/transcripts/<filename>.md exists for each processed memo.
- The body reads as written-by-hand prose, not a rambled monologue.
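The first three checks are mechanical and could be scripted roughly as below. This Python sketch assumes the folders sit under a single vault root and that transcripts are named after the audio file's stem. It treats "schema-conformant" as meaning only that the five fields from the template are present. The last check, whether the body reads as hand-written prose, is left to a human reader.

```python
from pathlib import Path

REQUIRED_KEYS = ("created", "lang", "tags", "source", "transcript")
CALLOUT_HEADER = "> [!note]- Full transcript"


def frontmatter_keys(text: str) -> set[str]:
    """Top-level keys between the opening and closing '---' lines (empty if malformed)."""
    lines = text.split("\n")
    if lines[0] != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line == "---":
            return keys
        if ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # no closing '---'


def check_note(text: str) -> list[str]:
    """Check one note's frontmatter keys and transcript callout."""
    problems = []
    missing = [k for k in REQUIRED_KEYS if k not in frontmatter_keys(text)]
    if missing:
        problems.append("missing frontmatter keys: " + ", ".join(missing))
    if CALLOUT_HEADER not in text:
        problems.append("missing full-transcript callout")
    return problems


def verify_run(vault: Path, note_name: str, transcribed_audio: list[str]) -> list[str]:
    """Mechanical checks only; prose quality still needs a human look."""
    problems = []
    note = vault / "0_Inbox" / f"{note_name}.md"
    if note.is_file():
        problems += check_note(note.read_text(encoding="utf-8"))
    else:
        problems.append(f"note not found: {note}")
    for audio in transcribed_audio:
        if (vault / "0_Inbox" / "Memos" / audio).exists():
            problems.append(f"audio still present: {audio}")
        # Assumes transcripts are named after the audio file's stem.
        transcript = vault / "6_Private" / "Memo" / "transcripts" / f"{Path(audio).stem}.md"
        if not transcript.is_file():
            problems.append(f"transcript missing: {transcript}")
    return problems
```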