audio-inbox

Turn a raw voice memo into a clean, concise inbox note with provenance preserved. Orchestrates transcribe-memo for the speech-to-text step, then cleans the result into a structured note.

TEMPORARILY SUSPENDED (2026-04-22)

Audio transcription via transcribe-memo requires ffmpeg and network access to OpenRouter. Claude Code cloud runners currently lack Git LFS support, so audio files in 0_Inbox/Memos/ may be LFS pointers rather than real audio data. Skip all files in 0_Inbox/Memos/ until this is resolved. A workaround (e.g. providing transcripts directly, local transcription, or LFS support) is pending.

When this blocker is lifted, remove this section and the skill works as documented below.

When to use

A new audio file appears in 0_Inbox/Memos/ (any format: M4A, AIFF, MP3, WAV, etc.).
The user asks to “clean up this voice memo” or “process this memo”.
The user runs /audio-inbox explicitly.
Invoked by process-inbox in its Phase 0.

Workflow

Phase 0 — Transcribe

For each audio file in 0_Inbox/Memos/ (any format — M4A, AIFF, MP3, WAV, etc.), invoke the transcribe-memo skill.
- transcribe-memo converts to WAV, sends to OpenRouter (google/gemini-2.5-flash), saves the raw transcript to 6_Private/Memo/transcripts/<filename>.md, and deletes the audio file.
- If transcribe-memo fails (no API key, API error), it appends a notification to 7_Agent/notifications.md and preserves the audio file. Stop processing that file — move to the next one, or stop if there are no more.

Phase 1 — Clean and structure

For each successfully transcribed file (i.e., a transcript now exists at 6_Private/Memo/transcripts/<filename>.md):

Read the raw transcript from 6_Private/Memo/transcripts/<filename>.md.
Detect language — de (German), en (English), ja (Japanese), or mixed. If mixed, use the dominant language for the concise version; keep the original in the callout.
Produce three outputs (all in the source language):
- Cleaned transcript — remove filler words, fix spelling / transcription mistakes (e.g. wrong words). Preserve sentence structure and voice.
- Concise version — the note as if written by hand instead of rambled into a voice memo. First-person throughout — never third-person (“The user discussed…”). No “Summary:” or “Overview:” prefixes.
- Filename — content-bearing, no extension. No type words (“summary”, “note”, “transcript”, “memo”) — they’re unnecessary.

Example (given a raw transcript):

Today I went to the park and I was like, um, thinking about how smartphones are like, bad for my brain; they really seem ADHD inducing! Well, from my perspective I should stop using mine so much I guess?

Cleaned transcript:

Today I went to the park and thought about how smartphones are bad for my brain; they really seem ADHD inducing! Well, from my perspective I should stop using mine so much I guess?

Concise version:

I went to the park and thought about how smartphones are bad for the brain and ADHD inducing. I should probably stop using mine as much.

Filename:

Park walk - Smartphones and ADHD

Write the note to 0_Inbox/<filename>.md (root of inbox, not in Memos/). Structure:

---
created: YYYY-MM-DD
lang: en
tags: [voice-memo]
source: voice-memo
transcript: 6_Private/Memo/transcripts/<original-filename>.md
---
 
<concise version as the body>
 
> [!note]- Full transcript
> <cleaned transcript, preserving paragraph breaks>

The audio file is already deleted by transcribe-memo. The raw transcript persists in 6_Private/Memo/transcripts/ as the permanent record.

Rules

Do all three outputs in the source language. If the transcript is German, the concise version is German, the filename is German.
First person throughout. “I went to the park” — not “The user went to the park” or “The speaker discussed”.
No summary / overview / notes on / memo in the filename. Content itself names the note.
Do not route. Write to 0_Inbox/ root. process-inbox handles PARA placement.
Do not touch other pages. This skill is narrow. Cross-references happen in process-inbox.
Preserve furigana syntax if Japanese content already has it (see obsidian-markdown).
The transcript: frontmatter field links back to the raw transcript in 6_Private/. This is the provenance chain: inbox note → raw transcript (in 6_Private/) → original audio (deleted, referenced in transcript frontmatter).

Blocked cases

Transcription failed — transcribe-memo handles notification. Skip the file.
Transcript is empty or unintelligible — leave the raw transcript in 6_Private/Memo/transcripts/, create a file 0_Inbox/_unprocessable-<timestamp>.md with a one-line note, so the user can decide.
Transcript references a specific page that you can’t identify — still produce the note; cross-referencing is process-inbox’s job.

Verification

After running:

0_Inbox/<new-filename>.md exists with schema-conformant frontmatter and the callout.
0_Inbox/Memos/ has no audio files that were successfully transcribed (they’re deleted by transcribe-memo).
6_Private/Memo/transcripts/<filename>.md exists for each processed memo.
The body reads as written-by-hand prose, not a rambled monologue.

Alex' Gardenアレックスの庭

Explorer