[FFmpeg] How to Merge Two Audio Files into a Stereo Track

- April 18, 2025

If you have two separate audio recordings — one for Speaker A and one for Speaker B — and both speakers were recorded at the same time, you might want to combine them into a single stereo audio file.

This is useful in many scenarios:

- Dual-microphone recordings
- Interview capture
- Podcast editing
- Voice separation testing
- Audio analysis workflows

The key idea is to place one speaker’s voice in the left channel and the other speaker’s voice in the right channel. Because each voice stays on its own channel, the two recordings remain time-aligned in one file and can still be separated later for further processing or listening.
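To make the channel layout concrete, here is a minimal Python sketch of what "left channel / right channel" means at the sample level. It is only an illustration, not how ffmpeg works internally: it assumes both inputs are uncompressed PCM WAV files that are mono and share the same sample rate and sample width, and the function name `merge_mono_to_stereo` is made up for this example.

```python
import wave


def merge_mono_to_stereo(left_path, right_path, out_path):
    """Interleave two mono PCM WAV files into one stereo WAV file.

    Returns the number of frames written. Hypothetical helper for
    illustration only; it assumes both inputs are mono PCM with the
    same sample rate and sample width.
    """
    with wave.open(left_path, "rb") as left, wave.open(right_path, "rb") as right:
        for name, wav in (("left", left), ("right", right)):
            if wav.getnchannels() != 1:
                raise ValueError(f"{name} input must be mono")
        if left.getframerate() != right.getframerate():
            raise ValueError("sample rates differ")
        if left.getsampwidth() != right.getsampwidth():
            raise ValueError("sample widths differ")

        width = left.getsampwidth()
        rate = left.getframerate()
        # Stop at the shorter input so both channels cover the same time span.
        n = min(left.getnframes(), right.getnframes())
        left_bytes = left.readframes(n)
        right_bytes = right.readframes(n)

    # A stereo frame is [left sample][right sample], each `width` bytes long.
    stereo = bytearray(2 * n * width)
    for b in range(width):
        stereo[b::2 * width] = left_bytes[b::width]           # left channel bytes
        stereo[width + b::2 * width] = right_bytes[b::width]  # right channel bytes

    with wave.open(out_path, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(bytes(stereo))
    return n
```

Each stereo frame simply holds one sample from speaker A followed by one sample from speaker B, which is why the two voices stay aligned and can be split apart again.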

You can use ffmpeg, a powerful open-source audio and video processing tool, to merge two mono audio files (e.g., interviews or podcasts) into a single stereo track.

FFmpeg Command

ffmpeg -i speakerA.wav -i speakerB.wav -filter_complex "[0:a][1:a]amerge=inputs=2[stereo]" -map "[stereo]" -ac 2 merged_stereo.wav

Options

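Option	Description
`-i speakerA.wav`	First input (index 0); becomes the left channel
`-i speakerB.wav`	Second input (index 1); becomes the right channel
`-filter_complex "..."`	Defines a filter graph that takes more than one input
`[0:a][1:a]`	Selects the audio stream of the first and the second input
`amerge=inputs=2`	Merges the two mono streams into one two-channel stream; the output stops with the shorter input
`[stereo]`, `-map "[stereo]"`	Labels the filter output and maps it to the output file
`-ac 2`	Sets the output to stereo (2 channels)
`merged_stereo.wav`	Output file with speaker A on the left and speaker B on the right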
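If you need to merge many pairs of recordings, the same command can be assembled from a script. The sketch below builds exactly the argument list shown above; the function names `build_merge_command` and `merge_with_ffmpeg` are hypothetical, and it assumes `ffmpeg` is available on your PATH.

```python
import shutil
import subprocess


def build_merge_command(left_path, right_path, out_path):
    """Return the ffmpeg argument list used in this post (hypothetical helper)."""
    return [
        "ffmpeg",
        "-i", left_path,    # input 0 -> left channel
        "-i", right_path,   # input 1 -> right channel
        "-filter_complex", "[0:a][1:a]amerge=inputs=2[stereo]",
        "-map", "[stereo]",
        "-ac", "2",
        out_path,
    ]


def merge_with_ffmpeg(left_path, right_path, out_path):
    """Run the merge; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_merge_command(left_path, right_path, out_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the filter string, for example `merge_with_ffmpeg("speakerA.wav", "speakerB.wav", "merged_stereo.wav")`.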

Test Result


C:\Users\jason\Downloads>ffmpeg -i speakerA.wav -i speakerB.wav -filter_complex "[0:a][1:a]amerge=inputs=2[stereo]" -map "[stereo]" -ac 2 merged_stereo.wav

ffmpeg version 2022-07-04-git-dba7376d59-full_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12.1.0 (Rev2, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect ...
  libavutil      57. 27.100 / 57. 27.100
  libavcodec     59. 36.100 / 59. 36.100
  libavformat    59. 26.100 / 59. 26.100
  libavdevice    59.  6.100 / 59.  6.100
  libavfilter     8. 41.100 /  8. 41.100
  libswscale      6.  6.100 /  6.  6.100
  libswresample   4.  6.100 /  4.  6.100
  libpostproc    56.  5.100 / 56.  5.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'speakerA.wav':
  Duration: 00:07:37.86, bitrate: 256 kb/s
  Stream #0:0: Audio: pcm_s16le, 16000 Hz, mono, s16, 256 kb/s
Guessed Channel Layout for Input Stream #1.0 : mono
Input #1, wav, from 'speakerB.wav':
  Duration: 00:07:37.92, bitrate: 256 kb/s
  Stream #1:0: Audio: pcm_s16le, 16000 Hz, mono, s16, 256 kb/s
Stream mapping:
  Stream #0:0 (pcm_s16le) -> amerge
  Stream #1:0 (pcm_s16le) -> amerge
  amerge:default -> Stream #0:0 (pcm_s16le)
Press [q] to stop, [?] for help
[Parsed_amerge_0 @ 000002493b260bc0] No channel layout for input 1
[Parsed_amerge_0 @ 000002493b260bc0] Input channel layouts overlap: output layout will be determined by the number of distinct input channels
Output #0, wav, to 'merged_stereo.wav':
  Metadata:
    ISFT            : Lavf59.26.100
  Stream #0:0: Audio: pcm_s16le, 16000 Hz, stereo, s16, 512 kb/s
    Metadata:
      encoder         : Lavc59.36.100 pcm_s16le
size=   28616kB time=00:07:37.86 bitrate= 512.0kbits/s speed=6.34e+03x
video:0kB audio:28616kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000266%

C:\Users\jason\Downloads>
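The numbers in the log can be checked by hand. The sketch below recomputes the bitrate and output size from the values printed above (16000 Hz, 16-bit samples, durations 00:07:37.86 and 00:07:37.92). It assumes that the "kB" in ffmpeg's size line means 1024 bytes, which is consistent with the figures here, and it ignores the small WAV header. The helper names are made up for this example.

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def to_seconds(hms):
    """Convert an 'HH:MM:SS.ss' duration string from the log to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


mono_bps = pcm_bitrate_bps(16000, 16, 1)    # 256000, the 256 kb/s of each input
stereo_bps = pcm_bitrate_bps(16000, 16, 2)  # 512000, the 512 kb/s of the output

# amerge stops with the shorter input, so the shorter duration applies.
duration_s = min(to_seconds("00:07:37.86"), to_seconds("00:07:37.92"))

# Audio payload size in units of 1024 bytes (header not included).
size_kib = stereo_bps / 8 * duration_s / 1024

print(mono_bps, stereo_bps, duration_s, int(size_kib))
```

The output time of 00:07:37.86 matches the shorter input (speakerA.wav), and the computed size agrees with the 28616kB reported by ffmpeg. The "Input channel layouts overlap" message appears because both inputs are mono; the output is still a two-channel stereo stream, as the `Output #0` lines show.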
