# Writing media files
Mediabunny enables you to create media files with very fine levels of control. You can add an arbitrary number of video, audio and subtitle tracks to a media file, and precisely control the timing of media data. This library supports many output file formats. Using output targets, you can decide if you want to build up the entire file in memory or stream it out in chunks as it's being created - allowing you to create very large files.
Mediabunny provides many ways to supply media data for output tracks, nicely integrating with the WebCodecs API, but also allowing you to use your own encoding stack if you wish. These media sources come in multiple levels of abstraction, enabling easy use for common use cases while still giving you fine-grained control if you need it.
## Creating an output
Media file creation in Mediabunny revolves around a central class, `Output`. One instance of `Output` represents one media file you want to create.

Start by creating a new instance of `Output` using the desired configuration of the file you want to create:
```ts
import { Output, Mp4OutputFormat, BufferTarget } from 'mediabunny';

// In this example, we'll be creating an MP4 file in memory:
const output = new Output({
    format: new Mp4OutputFormat(),
    target: new BufferTarget(),
});
```
See Output formats for a full list of available output formats.
See Output targets for a full list of available output targets.
You can always access `format` and `target` on the output:

```ts
output.format; // => Mp4OutputFormat
output.target; // => BufferTarget
```
## Adding tracks
There are a couple of methods on an `Output` that you can use to add tracks to it:

```ts
output.addVideoTrack(videoSource);
output.addAudioTrack(audioSource);
output.addSubtitleTrack(subtitleSource);
```
For each track you want to add, you'll need to create a unique media source for it. You'll be able to add media data to the output via these media sources. A media source can only ever be used for one output track.
Optionally, you can specify additional track metadata when adding tracks:
```ts
// This specifies that the video track should be rotated by 90 degrees
// clockwise before being displayed by video players, and that a frame rate
// of 30 FPS is expected.
output.addVideoTrack(videoSource, {
    // Clockwise rotation in degrees
    rotation: 90,
    // Expected frame rate in hertz
    frameRate: 30,
});
```
```ts
// This adds two audio tracks; one in English and one in German.
output.addAudioTrack(audioSourceEng, {
    language: 'eng', // ISO 639-2/T language code
});
output.addAudioTrack(audioSourceGer, {
    language: 'ger',
});

// This adds multiple subtitle tracks, all for different languages.
output.addSubtitleTrack(subtitleSourceEng, { language: 'eng' });
output.addSubtitleTrack(subtitleSourceGer, { language: 'ger' });
output.addSubtitleTrack(subtitleSourceSpa, { language: 'spa' });
output.addSubtitleTrack(subtitleSourceFre, { language: 'fre' });
output.addSubtitleTrack(subtitleSourceIta, { language: 'ita' });
```
INFO
The optional `frameRate` video track metadata option specifies the expected frame rate of the video. All timestamps and durations of frames that will be added to this track will be snapped to the specified frame rate. You should avoid adding frames more often than the rate permits, as this will lead to multiple frames having the same timestamp.

To precisely achieve common fractional frame rates, make sure to use their exact fractional forms:
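For instance, 29.97 FPS is exactly 30000/1001, so (as a sketch) you'd pass the fraction rather than a rounded decimal:

```ts
// 29.97 FPS (NTSC) has no exact decimal representation, so pass its
// exact fractional form instead of the rounded value 29.97:
output.addVideoTrack(videoSource, { frameRate: 30000 / 1001 });
```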
As an example, let's add two tracks to our output:

- A video track driven by the contents of a `<canvas>` element, encoded using AVC
- An audio track driven by the user's microphone input, encoded using AAC
```ts
import { CanvasSource, MediaStreamAudioTrackSource } from 'mediabunny';

// Assuming `canvasElement` exists
const videoSource = new CanvasSource(canvasElement, {
    codec: 'avc',
    bitrate: 1e6, // 1 Mbps
});

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioStreamTrack = stream.getAudioTracks()[0];
const audioSource = new MediaStreamAudioTrackSource(audioStreamTrack, {
    codec: 'aac',
    bitrate: 128e3, // 128 kbps
});

output.addVideoTrack(videoSource, { frameRate: 30 });
output.addAudioTrack(audioSource);
```
WARNING
Adding a track to an `Output` will throw if the track is not compatible with the output format. Be sure to respect the properties of the output format when adding tracks.
## Starting an output
After all tracks have been added to the `Output`, you need to start it. Starting an output spins up the writing process, allowing you to start sending media data to the output file. It also prevents you from adding any new tracks to it.

```ts
await output.start(); // Resolves once the output is ready to receive media data
```
## Adding media data
After starting an `Output`, you can use the media sources you used to add tracks to pipe media data to the output file. The API for this is different for each media source, but it typically looks something like this:

```ts
mediaSource.add(...);
```
In our example, as soon as we call `start`, the user's microphone input will be piped to the output file. However, we still need to add the data from our canvas. We might do something like this:
```ts
let framesAdded = 0;

const intervalId = setInterval(() => {
    const timestampInSeconds = framesAdded / 30;
    const durationInSeconds = 1 / 30;

    // Captures the canvas state at the time of calling `add`:
    videoSource.add(timestampInSeconds, durationInSeconds);
    framesAdded++;
}, 1000 / 30);
```
And then we'll let this run for as long as we want to capture media data.
## Finalizing an output
Once all media data has been added, the `Output` needs to be finalized. Finalization finishes all remaining encoding work and writes the remaining data to create the final, playable media file.

```ts
await output.finalize(); // Resolves once the output is finalized
```
WARNING
After calling `finalize`, adding more media data to the output results in an error.
In our example, we'll need to do this:
```ts
clearInterval(intervalId); // Stops the canvas loop
audioStreamTrack.stop(); // Stops capturing the user's microphone

await output.finalize();
const file = output.target.buffer; // => ArrayBuffer
```
## Canceling an output
Sometimes, you may want to cancel the ongoing creation of an output file. For this, use the `cancel` method:

```ts
await output.cancel(); // Resolves once the output is canceled
```
This automatically frees up all resources used by the output process, such as closing all encoders or releasing the writer.
WARNING
After calling `cancel`, adding more media data to the output results in an error.
In our example, we would do this:
```ts
clearInterval(intervalId); // Stops the canvas loop
audioStreamTrack.stop(); // Stops capturing the user's microphone

await output.cancel();
// The output is canceled
```
## Checking output state
You can always check the current state the output is in using its `state` property:

```ts
output.state; // => 'pending' | 'started' | 'canceled' | 'finalizing' | 'finalized'
```
- `'pending'` - The output hasn't been started or canceled yet; new tracks can be added.
- `'started'` - The output has been started and is ready to receive media data; tracks can no longer be added.
- `'finalizing'` - `finalize` has been called but hasn't resolved yet; no more media data can be added.
- `'finalized'` - The output has been finalized and is done writing the file.
- `'canceled'` - The output has been canceled.
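For example, here's a small sketch that uses `state` to guard a `finalize` call:

```ts
// Only finalize if the output was started and hasn't already been
// finalized or canceled:
if (output.state === 'started') {
    await output.finalize();
}
```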
## Output targets
The output target determines where the data created by the `Output` will be written. This library offers two targets:
### BufferTarget
This target writes all data to a single, contiguous, in-memory `ArrayBuffer`. This buffer will automatically grow as the file becomes larger. Usage is straightforward:
```ts
import { Output, BufferTarget } from 'mediabunny';

const output = new Output({
    target: new BufferTarget(),
    // ...
});

// ...

output.target.buffer; // => null
await output.finalize();
output.target.buffer; // => ArrayBuffer
```
This target is a great choice for small-ish files (< 100 MB), but since all data will be kept in memory, using it for large files is suboptimal. If the output gets very large, the page might crash due to memory exhaustion. For these cases, using `StreamTarget` is recommended.
### StreamTarget
This target passes you the data written by the `Output` in small chunks, requiring you to pipe that data elsewhere to manually assemble the final file. Example use cases include writing the file directly to disk, or uploading it to a server over the network.

`StreamTarget` makes use of the Streams API, meaning you'll need to pass it an instance of `WritableStream`:
```ts
import { Output, StreamTarget, StreamTargetChunk } from 'mediabunny';

const writable = new WritableStream({
    write(chunk: StreamTargetChunk) {
        chunk.data; // => Uint8Array
        chunk.position; // => number

        // Do something with the data...
    }
});

const output = new Output({
    target: new StreamTarget(writable),
    // ...
});
```
Each chunk written to the `WritableStream` represents a contiguous chunk of bytes of the output file, `data`, that is expected to be written at the given byte offset, `position`. The `WritableStream` will automatically be closed when `finalize` or `cancel` is called on the `Output`.
WARNING
Note that some byte regions in the output file may be written to multiple times. It is therefore incorrect to construct the final file by simply concatenating all `Uint8Array`s together - you must write each chunk of data at the byte offset given by `position`, in the order in which the chunks arrived. If you don't do this, your output file will likely be invalid or corrupted.
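To illustrate, here's a minimal sketch of this assembly logic, assuming you've collected the received chunks, in arrival order, in an array named `chunks`:

```ts
// `chunks` is a hypothetical array of the received StreamTargetChunks.
// Assemble the file by honoring each chunk's byte offset:
let fileBytes = new Uint8Array(0);

for (const chunk of chunks) {
    const requiredSize = chunk.position + chunk.data.length;
    if (requiredSize > fileBytes.length) {
        // Grow the buffer so the new chunk fits
        const grown = new Uint8Array(requiredSize);
        grown.set(fileBytes);
        fileBytes = grown;
    }

    // Write the chunk at its offset; this may overwrite earlier bytes
    fileBytes.set(chunk.data, chunk.position);
}
```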
Some output formats have append-only writing modes in which the byte offset of a written chunk will always be equal to the total number of bytes in all previously written chunks. In other words, when writing is append-only, simply concatenating all `Uint8Array`s yields the correct result. Some APIs (like `appendBuffer` of Media Source Extensions) require this, so make sure to configure your output format accordingly for those cases.
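As a sketch (assuming, as is typical for fragmented MP4, that `Mp4OutputFormat`'s fragmented mode writes append-only), such a configuration might look like this:

```ts
// Assumption: fragmented MP4 output is written strictly append-only,
// making plain concatenation (and MSE's appendBuffer) safe. Check the
// Output formats page for the authoritative options for your format.
const output = new Output({
    format: new Mp4OutputFormat({ fastStart: 'fragmented' }),
    target: new StreamTarget(writable),
});
```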
#### Chunked mode
By default, data will be emitted by the `StreamTarget` as soon as it is available. In some formats, this may lead to hundreds of write events per second. If you want to reduce the frequency of writes, `StreamTarget` offers an alternative "chunked mode" in which data will first be accumulated into large chunks of a given size in memory, and then only be emitted once a chunk is completely full.
```ts
new StreamTarget(writable, {
    chunked: true,
    chunkSize: 2 ** 20, // Optional; defaults to 16 MiB
});
```
#### Applying backpressure
Sometimes, the `Output` may produce new data faster than you are able to write it. In this case, you want to communicate to the `Output` that it should "chill out" and slow down to match the pace that the `WritableStream` is able to handle. When using `StreamTarget`, the `Output` will automatically respect the backpressure applied by the `WritableStream`. For this, it is useful to understand how backpressure is applied in the Streams API.
For example, the writable may apply backpressure by returning a promise in `write`:
```ts
const writable = new WritableStream({
    write(chunk: StreamTargetChunk) {
        // Pretend writing out data takes 10 milliseconds:
        return new Promise(resolve => setTimeout(resolve, 10));
    }
});
```
INFO
In order for the writable's backpressure to ripple through the entire pipeline, you must make sure to correctly respect the backpressure applied by media sources.
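For instance, here's a sketch that assumes the media source's `add` method returns a promise which resolves once the pipeline can accept more data (see the media sources documentation for the exact semantics):

```ts
// `totalFrames` is a hypothetical frame count. Awaiting each `add` call
// propagates the writable's backpressure all the way up to this loop:
for (let frame = 0; frame < totalFrames; frame++) {
    await videoSource.add(frame / 30, 1 / 30);
}
```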
#### Usage with the File System API
`StreamTargetChunk` is designed such that it is compatible with the File System API's `FileSystemWritableFileStream`. This means that if you want to write data directly to disk, you can simply do something like this:
```ts
const handle = await window.showSaveFilePicker();
const writableStream = await handle.createWritable();

const output = new Output({
    target: new StreamTarget(writableStream),
    // ...
});

// ...

await output.finalize(); // Will automatically close the writable stream
```
## Packet buffering
Some output formats require packet buffering for multi-track outputs. Packet buffering occurs because the `Output` must wait for data from all tracks for a given timestamp before it can continue writing data. For example, should you first encode all your video frames and only encode the audio afterward, the `Output` will have to hold all of the video frames in memory until the audio packets start coming in. This might lead to memory exhaustion should your video be very long. When there is only one media track, this issue does not arise.
Check the Output formats page to see which format configurations require packet buffering.
If your output format configuration requires packet buffering, make sure to add media data in a somewhat interleaved way to keep memory usage low. For example, if you're creating a 5-minute file, add your data in chunks - 10 seconds of video, then 10 seconds of audio, then repeat - instead of first adding all 300 seconds of video followed by all 300 seconds of audio.
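As a rough sketch, such an interleaved writing loop might look like this, using hypothetical `encodeVideoRange` and `encodeAudioRange` helpers that add media data to the respective sources for a given time range:

```ts
// Interleave writes in 10-second chunks to keep the packet buffer small.
// `encodeVideoRange` and `encodeAudioRange` are hypothetical helpers.
const totalSeconds = 300;
const chunkSeconds = 10;

for (let start = 0; start < totalSeconds; start += chunkSeconds) {
    const end = Math.min(start + chunkSeconds, totalSeconds);
    await encodeVideoRange(videoSource, start, end);
    await encodeAudioRange(audioSource, start, end);
}
```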
INFO
If this kind of chunking isn't possible for your use case, try adding the media with the overall smaller data footprint first: First add the 300 seconds of audio, then add the 300 seconds of video.
## Output MIME type
Sometimes you may want to retrieve the MIME type of the file created by an `Output`. For example, when working with Media Source Extensions, `addSourceBuffer` requires the file's full MIME type, including codec strings.
For this, use the following method:

```ts
output.getMimeType(); // => Promise<string>
```

This may resolve to a string like this:

```
video/mp4; codecs="avc1.42c032, mp4a.40.2"
```
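As a sketch, here's how this might plug into Media Source Extensions, assuming `mediaSource` is a `MediaSource` instance whose `sourceopen` event has already fired:

```ts
// Only await getMimeType() once media data has started flowing;
// see the warning below.
const mimeType = await output.getMimeType();
const sourceBuffer = mediaSource.addSourceBuffer(mimeType);
```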
WARNING
The promise returned by `getMimeType` only resolves once the precise codec strings for all tracks of the `Output` are known - meaning it potentially needs to wait for all encoders to be fully initialized. Therefore, make sure not to get yourself into a deadlock: awaiting this method before adding media data to tracks will result in the promise never resolving.
If you don't care about specific track codecs, you can instead use the simpler `mimeType` property on the `Output`'s format:

```ts
output.format.mimeType; // => string
```