Audiobook Recorder
==================

A system for easing the task of recording and editing audiobooks.

* Zero editing
* MP3 export
* Chapter management
* Audio effect chains (biquad, delay line, etc)

# [Manual and Tutorial](https://majenkoprojects.github.io/AudiobookRecorder)

Usage
-----

First, open Tools -> Options and configure the system to suit your needs. The audio settings matter most:
select the device or sound system you want to record from and play back to. On Linux you probably want
to select "the ear-candy mixer" for PulseAudio.

Create a new book using either File -> New Book or the New Book icon in the toolbar.

Your first action with a new book should be to record the "room noise". This is 5 seconds of silence recorded
from your microphone. It is used both to calculate the "noise floor" for audio detection (see below) and
to stitch the recorded phrases together. Keep as quiet as you can while it is recording.

From here on, most things are controlled by key presses.

* Press and hold "R" to record a new phrase. The screen flashes red while it is recording. The phrase is
  appended to the currently selected chapter, or to the last chapter if none is selected.
* Press and hold "T" to record a new phrase that is the start of a new paragraph. This adds the "post paragraph" gap to the previous phrase. Otherwise it does the same as "R".
* Press and hold "F" to record a "continuation" phrase. This sets the previous phrase's post-gap to be the "short" gap instead of the normal length gap.
* Press and hold "Y" to record a new phrase that is the start of a new section. This adds the "post section" gap to the previous phrase. Otherwise it does the same as "R".
* Press "D" to delete the last phrase you recorded.
* Press "E" to re-record the currently selected phrase.

Each phrase you record can be automatically analysed to find the start and end of the audio and set
crop marks appropriately. These can be adjusted in the waveform display when a phrase is selected. You can also
re-run the analysis using either an FFT or a peak detector method (finding the first and last points
where the audio amplitude rises above the background noise).
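To make the peak detector idea concrete, here is a minimal sketch in shell and `awk`. It is a simplified illustration and not the application's own implementation: the function name `crop_marks`, the one-sample-per-line input format and the fixed threshold are all assumptions made for the example.

```shell
#!/bin/bash

# Hypothetical helper (not part of AudiobookRecorder).
# Reads one sample value per line on stdin and prints the 1-based positions
# of the first and last samples whose absolute value exceeds FLOOR.
crop_marks() {
    local floor=$1
    awk -v floor="$floor" '
        { v = ($1 < 0) ? -$1 : $1 }                      # absolute amplitude
        v > floor { if (first == "") first = NR; last = NR }
        END { if (first != "") print first, last }       # nothing printed if all samples are quiet
    '
}

# Example: quiet lead-in, three loud samples, quiet tail.
printf '%s\n' 0 1 -2 40 -55 30 2 -1 0 | crop_marks 10
```

With the sample values shown, this prints `4 6`: the crop marks would sit at the fourth and sixth samples.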

The phrases also have a "post gap" associated with them. This is the amount of room noise (in milliseconds) to place between
the current phrase and the next phrase when playing or exporting.
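As a rough illustration of what a post gap means in audio terms, the sketch below converts a gap in milliseconds into a number of samples of room noise. The function name `gap_samples` and the 44100 Hz sample rate are assumptions for the example; they are not taken from the application.

```shell
#!/bin/bash

# Hypothetical helper: number of samples needed to fill a gap of
# MS milliseconds at a sample rate of RATE Hz (integer arithmetic).
gap_samples() {
    local ms=$1
    local rate=$2
    echo $(( ms * rate / 1000 ))
}

# A 500 ms gap at an assumed 44100 Hz sample rate.
gap_samples 500 44100
```

For the values shown, this prints `22050`.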

For playback, the toolbar lets you either play the currently selected phrase in isolation or
play from the currently selected phrase to the end of the chapter. The latter is good for testing your post gaps.

Exporting the project creates an MP3 file for each chapter using the settings selected in Options.

Extra functions
---------------

The recordings are initially given a unique ID. You can
edit the text of this ID to identify the recordings. You
may, for instance, change it to have the same text as the
audio contains.

The audio can also be automatically converted to text if you have a suitable command-line
executable. One example (on Linux) is [DeepSpeech](https://github.com/mozilla/DeepSpeech) by Mozilla.

File layout
-----------

All data is stored in your "storage" directory (specified in Options). Each book (which is a directory named after the
title of the book) has an associated XML file (audiobook.abx) and a directory "files" where all the audio (stored as WAV
files) is placed.

When you export the book as MP3 a new folder "export" is created within the book's folder where the MP3 files are placed.
MP3 files are all tagged with the book title, chapter title, chapter number and comment.
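The layout described above can be checked with a few lines of shell. This is only a sketch: the function name `check_book` is hypothetical, and the book path is whatever directory sits under your own storage directory. Call it as `check_book "/path/to/storage/Book Title"`.

```shell
#!/bin/bash

# Hypothetical helper: check that a book directory matches the described layout
# (audiobook.abx plus a "files" directory, and "export" once MP3s have been exported).
check_book() {
    local book="$1"
    [ -f "${book}/audiobook.abx" ] || { echo "missing audiobook.abx"; return 1; }
    [ -d "${book}/files" ] || { echo "missing files directory"; return 1; }
    # The export directory only exists after an MP3 export.
    if [ -d "${book}/export" ]; then
        echo "ok (exported)"
    else
        echo "ok (not yet exported)"
    fi
}
```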

Building
========

1. Check out this repo
2. Install `ant` and `default-jdk`
3. Install [git LFS support](https://help.github.com/articles/installing-git-large-file-storage/)
4. Pull the large files with `git lfs pull`
5. Build with `ant build`
6. Run with `java -jar ./AudiobookRecorder.jar`
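
On a Debian-style system the steps above might look like the sketch below. Several things here are assumptions and not taken from this README: `apt-get` as the package manager, `git-lfs` as the package name, and the repository URL, which is inferred from the manual link. The function name `build_audiobook_recorder` is hypothetical; nothing runs until you call it.

```shell
#!/bin/bash

# ASSUMPTION: repository location inferred from the manual URL, not confirmed.
REPO_URL="https://github.com/MajenkoProjects/AudiobookRecorder.git"

# Hypothetical wrapper around the numbered build steps.
# The body runs in a subshell so that "cd" and "set -e" do not affect the caller.
build_audiobook_recorder() (
    set -e
    # ASSUMPTION: Debian/Ubuntu package names.
    sudo apt-get install -y git git-lfs ant default-jdk
    git clone "$REPO_URL" AudiobookRecorder
    cd AudiobookRecorder
    git lfs pull
    ant build
    java -jar ./AudiobookRecorder.jar
)
```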

----

Extra Resources
===============

* DeepSpeech wrapper script

This is a small script that will convert the audio into a format DeepSpeech likes and call the `deepspeech` executable, removing any extra rubbish from the output. It
also requires `sox` to be installed for the audio conversion.

```
#!/bin/bash

# Convert the input to 16 kHz, mono, 16-bit WAV, run deepspeech on it,
# then remove the temporary file. Adjust BINPATH and MODELS to your install.
ID=$$
FILE="$1"
BINPATH="${HOME}/local/bin"
MODELS="${HOME}/ds/deepspeech-0.6.1-models"

sox "$FILE" -r 16000 -c 1 -b 16 "/tmp/ds-${ID}.wav"
"${BINPATH}/deepspeech" --model "${MODELS}/output_graph.pbmm" --lm "${MODELS}/lm.binary" --trie "${MODELS}/trie" --audio "/tmp/ds-${ID}.wav" 2>/dev/null
rm "/tmp/ds-${ID}.wav"
```

----

* Whisper wrapper script

This small script will use Whisper to do the same job as DeepSpeech but considerably faster and better.

```
#!/bin/bash

# Transcribe the file; the transcript is expected at /tmp/<name>.txt
whisper --output_format txt --model small --fp16 False --output_dir /tmp --device cuda --language English "$1" > /dev/null
BASE=$(basename "$1" .wav)
TXT="${BASE}.txt"
# Join the transcript lines into a single line of text
tr '\n' ' ' < "/tmp/${TXT}"
echo ""
```