[mimictts] Initial contribution (#13045)

* [mimictts] Initial contribution

Add a voice service for Mimic (version 3) TTS from Mycroft AI. 

Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
This commit is contained in:
Gwendal Roulleau 2022-07-12 18:49:01 +02:00 committed by GitHub
parent 36bd806c28
commit 7a53e313cd
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
14 changed files with 558 additions and 0 deletions

View File

@ -390,6 +390,7 @@
/bundles/org.openhab.voice.googletts/ @gbicskei
/bundles/org.openhab.voice.mactts/ @kaikreuzer
/bundles/org.openhab.voice.marytts/ @kaikreuzer
/bundles/org.openhab.voice.mimictts/ @dalgwen
/bundles/org.openhab.voice.actiontemplatehli/ @GiviMAD
/bundles/org.openhab.voice.picotts/ @FlorianSW
/bundles/org.openhab.voice.pollytts/ @hillmanr

View File

@ -1951,6 +1951,11 @@
<artifactId>org.openhab.voice.marytts</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.openhab.addons.bundles</groupId>
<artifactId>org.openhab.voice.mimictts</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.openhab.addons.bundles</groupId>
<artifactId>org.openhab.voice.actiontemplatehli</artifactId>

View File

@ -0,0 +1,13 @@
This content is produced and maintained by the openHAB project.
* Project home: https://www.openhab.org
== Declared Project Licenses
This program and the accompanying materials are made available under the terms
of the Eclipse Public License 2.0 which is available at
https://www.eclipse.org/legal/epl-2.0/.
== Source Code
https://github.com/openhab/openhab-addons

View File

@ -0,0 +1,49 @@
# Mimic Text-to-Speech
Mimic (version 3 and above) is an offline open source Text-To-speech engine designed by Mycroft A.I. for the eponym Vocal Assistant, that provides multiple voices, available in different languages and variants.
Its neural network is built upon some very good and some not-so-good models, so try some to be sure you get the best one for your need.
Mimic3 doesn't need Mycroft, and it can be run as a simple command line utility, or as a web server with an API.
This TTS bundle makes use of this last feature, so please take note : this openHAB TTS bundle is NOT a standalone, and it requires the Mimic web server to run somewhere (on your openHAB computer, or your network).
You can find more information about the Mimic web server, and how to install it, on the [official documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/mimic-3#installation).
It supports a subset of SSML, and if you want to use it, be sure to start your text with `<speak>`.
## Configuration
Using your favorite configuration UI to edit **Settings / Other Services - Mimic Text-to-Speech** and set:
* **url** - Mimic URL. Default to `http://localhost:59125`
* **speakingRate** - Controls how fast the voice speaks the text. A value of 1 is the speed of the training dataset. Less than 1 is faster, and more than 1 is slower.
* **audioVolatility** - The amount of noise added to the generated audio (0-1). Can help mask audio artifacts from the voice model. Multi-speaker models tend to sound better with a lower amount of noise than single speaker models.
* **phonemeVolatility** - The amount of noise used to generate phoneme durations (0-1). Allows for variable speaking cadance, with a value closer to 1 being more variable. Multi-speaker models tend to sound better with a lower amount of phoneme variability than single speaker models.
In case you would like to setup the service via a text file, create a new file in `$OPENHAB_ROOT/conf/services` named `mimictts.cfg`
Its contents should look similar to:
```
org.openhab.voice.mimictts:url=http://localhost:59125
org.openhab.voice.mimictts:speakingRate=1
org.openhab.voice.mimictts:audioVolatility=0.667
org.openhab.voice.mimictts:phonemeVolatility=0.8
```
### Default Text-to-Speech and Voice Configuration
You can setup your preferred default Text-to-Speech and default voice in the UI:
* Go to **Settings**.
* Edit **System Services - Voice**.
* Set **Mimic** as **Default Text-to-Speech**.
* Choose your preferred **Default Voice** for your setup.
In case you would like to setup these settings via a text file, you can edit the file `runtime.cfg` in `$OPENHAB_ROOT/conf/services` and set the following entries:
```
org.openhab.voice:defaultTTS=mimictts
org.openhab.voice:defaultVoice=mimictts:fr_FR_siwis_low
```

View File

@ -0,0 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.openhab.addons.bundles</groupId>
<artifactId>org.openhab.addons.reactor.bundles</artifactId>
<version>3.4.0-SNAPSHOT</version>
</parent>
<artifactId>org.openhab.voice.mimictts</artifactId>
<name>openHAB Add-ons :: Bundles :: Voice :: mimic Text-To-Speech</name>
</project>

View File

@ -0,0 +1,9 @@
<?xml version="1.0" encoding="UTF-8"?>
<features name="org.openhab.voice.mimictts-${project.version}" xmlns="http://karaf.apache.org/xmlns/features/v1.4.0">
<repository>mvn:org.openhab.core.features.karaf/org.openhab.core.features.karaf.openhab-core/${ohc.version}/xml/features</repository>
<feature name="openhab-voice-mimictts" description="Mimic Text-to-Speech" version="${project.version}">
<feature>openhab-runtime-base</feature>
<bundle start-level="80">mvn:org.openhab.addons.bundles/org.openhab.voice.mimictts/${project.version}</bundle>
</feature>
</features>

View File

@ -0,0 +1,28 @@
/**
* Copyright (c) 2010-2022 Contributors to the openHAB project
*
* See the NOTICE file(s) distributed with this work for additional
* information.
*
* This program and the accompanying materials are made available under the
* terms of the Eclipse Public License 2.0 which is available at
* http://www.eclipse.org/legal/epl-2.0
*
* SPDX-License-Identifier: EPL-2.0
*/
package org.openhab.voice.mimic.internal;
import org.eclipse.jdt.annotation.NonNullByDefault;
/**
* The {@link MimicConfiguration} class contains fields mapping configuration parameters.
*
* @author Gwendal Roulleau - Initial contribution
*/
@NonNullByDefault
public class MimicConfiguration {
public String url = "http://localhost:59125";
public Double speakingRate = 1.0;
public Double audioVolatility = 0.667;
public Double phonemeVolatility = 0.8;
}

View File

@ -0,0 +1,249 @@
/**
* Copyright (c) 2010-2022 Contributors to the openHAB project
*
* See the NOTICE file(s) distributed with this work for additional
* information.
*
* This program and the accompanying materials are made available under the
* terms of the Eclipse Public License 2.0 which is available at
* http://www.eclipse.org/legal/epl-2.0
*
* SPDX-License-Identifier: EPL-2.0
*/
package org.openhab.voice.mimic.internal;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import org.eclipse.jdt.annotation.NonNullByDefault;
import org.eclipse.jdt.annotation.Nullable;
import org.openhab.core.audio.AudioFormat;
import org.openhab.core.audio.AudioStream;
import org.openhab.core.audio.ByteArrayAudioStream;
import org.openhab.core.config.core.ConfigurableService;
import org.openhab.core.io.net.http.HttpRequestBuilder;
import org.openhab.core.io.net.http.HttpUtil;
import org.openhab.core.library.types.RawType;
import org.openhab.core.voice.TTSException;
import org.openhab.core.voice.TTSService;
import org.openhab.core.voice.Voice;
import org.openhab.voice.mimic.internal.dto.VoiceDto;
import org.osgi.framework.Constants;
import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Modified;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonSyntaxException;
/**
* Mimic Voice service implementation.
*
* @author Gwendal Roulleau - Initial contribution
*/
@Component(configurationPid = MimicTTSService.SERVICE_PID, property = Constants.SERVICE_PID + "="
+ MimicTTSService.SERVICE_PID)
@ConfigurableService(category = MimicTTSService.SERVICE_CATEGORY, label = MimicTTSService.SERVICE_NAME
+ " Text-to-Speech", description_uri = MimicTTSService.SERVICE_CATEGORY + ":" + MimicTTSService.SERVICE_ID)
@NonNullByDefault
public class MimicTTSService implements TTSService {
static final String SERVICE_CATEGORY = "voice";
static final String SERVICE_ID = "mimictts";
static final String SERVICE_PID = "org.openhab." + SERVICE_CATEGORY + "." + SERVICE_ID;
static final String SERVICE_NAME = "Mimic";
/**
* Configuration parameters
*/
private static final String PARAM_URL = "url";
private static final String PARAM_SPEAKINGRATE = "speakingRate";
private static final String PARAM_AUDIOVOLATITLITY = "audioVolatility";
private static final String PARAM_PHONEMEVOLATITLITY = "phonemeVolatility";
/**
* Url
*/
private static final String LIST_VOICES_URL = "/api/voices";
private static final String SYNTHETIZE_URL = "/api/tts";
/** The only wave format supported */
private static final AudioFormat AUDIO_FORMAT = new AudioFormat(AudioFormat.CONTAINER_WAVE,
AudioFormat.CODEC_PCM_SIGNED, false, 16, 52000, 22050L, 1);
private Set<Voice> availableVoices = new HashSet<>();
/**
* Logger.
*/
private final Logger logger = LoggerFactory.getLogger(MimicTTSService.class);
private final MimicConfiguration config = new MimicConfiguration();
private final Gson gson = new GsonBuilder().create();
@Activate
protected void activate(Map<String, Object> config) {
updateConfig(config);
}
/**
* Called by the framework when the configuration was updated.
*
* @param newConfig Updated configuration
*/
@Modified
private void updateConfig(Map<String, Object> newConfig) {
logger.debug("Updating configuration");
// client id
Object param = newConfig.get(PARAM_URL);
if (param == null) {
logger.warn("Missing URL to access Mimic TTS API. Using localhost");
} else {
config.url = param.toString();
}
// audio volatility
try {
param = newConfig.get(PARAM_AUDIOVOLATITLITY);
if (param != null) {
config.audioVolatility = Double.parseDouble(param.toString());
}
} catch (NumberFormatException e) {
logger.warn("Cannot parse audioVolatility parameter. Using default");
}
// phoneme volatility
try {
param = newConfig.get(PARAM_PHONEMEVOLATITLITY);
if (param != null) {
config.phonemeVolatility = Double.parseDouble(param.toString());
}
} catch (NumberFormatException e) {
logger.warn("Cannot parse phonemeVolatility parameter. Using default");
}
// speakingRate
try {
param = newConfig.get(PARAM_SPEAKINGRATE);
if (param != null) {
config.speakingRate = Double.parseDouble(param.toString());
}
} catch (NumberFormatException e) {
logger.warn("Cannot parse speakingRate parameter. Using default");
}
refreshVoices();
}
@Override
public String getId() {
return SERVICE_ID;
}
@Override
public String getLabel(@Nullable Locale locale) {
return SERVICE_NAME;
}
@Override
public Set<Voice> getAvailableVoices() {
return availableVoices;
}
public void refreshVoices() {
String url = config.url + LIST_VOICES_URL;
availableVoices.clear();
try {
String responseVoices = HttpRequestBuilder.getFrom(url).getContentAsString();
VoiceDto[] mimicVoiceResponse = gson.fromJson(responseVoices, VoiceDto[].class);
if (mimicVoiceResponse == null) {
logger.warn("Cannot get mimic voices from the URL {}", url);
return;
} else if (mimicVoiceResponse.length == 0) {
logger.debug("Voice set response from Mimic is empty ?!");
return;
}
for (VoiceDto voiceDto : mimicVoiceResponse) {
if (voiceDto.speakers != null && voiceDto.speakers.size() > 0) {
for (String speaker : voiceDto.speakers) {
availableVoices.add(new MimicVoice(voiceDto.key, voiceDto.language, voiceDto.name, speaker));
}
} else {
availableVoices.add(new MimicVoice(voiceDto.key, voiceDto.language, voiceDto.name, null));
}
}
} catch (IOException | JsonSyntaxException e) {
logger.warn("Cannot get mimic voices from the URL {}, error {}", url, e.getMessage());
}
}
@Override
public Set<AudioFormat> getSupportedFormats() {
return Set.<AudioFormat> of(AUDIO_FORMAT);
}
/**
* Checks parameters and calls the API to synthesize voice.
*
* @param text Input text.
* @param voice Selected voice.
* @param requestedFormat Format that is supported by the target sink as well.
* @return Output audio stream
* @throws TTSException in case the service is unavailable or a parameter is invalid.
*/
@Override
public AudioStream synthesize(String text, Voice voice, AudioFormat requestedFormat) throws TTSException {
if (!availableVoices.contains(voice)) {
// let a chance for the service to update :
refreshVoices();
if (!availableVoices.contains(voice)) {
throw new TTSException("Voice " + voice.getUID() + " not available for MimicTTS");
}
}
logger.debug("Synthesize '{}' for voice '{}' in format {}", text, voice.getUID(), requestedFormat);
// Validate arguments
// trim text
String trimmedText = text.trim();
if (trimmedText.isEmpty()) {
throw new TTSException("The passed text is empty");
}
if (!AUDIO_FORMAT.isCompatible(requestedFormat)) {
throw new TTSException("The passed AudioFormat is unsupported");
}
String encodedText;
try {
encodedText = URLEncoder.encode(text, StandardCharsets.UTF_8.toString());
} catch (UnsupportedEncodingException e) {
throw new IllegalArgumentException("Cannot encode text in URL " + text);
}
String ssml = "";
if (text.startsWith("<speak>")) {
ssml = "&ssml=true";
}
// create the audio byte array for given text, locale, format
String urlTTS = config.url + SYNTHETIZE_URL + "?text=" + encodedText + "&voice="
+ ((MimicVoice) voice).getTechnicalName() + ssml + "&noiseScale=" + config.audioVolatility + "&noiseW="
+ config.phonemeVolatility + "&lengthScale=" + config.speakingRate + "&audioTarget=client";
logger.debug("Querying mimic with URL {}", urlTTS);
RawType responseWav = HttpUtil.downloadData(urlTTS, "audio/wav", false, -1);
if (responseWav == null) {
throw new TTSException("Cannot get wav from mimic url " + urlTTS);
}
return new ByteArrayAudioStream(responseWav.getBytes(), AUDIO_FORMAT);
}
}

View File

@ -0,0 +1,95 @@
/**
* Copyright (c) 2010-2022 Contributors to the openHAB project
*
* See the NOTICE file(s) distributed with this work for additional
* information.
*
* This program and the accompanying materials are made available under the
* terms of the Eclipse Public License 2.0 which is available at
* http://www.eclipse.org/legal/epl-2.0
*
* SPDX-License-Identifier: EPL-2.0
*/
package org.openhab.voice.mimic.internal;
import java.util.Locale;
import java.util.Objects;
import org.eclipse.jdt.annotation.NonNullByDefault;
import org.eclipse.jdt.annotation.Nullable;
import org.openhab.core.voice.Voice;
/**
* Mimic Voice representation.
*
* @author Gwendal Roulleau - Initial contribution
*/
@NonNullByDefault
public class MimicVoice implements Voice {
@Nullable
private String speaker;
private final Locale locale;
private final String key;
private final String name;
public MimicVoice(String key, String language, String name, @Nullable String speaker) {
this.key = key;
this.locale = Locale.forLanguageTag(language.replaceAll("_", "-"));
this.name = name;
this.speaker = speaker;
}
/**
* Globally unique identifier of the voice.
*
* @return A String uniquely identifying the voice globally
*/
@Override
public String getUID() {
return "mimictts:" + getTechnicalName().replaceAll("[^a-zA-Z0-9_]", "_");
}
/**
* Technical name of the voice.
*
* @return A String voice technical name
*/
public String getTechnicalName() {
String speakerId = (speaker != null) ? "#" + speaker : "";
return (key + speakerId);
}
@Override
public String getLabel() {
return name + ((speaker != null) ? " (" + speaker + ")" : "");
}
@Override
public Locale getLocale() {
return locale;
}
@Override
public int hashCode() {
return Objects.hash(getTechnicalName());
}
@Override
public boolean equals(@Nullable Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
MimicVoice other = (MimicVoice) obj;
return Objects.equals(getTechnicalName(), other.getTechnicalName());
}
}

View File

@ -0,0 +1,32 @@
/**
* Copyright (c) 2010-2022 Contributors to the openHAB project
*
* See the NOTICE file(s) distributed with this work for additional
* information.
*
* This program and the accompanying materials are made available under the
* terms of the Eclipse Public License 2.0 which is available at
* http://www.eclipse.org/legal/epl-2.0
*
* SPDX-License-Identifier: EPL-2.0
*/
package org.openhab.voice.mimic.internal.dto;
import java.util.ArrayList;
import java.util.List;
import org.eclipse.jdt.annotation.NonNullByDefault;
/**
* Mimic Voice DTO.
*
* @author Gwendal Roulleau - Initial contribution
*/
@NonNullByDefault
public class VoiceDto {
public String key = "UNDEFINED";
public String language = "UNDEFINED";
public String name = "UNDEFINED";
public List<String> speakers = new ArrayList<>();
}

View File

@ -0,0 +1,37 @@
<?xml version="1.0" encoding="UTF-8"?>
<config-description:config-descriptions
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:config-description="https://openhab.org/schemas/config-description/v1.0.0"
xsi:schemaLocation="https://openhab.org/schemas/config-description/v1.0.0
https://openhab.org/schemas/config-description-1.0.0.xsd">
<config-description uri="voice:mimictts">
<parameter name="url" type="text">
<label>URL</label>
<description>Mimic 3 URL.</description>
<default>http://localhost:59125</default>
</parameter>
<parameter name="speakingRate" min="0" max="1" type="decimal" required="false">
<label>Speaking Rate</label>
<description>Controls how fast the voice speaks the text. A value of 1 is the speed of the training dataset. Less
than 1 is faster, and more than 1 is slower.</description>
<default>1</default>
</parameter>
<parameter name="audioVolatility" min="0" max="1" type="decimal" required="false">
<label>Audio Volatility</label>
<description>The amount of noise added to the generated audio (0-1). Can help mask audio artifacts from the voice
model. Multi-speaker models tend to sound better with a lower amount of noise than single speaker models.
</description>
<default>0.667</default>
</parameter>
<parameter name="phonemeVolatility" min="0" max="1" type="decimal" required="false">
<label>Phoneme Volatility</label>
<description>The amount of noise used to generate phoneme durations (0-1). Allows for variable speaking cadance, with
a value closer to 1 being more variable. Multi-speaker models tend to sound better with a lower amount of phoneme
variability than single speaker models.
</description>
<default>0.8</default>
</parameter>
</config-description>
</config-description:config-descriptions>

View File

@ -0,0 +1,11 @@
voice.config.mimictts.audioVolatility.label = Audio Volatility
voice.config.mimictts.audioVolatility.description = The amount of noise added to the generated audio (0-1). Can help mask audio artifacts from the voice model. Multi-speaker models tend to sound better with a lower amount of noise than single speaker models.
voice.config.mimictts.phonemeVolatility.label = Phoneme Volatility
voice.config.mimictts.phonemeVolatility.description = The amount of noise used to generate phoneme durations (0-1). Allows for variable speaking cadance, with a value closer to 1 being more variable. Multi-speaker models tend to sound better with a lower amount of phoneme variability than single speaker models.
voice.config.mimictts.speakingRate.label = Speaking Rate
voice.config.mimictts.speakingRate.description = Controls how fast the voice speaks the text. A value of 1 is the speed of the training dataset. Less than 1 is faster, and more than 1 is slower.
voice.config.mimictts.url.label = URL
voice.config.mimictts.url.description = Mimic 3 URL.
# service
service.voice.mimictts.label = Mimic Text-to-Speech

View File

@ -0,0 +1,11 @@
voice.config.mimictts.audioVolatility.label = Volatilité Audio
voice.config.mimictts.audioVolatility.description = Quantité de bruit ajouté à l'audio généré (0-1). Peut aider à masquer les artefacts du modèle de voix. Les modèles à plusieurs élocuteurs sonnent mieux avec une quantité réduite de bruit, contrairement aux modèles à élocuteur unique.
voice.config.mimictts.phonemeVolatility.label = Volatilité Phonème
voice.config.mimictts.phonemeVolatility.description = La quantité de bruit utilisé pour générer les durées des phonèmes (0-1). Permet une cadence d'élocution variable, avec une valeur proche de 1 signifiant plus de variabilité. Les modèles à plusieurs élocuteurs sonnent mieux avec une quantité réduite de bruit, contrairement aux modèles à élocuteur unique.
voice.config.mimictts.speakingRate.label = Vitesse d'élocution
voice.config.mimictts.speakingRate.description = Controle la vitesse d'élocution. Une valeur de 1 correspond à la vitesse d'entrainement du jeu de données. Inférieur à 1 est plus rapide, et supérieur à 1 est plus lent.
voice.config.mimictts.url.label = URL
voice.config.mimictts.url.description = L'URL pour joindre l'API de Mimic.
# service
service.voice.mimictts.label = Synthèse vocale Mimic

View File

@ -410,6 +410,7 @@
<module>org.openhab.voice.googletts</module>
<module>org.openhab.voice.mactts</module>
<module>org.openhab.voice.marytts</module>
<module>org.openhab.voice.mimictts</module>
<module>org.openhab.voice.actiontemplatehli</module>
<module>org.openhab.voice.picotts</module>
<module>org.openhab.voice.pollytts</module>