Compare commits


8 Commits

Author SHA1 Message Date
Gwendal Roulleau
e5d06abd64
Merge 5487ef17bc into 4e88f48a71 2025-01-09 12:57:13 +01:00
lsiepel
4e88f48a71
[tacmi] Fix SAT errors (#18046)
Signed-off-by: Leo Siepel <leosiepel@gmail.com>
2025-01-09 12:07:50 +01:00
Jacob Laursen
e69c44b85e
Disable another unstable test (#18069)
Related to #12474

Signed-off-by: Jacob Laursen <jacob-github@vindvejr.dk>
2025-01-09 12:07:01 +01:00
Jacob Laursen
46d27b6fb5
Disable another unstable test (#18068)
Related to #12667

Signed-off-by: Jacob Laursen <jacob-github@vindvejr.dk>
2025-01-09 12:06:30 +01:00
mlobstein
f6efa87fb2
[roku] Add End Time and Media Progress channels (#18059)
Signed-off-by: Michael Lobstein <michael.lobstein@gmail.com>
2025-01-09 08:12:57 +01:00
Jacob Laursen
98ff656400
Fix headers (#18070)
Signed-off-by: Jacob Laursen <jacob-github@vindvejr.dk>
2025-01-08 23:25:39 +01:00
Gwendal Roulleau
5487ef17bc
[whisper] Add OpenAI API compatibility
Apply PR comments

Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
2024-12-30 11:47:42 +01:00
Gwendal Roulleau
e40473594a
[whisper] Add OpenAI API compatibility
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
2024-12-17 23:04:36 +01:00
18 changed files with 427 additions and 146 deletions


@ -1,4 +1,4 @@
/**
/*
* Copyright (c) 2010-2025 Contributors to the openHAB project
*
* See the NOTICE file(s) distributed with this work for additional


@ -1,4 +1,4 @@
/**
/*
* Copyright (c) 2010-2025 Contributors to the openHAB project
*
* See the NOTICE file(s) distributed with this work for additional


@ -46,6 +46,8 @@ The following channels are available:
| playMode | String | The current playback mode, e.g. stop, play, pause (ReadOnly). |
| timeElapsed | Number:Time | The total number of seconds of playback time elapsed for the current playing title (ReadOnly). |
| timeTotal | Number:Time | The total length of the current playing title in seconds (ReadOnly). This data is not provided by all streaming apps. |
| endTime | DateTime | The date/time when the currently playing media will end (ReadOnly). N/A if timeTotal is not provided by the current streaming app. |
| progress | Dimmer | The current progress [0-100%] of playing media (ReadOnly). N/A if timeTotal is not provided by the current streaming app. |
| activeChannel | String | A dropdown containing a list of available TV channels on the Roku TV. The channel currently tuned is automatically selected. The list updates every 10 minutes. |
| signalMode | String | The signal type of the current TV channel, e.g. 1080i (ReadOnly). |
| signalQuality | Number:Dimensionless | The signal quality of the current TV channel, 0-100% (ReadOnly). |
@ -59,6 +61,7 @@ The following channels are available:
Some Notes:
- The values for `activeApp`, `activeAppName`, `playMode`, `timeElapsed`, `timeTotal`, `activeChannel`, `signalMode`, `signalQuality`, `channelName`, `programTitle`, `programDescription`, `programRating`, `power` & `powerState` refresh automatically per the configured `refresh` interval.
- The `endTime` and `progress` channels may not be accurate for some streaming apps, especially for 'live' streams where the `timeTotal` value constantly increases.
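Both derived channels come from `timeElapsed` and `timeTotal`. In the handler change further down, the end time is `Instant.now().plusSeconds(duration - position)` and the progress is `Math.round(position / (double) duration * 100.0)`. The following is a minimal sketch of that arithmetic, assuming both values are already parsed to whole seconds. `MediaProgress` and its methods are hypothetical and not part of the binding. The clamp to 100% is an added assumption: a live stream's position could overtake the reported total, and openHAB's `PercentType` only accepts values from 0 to 100.

```java
import java.time.Instant;

/** Hypothetical helper: derives endTime and progress from elapsed and total seconds. */
final class MediaProgress {
    private MediaProgress() {
    }

    /** End of playback, or null (UNDEF in the binding) when position or duration is unknown. */
    static Instant endTime(Instant now, int positionSec, int durationSec) {
        if (positionSec < 0 || durationSec <= 0) {
            return null;
        }
        // Remaining seconds are added to "now", as the handler does with Instant.now()
        return now.plusSeconds(durationSec - positionSec);
    }

    /** Progress as a whole percentage, clamped to 100, or -1 when it cannot be computed. */
    static int progressPercent(int positionSec, int durationSec) {
        if (positionSec < 0 || durationSec <= 0) {
            return -1;
        }
        long percent = Math.round(positionSec / (double) durationSec * 100.0);
        // Assumption: clamp, since a live stream's position may exceed the reported total
        return (int) Math.min(100L, percent);
    }
}
```

For example, 90 s into a 360 s title gives a progress of 25% and an end time 270 s after "now".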
**List of available button commands for Roku streaming devices:**
@ -113,32 +116,36 @@ roku:roku_tv:mytv1 "My Roku TV" [ hostName="192.168.10.1", refresh=10 ]
```java
// Roku streaming media player items:
String Player_ActiveApp "Current App: [%s]" { channel="roku:roku_player:myplayer1:activeApp" }
String Player_ActiveAppName "Current App Name: [%s]" { channel="roku:roku_player:myplayer1:activeAppName" }
String Player_Button "Send Command to Roku" { channel="roku:roku_player:myplayer1:button" }
Player Player_Control "Control" { channel="roku:roku_player:myplayer1:control" }
String Player_PlayMode "Status: [%s]" { channel="roku:roku_player:myplayer1:playMode" }
Number:Time Player_TimeElapsed "Elapsed Time: [%d %unit%]" { channel="roku:roku_player:myplayer1:timeElapsed" }
Number:Time Player_TimeTotal "Total Time: [%d %unit%]" { channel="roku:roku_player:myplayer1:timeTotal" }
DateTime Player_EndTime "End Time: [%1$tl:%1$tM %1$tp]" { channel="roku:roku_player:myplayer1:endTime" }
Dimmer Player_Progress "Progress [%.0f%%]" { channel="roku:roku_player:myplayer1:progress" }
// Roku TV items:
Switch Player_Power "Power: [%s]" { channel="roku:roku_tv:mytv1:power" }
String Player_PowerState "Power State: [%s]" { channel="roku:roku_tv:mytv1:powerState" }
String Player_ActiveApp "Current App: [%s]" { channel="roku:roku_tv:mytv1:activeApp" }
String Player_ActiveAppName "Current App Name: [%s]" { channel="roku:roku_tv:mytv1:activeAppName" }
String Player_Button "Send Command to Roku" { channel="roku:roku_tv:mytv1:button" }
Player Player_Control "Control" { channel="roku:roku_tv:mytv1:control" }
String Player_PlayMode "Status: [%s]" { channel="roku:roku_tv:mytv1:playMode" }
Number:Time Player_TimeElapsed "Elapsed Time: [%d %unit%]" { channel="roku:roku_tv:mytv1:timeElapsed" }
Number:Time Player_TimeTotal "Total Time: [%d %unit%]" { channel="roku:roku_tv:mytv1:timeTotal" }
DateTime Player_EndTime "End Time: [%1$tl:%1$tM %1$tp]" { channel="roku:roku_tv:mytv1:endTime" }
Dimmer Player_Progress "Progress [%.0f%%]" { channel="roku:roku_tv:mytv1:progress" }
String Player_ActiveChannel "Current Channel: [%s]" { channel="roku:roku_tv:mytv1:activeChannel" }
String Player_SignalMode "Signal Mode: [%s]" { channel="roku:roku_tv:mytv1:signalMode" }
Number Player_SignalQuality "Signal Quality: [%d %%]" { channel="roku:roku_tv:mytv1:signalQuality" }
String Player_ChannelName "Channel Name: [%s]" { channel="roku:roku_tv:mytv1:channelName" }
String Player_ProgramTitle "Program Title: [%s]" { channel="roku:roku_tv:mytv1:programTitle" }
String Player_ProgramDescription "Program Description: [%s]" { channel="roku:roku_tv:mytv1:programDescription" }
String Player_ProgramRating "Program Rating: [%s]" { channel="roku:roku_tv:mytv1:programRating" }
```
### `roku.sitemap` Example
@ -154,6 +161,8 @@ sitemap roku label="Roku" {
Text item=Player_PlayMode
Text item=Player_TimeElapsed icon="time"
Text item=Player_TimeTotal icon="time"
Text item=Player_EndTime icon="time"
Slider item=Player_Progress icon="time"
// The following items apply to Roku TVs only
Switch item=Player_Power
Text item=Player_PowerState


@ -55,6 +55,8 @@ public class RokuBindingConstants {
public static final String PLAY_MODE = "playMode";
public static final String TIME_ELAPSED = "timeElapsed";
public static final String TIME_TOTAL = "timeTotal";
public static final String END_TIME = "endTime";
public static final String PROGRESS = "progress";
public static final String ACTIVE_CHANNEL = "activeChannel";
public static final String SIGNAL_MODE = "signalMode";
public static final String SIGNAL_QUALITY = "signalQuality";


@ -14,6 +14,8 @@ package org.openhab.binding.roku.internal.handler;
import static org.openhab.binding.roku.internal.RokuBindingConstants.*;
import java.math.BigDecimal;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
@ -34,8 +36,10 @@ import org.openhab.binding.roku.internal.dto.DeviceInfo;
import org.openhab.binding.roku.internal.dto.Player;
import org.openhab.binding.roku.internal.dto.TvChannel;
import org.openhab.binding.roku.internal.dto.TvChannels.Channel;
import org.openhab.core.library.types.DateTimeType;
import org.openhab.core.library.types.NextPreviousType;
import org.openhab.core.library.types.OnOffType;
import org.openhab.core.library.types.PercentType;
import org.openhab.core.library.types.PlayPauseType;
import org.openhab.core.library.types.QuantityType;
import org.openhab.core.library.types.StringType;
@ -195,21 +199,32 @@ public class RokuHandler extends BaseThingHandler {
PLAY.equalsIgnoreCase(playerInfo.getState()) ? PlayPauseType.PLAY : PlayPauseType.PAUSE);
// Remove non-numeric from string, ie: ' ms'
String position = playerInfo.getPosition().replaceAll(NON_DIGIT_PATTERN, EMPTY);
if (!EMPTY.equals(position)) {
updateState(TIME_ELAPSED,
new QuantityType<>(Integer.parseInt(position) / 1000, API_SECONDS_UNIT));
final String positionStr = playerInfo.getPosition().replaceAll(NON_DIGIT_PATTERN, EMPTY);
int position = -1;
if (!EMPTY.equals(positionStr)) {
position = Integer.parseInt(positionStr) / 1000;
updateState(TIME_ELAPSED, new QuantityType<>(position, API_SECONDS_UNIT));
} else {
updateState(TIME_ELAPSED, UnDefType.UNDEF);
}
String duration = playerInfo.getDuration().replaceAll(NON_DIGIT_PATTERN, EMPTY);
if (!EMPTY.equals(duration)) {
updateState(TIME_TOTAL,
new QuantityType<>(Integer.parseInt(duration) / 1000, API_SECONDS_UNIT));
final String durationStr = playerInfo.getDuration().replaceAll(NON_DIGIT_PATTERN, EMPTY);
int duration = -1;
if (!EMPTY.equals(durationStr)) {
duration = Integer.parseInt(durationStr) / 1000;
updateState(TIME_TOTAL, new QuantityType<>(duration, API_SECONDS_UNIT));
} else {
updateState(TIME_TOTAL, UnDefType.UNDEF);
}
if (position >= 0 && duration > 0) {
updateState(END_TIME, new DateTimeType(Instant.now().plusSeconds(duration - position)));
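// clamp to 100: PercentType rejects larger values, e.g. when a live stream's position passes its duration
updateState(PROGRESS, new PercentType(
BigDecimal.valueOf(Math.min(100, Math.round(position / (double) duration * 100.0)))));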
} else {
updateState(END_TIME, UnDefType.UNDEF);
updateState(PROGRESS, UnDefType.UNDEF);
}
} catch (NumberFormatException e) {
logger.debug("Unable to parse playerInfo integer value. Exception: {}", e.getMessage());
} catch (RokuLimitedModeException e) {
@ -224,6 +239,8 @@ public class RokuHandler extends BaseThingHandler {
updateState(PLAY_MODE, UnDefType.UNDEF);
updateState(TIME_ELAPSED, UnDefType.UNDEF);
updateState(TIME_TOTAL, UnDefType.UNDEF);
updateState(END_TIME, UnDefType.UNDEF);
updateState(PROGRESS, UnDefType.UNDEF);
}
if (thingTypeUID.equals(THING_TYPE_ROKU_TV) && tvActive) {


@ -80,6 +80,8 @@ channel-type.roku.channelName.label = Channel Name
channel-type.roku.channelName.description = The Name of the Channel Currently Selected
channel-type.roku.control.label = Control
channel-type.roku.control.description = Control playback e.g. Play/Pause/Next/Previous
channel-type.roku.endTime.label = End Time
channel-type.roku.endTime.description = The date/time when the currently playing media will end
channel-type.roku.playMode.label = Play Mode
channel-type.roku.playMode.description = The Current Playback Mode
channel-type.roku.powerState.label = Power State
@ -93,6 +95,8 @@ channel-type.roku.programRating.label = Program Rating
channel-type.roku.programRating.description = The TV Parental Guideline Rating of the Current TV Program
channel-type.roku.programTitle.label = Program Title
channel-type.roku.programTitle.description = The Name of the Current TV Program
channel-type.roku.progress.label = Media Progress
channel-type.roku.progress.description = The current progress of playing media
channel-type.roku.signalMode.label = Signal Mode
channel-type.roku.signalMode.description = The Signal Type of the Current TV Channel, e.g. 1080i
channel-type.roku.signalQuality.label = Signal Quality


@ -19,6 +19,8 @@
<channel id="playMode" typeId="playMode"/>
<channel id="timeElapsed" typeId="timeElapsed"/>
<channel id="timeTotal" typeId="timeTotal"/>
<channel id="endTime" typeId="endTime"/>
<channel id="progress" typeId="progress"/>
</channels>
<properties>
@ -28,7 +30,7 @@
<property name="Serial Number">unknown</property>
<property name="Device Id">unknown</property>
<property name="Software Version">unknown</property>
<property name="thingTypeVersion">1</property>
<property name="thingTypeVersion">2</property>
</properties>
<representation-property>uuid</representation-property>
@ -52,6 +54,8 @@
<channel id="playMode" typeId="playMode"/>
<channel id="timeElapsed" typeId="timeElapsed"/>
<channel id="timeTotal" typeId="timeTotal"/>
<channel id="endTime" typeId="endTime"/>
<channel id="progress" typeId="progress"/>
<channel id="activeChannel" typeId="activeChannel"/>
<channel id="signalMode" typeId="signalMode"/>
<channel id="signalQuality" typeId="signalQuality"/>
@ -69,7 +73,7 @@
<property name="Serial Number">unknown</property>
<property name="Device Id">unknown</property>
<property name="Software Version">unknown</property>
<property name="thingTypeVersion">1</property>
<property name="thingTypeVersion">2</property>
</properties>
<representation-property>uuid</representation-property>
@ -185,6 +189,24 @@
<state readOnly="true" pattern="%d %unit%"/>
</channel-type>
<channel-type id="endTime">
<item-type>DateTime</item-type>
<label>End Time</label>
<description>The date/time when the currently playing media will end</description>
<category>Time</category>
<tags>
<tag>Status</tag>
<tag>Timestamp</tag>
</tags>
<state readOnly="true"/>
</channel-type>
<channel-type id="progress">
<item-type>Dimmer</item-type>
<label>Media Progress</label>
<description>The current progress of playing media</description>
</channel-type>
<channel-type id="activeChannel">
<item-type>String</item-type>
<label>Active Channel</label>


@ -12,6 +12,15 @@
<type>roku:control</type>
</add-channel>
</instruction-set>
<instruction-set targetVersion="2">
<add-channel id="endTime">
<type>roku:endTime</type>
</add-channel>
<add-channel id="progress">
<type>roku:progress</type>
</add-channel>
</instruction-set>
</thing-type>
<thing-type uid="roku:roku_tv">
@ -29,6 +38,15 @@
<type>roku:control</type>
</add-channel>
</instruction-set>
<instruction-set targetVersion="2">
<add-channel id="endTime">
<type>roku:endTime</type>
</add-channel>
<add-channel id="progress">
<type>roku:progress</type>
</add-channel>
</instruction-set>
</thing-type>
</update:update-descriptions>


@ -185,6 +185,7 @@ public class ApiPageParser extends AbstractSimpleMarkupHandler {
} else if ("durchsichtig".equals(classFlag)) { // link
this.fieldType = FieldType.IGNORE;
} else if ("bord".equals(classFlag)) { // special button style - not of our interest...
continue;
} else {
logger.debug("Unhandled class in {}:{}:{}: '{}' ", id, line, col, classFlag);
}
@ -192,7 +193,7 @@ public class ApiPageParser extends AbstractSimpleMarkupHandler {
}
} else if (this.parserState == ParserState.DATA_ENTRY && this.fieldType == FieldType.BUTTON
&& "span".equals(elementName)) {
// ignored...
return; // ignored...
} else {
logger.debug("Unexpected OpenElement in {}:{}: {} [{}]", line, col, elementName, attributes);
}
@ -245,14 +246,14 @@ public class ApiPageParser extends AbstractSimpleMarkupHandler {
getApiPageEntry(id, line, col, shortName, description, this.buttonValue);
}
} else if (this.fieldType == FieldType.IGNORE) {
// ignore
return; // ignore
} else {
logger.debug("Unhandled setting {}:{}:{} [{}] : {}", id, line, col, this.fieldType, sb);
}
}
} else if (this.parserState == ParserState.DATA_ENTRY && this.fieldType == FieldType.BUTTON
&& "span".equals(elementName)) {
// ignored...
return; // ignored...
} else {
logger.debug("Unexpected CloseElement in {}:{}: {}", line, col, elementName);
}
@ -307,7 +308,7 @@ public class ApiPageParser extends AbstractSimpleMarkupHandler {
}
} else if (this.parserState == ParserState.INIT && ((len == 1 && buffer[offset] == '\n')
|| (len == 2 && buffer[offset] == '\r' && buffer[offset + 1] == '\n'))) {
// single newline - ignore/drop it...
return; // single newline - ignore/drop it...
} else {
String msg = new String(buffer, offset, len).replace("\n", "\\n").replace("\r", "\\r");
logger.debug("Unexpected Text {}:{}: ParserState: {} ({}) `{}`", line, col, parserState, len, msg);
@ -400,9 +401,9 @@ public class ApiPageParser extends AbstractSimpleMarkupHandler {
// failed to get unit...
if ("Imp".equals(unitStr) || "€$".contains(unitStr)) {
// special case
unitData = taCmiSchemaHandler.SPECIAL_MARKER;
unitData = TACmiSchemaHandler.SPECIAL_MARKER;
} else {
unitData = taCmiSchemaHandler.NULL_MARKER;
unitData = TACmiSchemaHandler.NULL_MARKER;
logger.warn(
"Unhandled UoM '{}' - seen on channel {} '{}'; Message from QuantityType: {}",
valParts[1], shortName, description, iae.getMessage());
@ -410,12 +411,12 @@ public class ApiPageParser extends AbstractSimpleMarkupHandler {
}
taCmiSchemaHandler.unitsCache.put(unitStr, unitData);
}
if (unitData == taCmiSchemaHandler.NULL_MARKER) {
if (unitData == TACmiSchemaHandler.NULL_MARKER) {
// no UoM mappable - just send value
channelType = "Number";
unit = null;
state = new DecimalType(bd);
} else if (unitData == taCmiSchemaHandler.SPECIAL_MARKER) {
} else if (unitData == TACmiSchemaHandler.SPECIAL_MARKER) {
// special handling for unknown UoM
if ("Imp".equals(unitStr)) { // Number of Pulses
// impulses - no idea how to map this to something useful here?


@ -102,7 +102,7 @@ public class ChangerX2Parser extends AbstractSimpleMarkupHandler {
this.optionFieldName = attributes == null ? null : attributes.get("name");
} else if ((this.parserState == ParserState.INIT || this.parserState == ParserState.INPUT)
&& "br".equals(elementName)) {
// ignored
return; // ignored
} else if ((this.parserState == ParserState.INIT || this.parserState == ParserState.INPUT)
&& "input".equals(elementName) && "changeto".equals(id)) {
this.parserState = ParserState.INPUT_DATA;
@ -171,7 +171,6 @@ public class ChangerX2Parser extends AbstractSimpleMarkupHandler {
}
this.options.put(ChangerX2Entry.TIME_PERIOD_PARTS, timeParts);
} else {
logger.warn("Error parsing options for {}: Unhandled input field in {}:{}: {}", channelName, line,
col, attributes);
}
@ -218,7 +217,7 @@ public class ChangerX2Parser extends AbstractSimpleMarkupHandler {
}
}
} else if (this.parserState == ParserState.INPUT && "span".equals(elementName)) {
// spans are ignored...
return; // spans are ignored...
} else {
logger.debug("Error parsing options for {}: Unexpected CloseElement in {}:{}: {}", channelName, line, col,
elementName);
@ -275,10 +274,11 @@ public class ChangerX2Parser extends AbstractSimpleMarkupHandler {
sb.append(buffer, offset, len);
}
} else if (this.parserState == ParserState.INIT && len == 1 && buffer[offset] == '\n') {
// single newline - ignore/drop it...
return; // single newline - ignore/drop it...
} else if (this.parserState == ParserState.INPUT) {
// this is a label next to the value input field - we currently have no use for it so
// it's dropped...
return;
} else {
logger.debug("Error parsing options for {}: Unexpected Text {}:{}: (ctx: {} len: {}) '{}' ",
this.channelName, line, col, this.parserState, len, new String(buffer, offset, len));


@ -90,9 +90,9 @@ public class TACmiSchemaHandler extends BaseThingHandler {
// this is the units lookup cache.
protected final Map<String, UnitAndType> unitsCache = new ConcurrentHashMap<>();
// marks an entry whose unit is known to be unresolvable
protected final UnitAndType NULL_MARKER = new UnitAndType(Units.ONE, "");
protected static final UnitAndType NULL_MARKER = new UnitAndType(Units.ONE, "");
// marks an entry with special handling - i.e. 'Imp'
protected final UnitAndType SPECIAL_MARKER = new UnitAndType(Units.ONE, "s");
protected static final UnitAndType SPECIAL_MARKER = new UnitAndType(Units.ONE, "s");
public TACmiSchemaHandler(final Thing thing, final HttpClient httpClient,
final TACmiChannelTypeProvider channelTypeProvider) {

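The markers above are stored in `unitsCache` in place of real unit data, and `ApiPageParser` compares cache entries against them by identity (`unitData == TACmiSchemaHandler.NULL_MARKER`). Making them `static` matches their constant-style names. The identity checks keep working because each marker is a single shared object. A plausible reason for using sentinels at all is that `ConcurrentHashMap` does not accept `null` values. The sketch below illustrates the pattern under that assumption. `UnitCache` and `resolve` are hypothetical names, and it uses a plain `String` payload instead of the binding's `UnitAndType`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache illustrating a static sentinel value compared by identity. */
final class UnitCache {
    // ConcurrentHashMap rejects null values, so "known unresolvable" gets its own marker object.
    // new String(...) yields a distinct instance that no resolver can return by accident.
    static final String NULL_MARKER = new String("<unresolvable>");

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> resolver;

    UnitCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /** Returns the resolved unit, or null when the unit is known to be unresolvable. */
    String resolve(String unitStr) {
        String data = cache.computeIfAbsent(unitStr, key -> {
            String resolved = resolver.apply(key);
            return resolved != null ? resolved : NULL_MARKER;
        });
        // Intentional identity comparison: only the shared marker instance matches.
        return data == NULL_MARKER ? null : data;
    }

    int size() {
        return cache.size();
    }
}
```

A failed lookup is cached once as the marker, so later calls for the same unit string skip the resolver.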

@ -5,6 +5,8 @@ It also uses [libfvad](https://github.com/dpirch/libfvad) for voice activity det
[Whisper.cpp](https://github.com/ggerganov/whisper.cpp) is a highly optimized, lightweight C++ implementation of [whisper](https://github.com/openai/whisper) that makes it easy to integrate into different platforms and applications.
Alternatively, if you do not want to perform speech-to-text on the computer hosting openHAB, this add-on can consume an OpenAI/Whisper compatible transcription API.
Whisper enables speech recognition for multiple languages and dialects:
english, chinese, german, spanish, russian, korean, french, japanese, portuguese, turkish, polish, catalan, dutch, arabic, swedish,
@ -15,9 +17,11 @@ marathi, punjabi, sinhala, khmer, shona, yoruba, somali, afrikaans, occitan, geo
uzbek, faroese, haitian, pashto, turkmen, nynorsk, maltese, sanskrit, luxembourgish, myanmar, tibetan, tagalog, malagasy, assamese, tatar, lingala,
hausa, bashkir, javanese and sundanese.
## Supported platforms
## Local mode (offline)
This add-on uses some native binaries to work.
### Supported platforms
For offline recognition, this add-on relies on native binaries.
It uses this [whisper.cpp Java wrapper](https://github.com/GiviMAD/whisper-jni) and this [libfvad Java wrapper](https://github.com/GiviMAD/libfvad-jni).
The following platforms are supported:
@ -28,7 +32,7 @@ The following platforms are supported:
The native binaries for those platforms are included in this add-on provided with the openHAB distribution.
## CPU compatibility
### CPU compatibility
To use this binding, a device at least as powerful as the Raspberry Pi 5, with a modern CPU, is recommended.
On a Raspberry Pi 4, execution times are about twice as long, so only the tiny model can run in under 5 seconds.
@ -40,18 +44,18 @@ You can check those flags on Windows using a program like `CPU-Z`.
If you are going to use the binding on an `arm64` host, the CPU should support the `fphp` flag.
You can check the flags on Linux from the terminal with `lscpu`.
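As an alternative to reading the `lscpu` output by hand, the same flags can be read from `/proc/cpuinfo` on Linux. arm64 kernels list them on `Features` lines and x86 kernels on `flags` lines. The following is a minimal sketch assuming that file layout. `CpuFlags` and its methods are hypothetical and not part of the add-on.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that checks /proc/cpuinfo-style text for a CPU flag such as "fphp". */
final class CpuFlags {
    private CpuFlags() {
    }

    /** True if any "Features" (arm64) or "flags" (x86) line lists the flag as a whole token. */
    static boolean hasFlag(List<String> cpuinfoLines, String flag) {
        for (String line : cpuinfoLines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim();
            if (key.equals("Features") || key.equals("flags")) {
                String[] tokens = line.substring(colon + 1).trim().split("\\s+");
                // Exact token match, so "asimdhp" does not count as "fphp"
                if (Arrays.asList(tokens).contains(flag)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Reads /proc/cpuinfo (Linux only) and checks the flag. */
    static boolean hostHasFlag(String flag) throws IOException {
        return hasFlag(Files.readAllLines(Path.of("/proc/cpuinfo")), flag);
    }
}
```

For example, `CpuFlags.hostHasFlag("fphp")` would be expected to return `true` on a Raspberry Pi 5 running a 64-bit OS.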
## Transcription time
### Transcription time
On a Raspberry Pi 5, the approximate transcription times are:
| model | exec time |
| ---------- | --------: |
|------------|----------:|
| tiny.bin | 1.5s |
| base.bin | 3s |
| small.bin | 8.5s |
| medium.bin | 17s |
## Configuring the model
### Configuring the model
Before you can use this service you should configure your model.
@ -64,7 +68,7 @@ You should place the downloaded .bin model in '\<openHAB userdata\>/whisper/' so
Make sure you have enough RAM to load the model; the estimated RAM consumption is listed on the Hugging Face link.
## Using alternative whisper.cpp library
### Using alternative whisper.cpp library
It's possible to use your own build of the whisper.cpp shared library with this add-on.
@ -76,7 +80,7 @@ In the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) README you can fi
Note: You need to restart openHAB to reload the library.
## Grammar
### Grammar
The whisper.cpp library allows you to define a grammar that alters the transcription results without fine-tuning the model.
@ -99,6 +103,14 @@ tv_channel ::= ("set ")? "tv channel to " [0-9]+
You can provide the grammar and enable its usage using the binding configuration.
## API mode
You can also use this add-on with a remote API that is compatible with the OpenAI 'transcription' API. Online services exposing such an API may require an API key (paid services such as OpenAI do).
You can host your own compatible service elsewhere on your network with third-party software such as faster-whisper-server.
Please note that API mode also uses libfvad for voice activity detection, and that grammar parameters are not available.
## Configuration
Use your favorite configuration UI to edit the Whisper settings:
@ -107,6 +119,7 @@ Use your favorite configuration UI to edit the Whisper settings:
General options.
- **Mode: LOCAL or API** - Choose between local transcription and a remote API.
- **Model Name** - Model name. The 'ggml-' prefix and '.bin' extension are optional here but required in the file name (e.g. tiny.en -> ggml-tiny.en.bin).
- **Preload Model** - Keep whisper model loaded.
- **Single Utterance Mode** - When enabled recognition stops listening after a single utterance.
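The model-name rule above says the 'ggml-' prefix and '.bin' extension are optional in the setting but required in the file name. The sketch below illustrates only that naming convention. `ModelFiles` is a hypothetical helper, the add-on's own lookup code is not part of this diff, and the folder argument stands in for '\<openHAB userdata\>/whisper/'.

```java
import java.nio.file.Path;

/** Hypothetical helper mapping a configured model name to its expected file name. */
final class ModelFiles {
    private ModelFiles() {
    }

    /** Adds the 'ggml-' prefix and '.bin' extension only when they are missing. */
    static String toFileName(String modelName) {
        String name = modelName.trim();
        if (!name.startsWith("ggml-")) {
            name = "ggml-" + name;
        }
        if (!name.endsWith(".bin")) {
            name = name + ".bin";
        }
        return name;
    }

    /** Resolves the model file inside the whisper folder (assumed: <userdata>/whisper). */
    static Path modelPath(Path whisperFolder, String modelName) {
        return whisperFolder.resolve(toFileName(modelName));
    }
}
```

With this rule, 'tiny.en', 'ggml-tiny.en' and 'ggml-tiny.en.bin' all point to the same file.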
@ -139,6 +152,13 @@ Configure whisper options.
- **Initial Prompt** - Initial prompt for whisper.
- **OpenVINO Device** - Initialize OpenVINO encoder. (built-in binaries do not support OpenVINO, this has no effect)
- **Use GPU** - Enables GPU usage. (built-in binaries do not support GPU usage, this has no effect)
- **Language** - If specified, speeds up recognition by skipping language auto-detection. Defaults to the system locale.
### API Configuration
- **API key** - Optional API key, for online services that require one.
- **API url** - URL of the transcription API; set it to use your own service. Defaults to the OpenAI transcription API.
- **API model name** - Model to request; your hosted service may offer other models. Defaults to 'whisper-1', the only model offered by the OpenAI API.
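In API mode, the recorded audio is sent to the configured API url. The OpenAI transcription endpoint expects a `multipart/form-data` POST with the audio in a `file` part, the model name in a `model` part and an optional `language` part. Authentication uses an `Authorization: Bearer <key>` header. The add-on itself imports Jetty's `MultiPartContentProvider` for this. The sketch below uses only the JDK (`java.net.http`) so that it stays self-contained. The class name, boundary format and `audio.wav` file name are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical builder for an OpenAI-compatible transcription request. */
final class TranscriptionRequest {
    private TranscriptionRequest() {
    }

    /** multipart/form-data body: model, optional language, then the WAV audio as "file". */
    static byte[] multipartBody(String boundary, byte[] wav, String model, String language) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeField(out, boundary, "model", model);
        if (!language.isBlank()) {
            writeField(out, boundary, "language", language);
        }
        write(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\n"
                + "Content-Type: audio/wav\r\n\r\n");
        out.writeBytes(wav);
        write(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    /** POST request; the Authorization header is only added when an API key is configured. */
    static HttpRequest build(String apiUrl, String apiKey, String model, String language, byte[] wav) {
        String boundary = "----whisper" + UUID.randomUUID();
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(apiUrl))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(multipartBody(boundary, wav, model, language)));
        if (!apiKey.isBlank()) {
            builder.header("Authorization", "Bearer " + apiKey);
        }
        return builder.build();
    }

    private static void writeField(ByteArrayOutputStream out, String boundary, String name, String value) {
        write(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + name + "\"\r\n\r\n"
                + value + "\r\n");
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

The request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. With the default JSON response format, the OpenAI API returns the transcription in a `text` field. A self-hosted compatible server may differ.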
### Grammar Configuration
@ -199,7 +219,9 @@ In case you would like to set up the service via a text file, create a new file
Its contents should look similar to:
```ini
org.openhab.voice.whisperstt:mode=LOCAL
org.openhab.voice.whisperstt:modelName=tiny
org.openhab.voice.whisperstt:language=en
org.openhab.voice.whisperstt:initSilenceSeconds=0.3
org.openhab.voice.whisperstt:removeSilence=true
org.openhab.voice.whisperstt:stepSeconds=0.3
@ -229,6 +251,9 @@ org.openhab.voice.whisperstt:useGPU=false
org.openhab.voice.whisperstt:useGrammar=false
org.openhab.voice.whisperstt:grammarPenalty=80.0
org.openhab.voice.whisperstt:grammarLines=
org.openhab.voice.whisperstt:apiKey=mykeyaaaa
org.openhab.voice.whisperstt:apiUrl=https://api.openai.com/v1/audio/transcriptions
org.openhab.voice.whisperstt:apiModelName=whisper-1
```
### Default Speech-to-Text Configuration


@ -146,4 +146,29 @@ public class WhisperSTTConfiguration {
* Print whisper.cpp library logs as binding debug logs.
*/
public boolean enableWhisperLog;
/**
* LOCAL to use the embedded whisper.cpp library, or API to use an external OpenAI-compatible API
*/
public Mode mode = Mode.LOCAL;
/**
* If mode is set to API, use this URL
*/
public String apiUrl = "https://api.openai.com/v1/audio/transcriptions";
/**
* If mode is set to API, use this API key to access apiUrl
*/
public String apiKey = "";
/**
* If specified, speed up recognition by avoiding auto-detection
*/
public String language = "";
/**
* Model name (API only)
*/
public String apiModelName = "whisper-1";
public enum Mode {
LOCAL,
API;
}
}


@ -12,12 +12,10 @@
*/
package org.openhab.voice.whisperstt.internal;
import static org.openhab.voice.whisperstt.internal.WhisperSTTConstants.SERVICE_CATEGORY;
import static org.openhab.voice.whisperstt.internal.WhisperSTTConstants.SERVICE_ID;
import static org.openhab.voice.whisperstt.internal.WhisperSTTConstants.SERVICE_NAME;
import static org.openhab.voice.whisperstt.internal.WhisperSTTConstants.SERVICE_PID;
import static org.openhab.voice.whisperstt.internal.WhisperSTTConstants.*;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
@ -32,7 +30,9 @@ import java.util.Date;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.sound.sampled.AudioFileFormat;
@ -41,6 +41,13 @@ import javax.sound.sampled.AudioSystem;
import org.eclipse.jdt.annotation.NonNullByDefault;
import org.eclipse.jdt.annotation.Nullable;
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.ContentResponse;
import org.eclipse.jetty.client.api.Request;
import org.eclipse.jetty.client.util.InputStreamContentProvider;
import org.eclipse.jetty.client.util.MultiPartContentProvider;
import org.eclipse.jetty.client.util.StringContentProvider;
import org.eclipse.jetty.http.HttpMethod;
import org.openhab.core.OpenHAB;
import org.openhab.core.audio.AudioFormat;
import org.openhab.core.audio.AudioStream;
@ -48,6 +55,7 @@ import org.openhab.core.audio.utils.AudioWaveUtils;
import org.openhab.core.common.ThreadPoolManager;
import org.openhab.core.config.core.ConfigurableService;
import org.openhab.core.config.core.Configuration;
import org.openhab.core.io.net.http.HttpClientFactory;
import org.openhab.core.io.rest.LocaleService;
import org.openhab.core.voice.RecognitionStartEvent;
import org.openhab.core.voice.RecognitionStopEvent;
@ -57,6 +65,7 @@ import org.openhab.core.voice.STTService;
import org.openhab.core.voice.STTServiceHandle;
import org.openhab.core.voice.SpeechRecognitionErrorEvent;
import org.openhab.core.voice.SpeechRecognitionEvent;
import org.openhab.voice.whisperstt.internal.WhisperSTTConfiguration.Mode;
import org.openhab.voice.whisperstt.internal.utils.VAD;
import org.osgi.framework.Constants;
import org.osgi.service.component.annotations.Activate;
@ -96,10 +105,13 @@ public class WhisperSTTService implements STTService {
private @Nullable WhisperContext context;
private @Nullable WhisperGrammar grammar;
private @Nullable WhisperJNI whisper;
private boolean isWhisperLibAlreadyLoaded = false;
private final HttpClientFactory httpClientFactory;
@Activate
public WhisperSTTService(@Reference LocaleService localeService) {
public WhisperSTTService(@Reference LocaleService localeService, @Reference HttpClientFactory httpClientFactory) {
this.localeService = localeService;
this.httpClientFactory = httpClientFactory;
}
@Activate
@ -108,7 +120,8 @@ public class WhisperSTTService implements STTService {
if (!Files.exists(WHISPER_FOLDER)) {
Files.createDirectory(WHISPER_FOLDER);
}
WhisperJNI.loadLibrary(getLoadOptions());
this.config = new Configuration(config).as(WhisperSTTConfiguration.class);
loadWhisperLibraryIfNeeded();
VoiceActivityDetector.loadLibrary();
whisper = new WhisperJNI();
} catch (IOException | RuntimeException e) {
@ -117,6 +130,13 @@ public class WhisperSTTService implements STTService {
configChange(config);
}
private void loadWhisperLibraryIfNeeded() throws IOException {
if (config.mode == Mode.LOCAL && !isWhisperLibAlreadyLoaded) {
WhisperJNI.loadLibrary(getLoadOptions());
isWhisperLibAlreadyLoaded = true;
}
}
private WhisperJNI.LoadOptions getLoadOptions() {
Path libFolder = Paths.get("/usr/local/lib");
Path libFolderWin = Paths.get("/Windows/System32");
@ -167,14 +187,27 @@ public class WhisperSTTService implements STTService {
private void configChange(Map<String, Object> config) {
this.config = new Configuration(config).as(WhisperSTTConfiguration.class);
WhisperJNI.setLibraryLogger(this.config.enableWhisperLog ? this::onWhisperLog : null);
WhisperGrammar grammar = this.grammar;
if (grammar != null) {
grammar.close();
this.grammar = null;
}
// API mode
if (this.config.mode == Mode.API) {
try {
unloadContext();
} catch (IOException e) {
logger.warn("IOException unloading model: {}", e.getMessage());
}
return;
}
// Local mode
WhisperJNI whisper;
try {
loadWhisperLibraryIfNeeded();
WhisperJNI.setLibraryLogger(this.config.enableWhisperLog ? this::onWhisperLog : null);
whisper = getWhisper();
} catch (IOException ignored) {
logger.warn("library not loaded, the add-on will not work");
@ -228,9 +261,17 @@ public class WhisperSTTService implements STTService {
@Override
public Set<Locale> getSupportedLocales() {
// as it is not possible to determine the language of the model that was downloaded and setup by the user, it is
// assumed the language of the model is matching the locale of the openHAB server
return Set.of(localeService.getLocale(null));
// Attempt to create a locale from the configured language
String language = config.language;
Locale modelLocale = localeService.getLocale(null);
if (!language.isBlank()) {
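// Locale.forLanguageTag does not throw on an ill-formed tag; it drops the bad subtags instead
Locale configuredLocale = Locale.forLanguageTag(language);
if (configuredLocale.getLanguage().isEmpty()) {
logger.warn("Invalid language '{}', defaulting to server locale", language);
} else {
modelLocale = configuredLocale;
}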
}
return Set.of(modelLocale);
}
@Override
@ -246,33 +287,18 @@ public class WhisperSTTService implements STTService {
public STTServiceHandle recognize(STTListener sttListener, AudioStream audioStream, Locale locale, Set<String> set)
throws STTException {
AtomicBoolean aborted = new AtomicBoolean(false);
WhisperContext ctx = null;
WhisperState state = null;
try {
var whisper = getWhisper();
ctx = getContext();
logger.debug("Creating whisper state...");
state = whisper.initState(ctx);
logger.debug("Whisper state created");
logger.debug("Creating VAD instance...");
final int nSamplesStep = (int) (config.stepSeconds * (float) WHISPER_SAMPLE_RATE);
final int nSamplesStep = (int) (config.stepSeconds * WHISPER_SAMPLE_RATE);
VAD vad = new VAD(VoiceActivityDetector.Mode.valueOf(config.vadMode), WHISPER_SAMPLE_RATE, nSamplesStep,
config.vadStep, config.vadSensitivity);
logger.debug("VAD instance created");
sttListener.sttEventReceived(new RecognitionStartEvent());
backgroundRecognize(whisper, ctx, state, nSamplesStep, locale, sttListener, audioStream, vad, aborted);
backgroundRecognize(nSamplesStep, locale, sttListener, audioStream, vad, aborted);
} catch (IOException e) {
if (ctx != null && !config.preloadModel) {
ctx.close();
}
if (state != null) {
state.close();
}
throw new STTException("Exception during initialization", e);
}
return () -> {
aborted.set(true);
};
return () -> aborted.set(true);
}
private WhisperJNI getWhisper() throws IOException {
@ -339,9 +365,8 @@ public class WhisperSTTService implements STTService {
}
}
private void backgroundRecognize(WhisperJNI whisper, WhisperContext ctx, WhisperState state, final int nSamplesStep,
Locale locale, STTListener sttListener, AudioStream audioStream, VAD vad, AtomicBoolean aborted) {
var releaseContext = !config.preloadModel;
private void backgroundRecognize(final int nSamplesStep, Locale locale, STTListener sttListener,
AudioStream audioStream, VAD vad, AtomicBoolean aborted) {
final int nSamplesMax = config.maxSeconds * WHISPER_SAMPLE_RATE;
final int nSamplesMin = (int) (config.minSeconds * (float) WHISPER_SAMPLE_RATE);
final int nInitSilenceSamples = (int) (config.initSilenceSeconds * (float) WHISPER_SAMPLE_RATE);
@ -353,21 +378,17 @@ public class WhisperSTTService implements STTService {
logger.debug("Max silence samples {}", nMaxSilenceSamples);
// used to store the step samples in the 16-bit int format expected by libfvad
final short[] stepAudioSamples = new short[nSamplesStep];
// used to store the full samples in whisper wanted format 32-bit float
final float[] audioSamples = new float[nSamplesMax];
// used to store the full retained samples for whisper
final short[] audioSamples = new short[nSamplesMax];
executor.submit(() -> {
int audioSamplesOffset = 0;
int silenceSamplesCounter = 0;
int nProcessedSamples = 0;
int numBytesRead;
boolean voiceDetected = false;
String transcription = "";
String tempTranscription = "";
VAD.@Nullable VADResult lastVADResult;
VAD.@Nullable VADResult firstConsecutiveSilenceVADResult = null;
try {
try (state; //
audioStream; //
try (audioStream; //
vad) {
if (AudioFormat.CONTAINER_WAVE.equals(audioStream.getFormat().getContainer())) {
AudioWaveUtils.removeFMT(audioStream);
@ -376,10 +397,9 @@ public class WhisperSTTService implements STTService {
.order(ByteOrder.LITTLE_ENDIAN);
// init remaining to full capacity
int remaining = captureBuffer.capacity();
WhisperFullParams params = getWhisperFullParams(ctx, locale);
while (!aborted.get()) {
// read until no remaining so we get the complete step samples
numBytesRead = audioStream.read(captureBuffer.array(), captureBuffer.capacity() - remaining,
int numBytesRead = audioStream.read(captureBuffer.array(), captureBuffer.capacity() - remaining,
remaining);
if (aborted.get() || numBytesRead == -1) {
break;
@ -395,17 +415,15 @@ public class WhisperSTTService implements STTService {
while (shortBuffer.hasRemaining()) {
var position = shortBuffer.position();
short i16BitSample = shortBuffer.get();
float f32BitSample = Float.min(1f,
Float.max((float) i16BitSample / ((float) Short.MAX_VALUE), -1f));
stepAudioSamples[position] = i16BitSample;
audioSamples[audioSamplesOffset++] = f32BitSample;
audioSamples[audioSamplesOffset++] = i16BitSample;
nProcessedSamples++;
}
// run vad
if (nProcessedSamples + nSamplesStep > nSamplesMax - nSamplesStep) {
logger.debug("VAD: Skipping, max length reached");
} else {
lastVADResult = vad.analyze(stepAudioSamples);
VAD.@Nullable VADResult lastVADResult = vad.analyze(stepAudioSamples);
if (lastVADResult.isVoice()) {
voiceDetected = true;
logger.debug("VAD: voice detected");
@ -484,43 +502,26 @@ public class WhisperSTTService implements STTService {
}
}
}
// run whisper
logger.debug("running whisper with {} seconds of audio...",
Math.round((((float) audioSamplesOffset) / (float) WHISPER_SAMPLE_RATE) * 100f) / 100f);
long execStartTime = System.currentTimeMillis();
var result = whisper.fullWithState(ctx, state, params, audioSamples, audioSamplesOffset);
logger.debug("whisper ended in {}ms with result code {}",
System.currentTimeMillis() - execStartTime, result);
// process result
if (result != 0) {
emitSpeechRecognitionError(sttListener);
break;
}
int nSegments = whisper.fullNSegmentsFromState(state);
logger.debug("Available transcription segments {}", nSegments);
if (nSegments == 1) {
tempTranscription = whisper.fullGetSegmentTextFromState(state, 0);
// run whisper, either locally or by remote API
String tempTranscription = (switch (config.mode) {
case LOCAL -> recognizeLocal(audioSamplesOffset, audioSamples, locale.getLanguage());
case API -> recognizeAPI(audioSamplesOffset, audioSamples, locale.getLanguage());
});
if (tempTranscription != null && !tempTranscription.isBlank()) {
if (config.createWAVRecord) {
createAudioFile(audioSamples, audioSamplesOffset, tempTranscription,
locale.getLanguage());
}
transcription += tempTranscription;
if (config.singleUtteranceMode) {
logger.debug("single utterance mode, ending transcription");
transcription = tempTranscription;
break;
} else {
// start a new transcription segment
transcription += tempTranscription;
tempTranscription = "";
}
} else if (nSegments == 0 && config.singleUtteranceMode) {
logger.debug("Single utterance mode and no results, ending transcription");
break;
} else if (nSegments > 1) {
// non reachable
logger.warn("Whisper should be configured in single segment mode {}", nSegments);
} else {
break;
}
// reset state to start with next segment
voiceDetected = false;
silenceSamplesCounter = 0;
@ -528,10 +529,6 @@ public class WhisperSTTService implements STTService {
logger.debug("Partial transcription: {}", tempTranscription);
logger.debug("Transcription: {}", transcription);
}
} finally {
if (releaseContext) {
ctx.close();
}
}
// emit result
if (!aborted.get()) {
@ -543,7 +540,7 @@ public class WhisperSTTService implements STTService {
emitSpeechRecognitionNoResultsError(sttListener);
}
}
} catch (IOException e) {
} catch (STTException | IOException e) {
logger.warn("Error running speech to text: {}", e.getMessage());
emitSpeechRecognitionError(sttListener);
} catch (UnsatisfiedLinkError e) {
@ -553,7 +550,119 @@ public class WhisperSTTService implements STTService {
});
}
private WhisperFullParams getWhisperFullParams(WhisperContext context, Locale locale) throws IOException {
@Nullable
private String recognizeLocal(int audioSamplesOffset, short[] audioSamples, String language) throws STTException {
logger.debug("running whisper with {} seconds of audio...",
Math.round((((float) audioSamplesOffset) / (float) WHISPER_SAMPLE_RATE) * 100f) / 100f);
var releaseContext = !config.preloadModel;
WhisperJNI whisper = null;
WhisperContext ctx = null;
WhisperState state = null;
try {
whisper = getWhisper();
ctx = getContext();
logger.debug("Creating whisper state...");
state = whisper.initState(ctx);
logger.debug("Whisper state created");
WhisperFullParams params = getWhisperFullParams(ctx, language);
// convert to local whisper format (float)
float[] floatArray = new float[audioSamples.length];
for (int i = 0; i < audioSamples.length; i++) {
floatArray[i] = Float.min(1f, Float.max((float) audioSamples[i] / ((float) Short.MAX_VALUE), -1f));
}
long execStartTime = System.currentTimeMillis();
var result = whisper.fullWithState(ctx, state, params, floatArray, audioSamplesOffset);
logger.debug("whisper ended in {}ms with result code {}", System.currentTimeMillis() - execStartTime,
result);
// process result
if (result != 0) {
throw new STTException("Cannot use whisper locally, result code: " + result);
}
int nSegments = whisper.fullNSegmentsFromState(state);
logger.debug("Available transcription segments {}", nSegments);
if (nSegments == 1) {
return whisper.fullGetSegmentTextFromState(state, 0);
} else if (nSegments == 0 && config.singleUtteranceMode) {
logger.debug("Single utterance mode and no results, ending transcription");
return null;
} else {
// unreachable
logger.warn("Whisper should be configured in single segment mode {}", nSegments);
return null;
}
} catch (IOException e) {
throw new STTException("Cannot use whisper locally", e);
} finally {
// the state is created for each call and must always be released, before its context
if (state != null) {
state.close();
}
if (releaseContext && ctx != null) {
ctx.close();
}
}
}
private String recognizeAPI(int audioSamplesOffset, short[] audioStream, String language) throws STTException {
// convert to a little-endian byte array, each short takes 2 bytes
int size = audioSamplesOffset * 2;
ByteBuffer byteArrayBuffer = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
for (int i = 0; i < audioSamplesOffset; i++) {
byteArrayBuffer.putShort(audioStream[i]);
}
javax.sound.sampled.AudioFormat jAudioFormat = new javax.sound.sampled.AudioFormat(
javax.sound.sampled.AudioFormat.Encoding.PCM_SIGNED, WHISPER_SAMPLE_RATE, 16, 1, 2, WHISPER_SAMPLE_RATE,
false);
byte[] byteArray = byteArrayBuffer.array();
try {
AudioInputStream audioInputStream = new AudioInputStream(new ByteArrayInputStream(byteArray), jAudioFormat,
audioSamplesOffset);
// write the stream as a WAV file into an in-memory byte array
ByteArrayInputStream byteArrayInputStream = null;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
AudioSystem.write(audioInputStream, AudioFileFormat.Type.WAVE, baos);
byteArrayInputStream = new ByteArrayInputStream(baos.toByteArray());
}
// prepare HTTP request
HttpClient commonHttpClient = httpClientFactory.getCommonHttpClient();
MultiPartContentProvider multiPartContentProvider = new MultiPartContentProvider();
multiPartContentProvider.addFilePart("file", "audio.wav",
new InputStreamContentProvider(byteArrayInputStream), null);
multiPartContentProvider.addFieldPart("model", new StringContentProvider(this.config.apiModelName), null);
multiPartContentProvider.addFieldPart("response_format", new StringContentProvider("text"), null);
multiPartContentProvider.addFieldPart("temperature",
new StringContentProvider(Float.toString(this.config.temperature)), null);
if (!language.isBlank()) {
multiPartContentProvider.addFieldPart("language", new StringContentProvider(language), null);
}
Request request = commonHttpClient.newRequest(config.apiUrl).method(HttpMethod.POST)
.content(multiPartContentProvider);
if (!config.apiKey.isBlank()) {
request = request.header("Authorization", "Bearer " + config.apiKey);
}
// execute the request
ContentResponse response = request.send();
// check the HTTP status code from the response
int statusCode = response.getStatus();
if (statusCode < 200 || statusCode >= 300) {
logger.debug("HTTP error: Received status code {}, full error is {}", statusCode,
response.getContentAsString());
throw new STTException("Failed to retrieve transcription: HTTP status code " + statusCode);
}
return response.getContentAsString();
} catch (InterruptedException | TimeoutException | ExecutionException | IOException e) {
throw new STTException("Exception during attempt to get speech recognition result from api", e);
}
}
private WhisperFullParams getWhisperFullParams(WhisperContext context, String language) throws IOException {
WhisperSamplingStrategy strategy = WhisperSamplingStrategy.valueOf(config.samplingStrategy);
var params = new WhisperFullParams(strategy);
params.temperature = config.temperature;
@ -570,7 +679,7 @@ public class WhisperSTTService implements STTService {
params.grammarPenalty = config.grammarPenalty;
}
// there are no single-language models other than the English ones
params.language = getWhisper().isMultilingual(context) ? locale.getLanguage() : "en";
params.language = getWhisper().isMultilingual(context) ? language : "en";
// the implementation assumes these options
params.translate = false;
params.detectLanguage = false;
@ -605,7 +714,7 @@ public class WhisperSTTService implements STTService {
}
}
private void createAudioFile(float[] samples, int size, String transcription, String language) {
private void createAudioFile(short[] samples, int size, String transcription, String language) {
createSamplesDir();
javax.sound.sampled.AudioFormat jAudioFormat;
ByteBuffer byteBuffer;
@ -615,7 +724,7 @@ public class WhisperSTTService implements STTService {
WHISPER_SAMPLE_RATE, 16, 1, 2, WHISPER_SAMPLE_RATE, false);
byteBuffer = ByteBuffer.allocate(size * 2).order(ByteOrder.LITTLE_ENDIAN);
for (int i = 0; i < size; i++) {
byteBuffer.putShort((short) (samples[i] * (float) Short.MAX_VALUE));
byteBuffer.putShort(samples[i]);
}
} else {
logger.debug("Saving audio file with sample format f32");
@ -623,7 +732,7 @@ public class WhisperSTTService implements STTService {
WHISPER_SAMPLE_RATE, 32, 1, 4, WHISPER_SAMPLE_RATE, false);
byteBuffer = ByteBuffer.allocate(size * 4).order(ByteOrder.LITTLE_ENDIAN);
for (int i = 0; i < size; i++) {
byteBuffer.putFloat(samples[i]);
byteBuffer.putFloat(Float.min(1f, Float.max((float) samples[i] / ((float) Short.MAX_VALUE), -1f)));
}
}
AudioInputStream audioInputStreamTemp = new AudioInputStream(new ByteArrayInputStream(byteBuffer.array()),

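With this change the retained audio stays as 16-bit signed PCM (`short[]`) and is converted only where each backend needs it: to 32-bit floats clamped to [-1, 1] for the local whisper.cpp call, and to a WAV payload for the multipart API upload. A minimal sketch of those two conversions, using only `javax.sound.sampled`, might look like this; the `PcmSketch` class and its method names are hypothetical, and the 16 kHz mono format is an assumption standing in for `WHISPER_SAMPLE_RATE`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

final class PcmSketch {
    // Assumption: matches WHISPER_SAMPLE_RATE (16 kHz, mono)
    static final int SAMPLE_RATE = 16000;

    // Normalize 16-bit samples to floats clamped to [-1, 1] (the local whisper.cpp input format)
    static float[] toWhisperFloats(short[] samples, int count) {
        float[] out = new float[count];
        for (int i = 0; i < count; i++) {
            out[i] = Math.min(1f, Math.max(samples[i] / (float) Short.MAX_VALUE, -1f));
        }
        return out;
    }

    // Wrap the first `count` samples in a WAV container: PCM signed, 16-bit, mono, little-endian
    static byte[] toWavBytes(short[] samples, int count) throws IOException {
        ByteBuffer pcm = ByteBuffer.allocate(count * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < count; i++) {
            pcm.putShort(samples[i]);
        }
        AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, SAMPLE_RATE, 16, 1, 2,
                SAMPLE_RATE, false);
        // The stream length is given in frames; one 2-byte frame per mono sample
        try (AudioInputStream in = new AudioInputStream(new ByteArrayInputStream(pcm.array()), format, count);
                ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        short[] samples = { 0, Short.MAX_VALUE, Short.MIN_VALUE, 16384 };
        System.out.println(Arrays.toString(toWhisperFloats(samples, samples.length)));
        System.out.println("WAV bytes: " + toWavBytes(samples, samples.length).length);
    }
}
```

Note that `AudioInputStream` takes its length in sample frames, not bytes; for 16-bit mono audio that equals the sample count, which is why `audioSamplesOffset` is passed as the length in `recognizeAPI`.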

@ -11,7 +11,7 @@
</parameter-group>
<parameter-group name="vad">
<label>Voice Activity Detection</label>
<description>Configure the VAD mechanisim used to isolate single phrases to feed whisper with.</description>
<description>Configure the VAD mechanism used to isolate single phrases to feed whisper with.</description>
</parameter-group>
<parameter-group name="whisper">
<label>Whisper Options</label>
@ -19,7 +19,7 @@
</parameter-group>
<parameter-group name="grammar">
<label>Grammar</label>
<description>Define a grammar to improve transcrptions.</description>
<description>Define a grammar to improve transcriptions.</description>
</parameter-group>
<parameter-group name="messages">
<label>Info Messages</label>
@ -30,9 +30,27 @@
<description>Options added for developers.</description>
<advanced>true</advanced>
</parameter-group>
<parameter-group name="openaiapi">
<label>API Configuration Options</label>
<description>Configure OpenAI compatible API, if you don't want to use the local model.</description>
</parameter-group>
<parameter name="mode" type="text" groupName="stt">
<label>Local Mode Or API</label>
<description>Use the local model or the OpenAI compatible API.</description>
<default>LOCAL</default>
<options>
<option value="LOCAL">Local</option>
<option value="API">OpenAI API</option>
</options>
</parameter>
<parameter name="modelName" type="text" groupName="stt" required="true">
<label>Model Name</label>
<description>Model name without extension.</description>
<label>Local Model Name</label>
<description>Model name without extension. Local mode only.</description>
</parameter>
<parameter name="language" type="text" groupName="whisper">
<label>Language</label>
<description>If specified, speed up recognition by avoiding auto-detection. Default to system locale.</description>
<default></default>
</parameter>
<parameter name="preloadModel" type="boolean" groupName="stt">
<label>Preload Model</label>
@ -225,5 +243,20 @@
<default>false</default>
<advanced>true</advanced>
</parameter>
<parameter name="apiKey" type="text" groupName="openaiapi">
<label>API Key</label>
<description>Key to access the API</description>
<default></default>
</parameter>
<parameter name="apiUrl" type="text" groupName="openaiapi">
<label>API Url</label>
<description>OpenAI compatible API URL. Default to OpenAI transcription service.</description>
<default>https://api.openai.com/v1/audio/transcriptions</default>
</parameter>
<parameter name="apiModelName" type="text" groupName="openaiapi">
<label>API Model</label>
<description>Model name to use (API only). Default to OpenAI only available model (whisper-1).</description>
<default>whisper-1</default>
</parameter>
</config-description>
</config-description:config-descriptions>

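The new `language` parameter is parsed with `Locale.forLanguageTag`. That method does not throw on ill-formed input (it only throws `NullPointerException` for `null`); per its Javadoc it ignores the first ill-formed subtag and everything after it, so a bad value typically yields a locale with an empty language rather than an exception. A minimal sketch of a resolver that detects this case, assuming the caller supplies the fallback locale (the `LanguageConfigSketch` class and `resolve` method are hypothetical):

```java
import java.util.Locale;

final class LanguageConfigSketch {
    // Hypothetical helper: turn the configured `language` value into a Locale, or use the fallback
    static Locale resolve(String language, Locale fallback) {
        if (language == null || language.isBlank()) {
            return fallback; // empty config value: keep the server locale
        }
        // forLanguageTag drops ill-formed subtags instead of throwing
        Locale parsed = Locale.forLanguageTag(language.strip());
        return parsed.getLanguage().isEmpty() ? fallback : parsed;
    }

    public static void main(String[] args) {
        Locale fallback = Locale.ENGLISH;
        System.out.println(resolve("", fallback)); // blank: falls back
        System.out.println(resolve("fr", fallback)); // well-formed tag
        System.out.println(resolve("12345", fallback)); // ill-formed: falls back
        // only the language code is needed for the whisper / API `language` field
        System.out.println(resolve("fr-FR", fallback).getLanguage());
    }
}
```

This only catches malformed tags: a well-formed but unknown code such as `xx` is accepted as-is.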

@ -3,6 +3,12 @@
addon.whisperstt.name = Whisper Speech-to-Text
addon.whisperstt.description = Whisper STT Service uses the whisper.cpp library to transcript audio data to text.
voice.config.whisperstt.apiKey.label = API Key
voice.config.whisperstt.apiKey.description = Key to access the API
voice.config.whisperstt.apiModelName.label = API Model
voice.config.whisperstt.apiModelName.description = Model name to use (API only). Default to OpenAI only available model (whisper-1).
voice.config.whisperstt.apiUrl.label = API Url
voice.config.whisperstt.apiUrl.description = OpenAI compatible API URL. Default to OpenAI transcription service.
voice.config.whisperstt.audioContext.label = Audio Context
voice.config.whisperstt.audioContext.description = Overwrite the audio context size. (0 to use whisper default context size)
voice.config.whisperstt.beamSize.label = Beam Size
@ -24,27 +30,35 @@ voice.config.whisperstt.greedyBestOf.description = Best Of configuration for sam
voice.config.whisperstt.group.developer.label = Developer
voice.config.whisperstt.group.developer.description = Options added for developers.
voice.config.whisperstt.group.grammar.label = Grammar
voice.config.whisperstt.group.grammar.description = Define a grammar to improve transcrptions.
voice.config.whisperstt.group.grammar.description = Define a grammar to improve transcriptions.
voice.config.whisperstt.group.messages.label = Info Messages
voice.config.whisperstt.group.messages.description = Configure service information messages.
voice.config.whisperstt.group.openaiapi.label = API Configuration Options
voice.config.whisperstt.group.openaiapi.description = Configure OpenAI compatible API, if you don't want to use the local model.
voice.config.whisperstt.group.stt.label = STT Configuration
voice.config.whisperstt.group.stt.description = Configure Speech to Text.
voice.config.whisperstt.group.vad.label = Voice Activity Detection
voice.config.whisperstt.group.vad.description = Configure the VAD mechanisim used to isolate single phrases to feed whisper with.
voice.config.whisperstt.group.vad.description = Configure the VAD mechanism used to isolate single phrases to feed whisper with.
voice.config.whisperstt.group.whisper.label = Whisper Options
voice.config.whisperstt.group.whisper.description = Configure the whisper.cpp transcription options.
voice.config.whisperstt.initSilenceSeconds.label = Initial Silence Seconds
voice.config.whisperstt.initSilenceSeconds.description = Max initial seconds of silence to discard transcription.
voice.config.whisperstt.initialPrompt.label = Initial Prompt
voice.config.whisperstt.initialPrompt.description = Initial prompt to feed whisper with.
voice.config.whisperstt.language.label = Language
voice.config.whisperstt.language.description = If specified, speed up recognition by avoiding auto-detection. Default to system locale.
voice.config.whisperstt.maxSeconds.label = Max Transcription Seconds
voice.config.whisperstt.maxSeconds.description = Seconds to force transcription before silence detection.
voice.config.whisperstt.maxSilenceSeconds.label = Max Silence Seconds
voice.config.whisperstt.maxSilenceSeconds.description = Seconds of silence to trigger transcription.
voice.config.whisperstt.minSeconds.label = Min Transcription Seconds
voice.config.whisperstt.minSeconds.description = Min transcription seconds passed to whisper.
voice.config.whisperstt.modelName.label = Model Name
voice.config.whisperstt.modelName.description = Model name without extension.
voice.config.whisperstt.mode.label = Local Mode Or API
voice.config.whisperstt.mode.description = Use the local model or the OpenAI compatible API.
voice.config.whisperstt.mode.option.LOCAL = Local
voice.config.whisperstt.mode.option.API = OpenAI API
voice.config.whisperstt.modelName.label = Local Model Name
voice.config.whisperstt.modelName.description = Model name without extension. Local mode only.
voice.config.whisperstt.openvinoDevice.label = OpenVINO Device
voice.config.whisperstt.openvinoDevice.description = Initialize OpenVINO encoder. (built-in binaries do not support OpenVINO, this has no effect)
voice.config.whisperstt.preloadModel.label = Preload Model


@ -165,6 +165,7 @@ public class HomieImplementationTest extends MqttOSGiTest {
"Connection " + homieConnection.getClientId() + " not retrieving all topics ");
}
@Disabled("https://github.com/openhab/openhab-addons/issues/12667")
@Test
public void retrieveOneAttribute() throws Exception {
WaitForTopicValue watcher = new WaitForTopicValue(homieConnection, DEVICE_TOPIC + "/$homie");


@ -107,6 +107,7 @@ public class WemoMakerHandlerOSGiTest extends GenericWemoOSGiTest {
}
@Test
@Disabled("https://github.com/openhab/openhab-addons/issues/12474")
public void assertThatThingHandlesREFRESHCommand()
throws MalformedURLException, URISyntaxException, ValidationException, IOException {
Command command = RefreshType.REFRESH;