Whisper

2025/01/12 Python

Purpose

I want to add Japanese subtitles to Bilibili videos.

Installation

pip install gTTS

pip install playsound

pip install openai-whisper

(Download ffmpeg and add its folder to the PATH environment variable.)

pip install --upgrade deepl

pip install yt-dlp

pip install srt

pip install pysrt

pip install torch

Imports

from yt_dlp import YoutubeDL

from gtts import gTTS

from playsound import playsound

import datetime

import whisper

import deepl

import torch

import os

import srt

import pysrt

Variables

url_bilibili = 'https://www.bilibili.com/video/BV1AF411X77S/?spm_id_from=333.337.search-card.all.click'

# Output path without an extension
filename = 'C:/Users/wiki1/Desktop/xxx'

Downloading the video

with YoutubeDL() as yd:
    result = yd.download([url_bilibili])

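Errors

An error occurs if the ffmpeg folder is not on the desktop.
A URL that still carries the &vd_source parameter, such as https://www.bilibili.com/video/BV1Rj411D7TB/?spm_id_from=333.788.recommend_more_video.9&vd_source=6c2eadb370007a4ba80a31566fe501cc, also causes an error. When downloading a Bilibili video, delete everything from &vd_source onward before running the program.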
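The manual URL cleanup could also be done in code. The following is a minimal sketch using only the standard library; the helper name strip_vd_source is made up for this example, and it removes only the vd_source parameter rather than everything after it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_vd_source(url: str) -> str:
    """Return the URL without its vd_source query parameter (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every query parameter except vd_source
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'vd_source']
    return urlunsplit(parts._replace(query=urlencode(query)))


url = ('https://www.bilibili.com/video/BV1Rj411D7TB/'
       '?spm_id_from=333.788.recommend_more_video.9'
       '&vd_source=6c2eadb370007a4ba80a31566fe501cc')
print(strip_vd_source(url))
```

The cleaned URL can then be passed to yd.download([...]) in place of the original one.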

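Extracting the audio

option = {
    'format': 'bestaudio/best',
    # filename has no extension; yt-dlp fills in %(ext)s
    'outtmpl': filename + '.%(ext)s',
    'postprocessors': [
        # Convert the downloaded audio to a 192 kbps mp3 with ffmpeg
        {'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'},
        {'key': 'FFmpegMetadata'},
    ],
}

with YoutubeDL(option) as ydl:
    ydl.download([url_bilibili])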

Speech recognition

model = whisper.load_model('medium')

audio = whisper.load_audio(filename + '.mp3')

result = model.transcribe(audio, verbose=True, language='en', fp16=False)

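Notes

If the language argument is omitted, Whisper detects the language automatically. Bilibili videos are in Chinese, so the language argument is unnecessary for them.
Without fp16=False, the following warning appears when running on a CPU: "FP16 is not supported on CPU; using FP32 instead"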
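The two notes above can be folded into a small helper that builds the keyword arguments for model.transcribe(). This is only a sketch: build_transcribe_options is a name invented here, and it assumes you want FP16 whenever a GPU is used and automatic language detection unless a language is given.

```python
from typing import Optional


def build_transcribe_options(use_gpu: bool, language: Optional[str] = None) -> dict:
    """Build keyword arguments for model.transcribe() (hypothetical helper)."""
    # FP16 only makes sense on a GPU; on a CPU it triggers the warning quoted above
    options = {'verbose': True, 'fp16': use_gpu}
    # Leaving out 'language' lets Whisper detect the language by itself
    if language is not None:
        options['language'] = language
    return options


print(build_transcribe_options(use_gpu=False))
print(build_transcribe_options(use_gpu=True, language='zh'))
```

With torch installed, it could be used as model.transcribe(audio, **build_transcribe_options(torch.cuda.is_available())).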

Contents of result

{
    'language': 'en',
    'segments': [{
        'id': len(all_segments),
        'seek': seek,
        'start': start,  # start time (seconds)
        'end': end,  # end time (seconds)
        'text': text,
        'tokens': tokens,
        'temperature': result.temperature,
        'avg_logprob': result.avg_logprob,
        'compression_ratio': result.compression_ratio,
        'no_speech_prob': result.no_speech_prob,
    }],
    'text': '*****',
}

Extracting the text

subs = []
for data in result['segments']:
    index = data['id'] + 1  # SRT numbering starts at 1
    start = data['start']
    end = data['end']
    text = data['text']
    sub = srt.Subtitle(
        index=index,
        start=datetime.timedelta(seconds=start),
        end=datetime.timedelta(seconds=end),
        content=text,
        proprietary='',
    )
    subs.append(sub)
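To make clear what the srt library produces from these segments, here is a rough standard-library sketch of the same conversion. The function names format_srt_time and segments_to_srt are invented for this example, and the input is assumed to be a list of dicts with 'start', 'end', and 'text' keys like result['segments'].

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f'{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}'


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments into SRT text (blocks separated by blank lines)."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        blocks.append(
            f"{number}\n"
            f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return '\n'.join(blocks)


sample = [
    {'start': 0.0, 'end': 2.5, 'text': ' 你好'},
    {'start': 2.5, 'end': 5.0, 'text': ' 世界'},
]
print(segments_to_srt(sample))
```

In practice srt.compose(subs) is the more robust choice; the sketch is only meant to show the file layout.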

Creating the SRT and TXT files (Chinese)

with open(filename + '.srt', mode='w', encoding='utf-8') as f:
    f.write(srt.compose(subs))

subrip = pysrt.open(filename + '.srt')

with open(filename + '.txt', mode='w', encoding='utf-8') as f_out:
    for sub in subrip:
        # write() only accepts str, so convert the index
        f_out.write(str(sub.index) + '\n')
        f_out.write(str(sub.start) + '-->' + str(sub.end) + '\n')
        f_out.write(sub.text + '\n')

DeepL translation

import time

time.sleep(3)

# Replace the placeholder with your own DeepL authentication key
translator = deepl.Translator('YOUR_DEEPL_AUTH_KEY')

with open('C:/Users/wiki1/Desktop/japan.txt', mode='w', encoding='utf-8') as fw:
    with open('C:/Users/wiki1/Desktop/china.txt', encoding='utf-8') as f:
        for line in f:
            honyaku = translator.translate_text(line, target_lang='JA')
            print(honyaku)
            fw.write(honyaku.text + '\n')

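Errors

Without encoding='utf-8', the following error appears: UnicodeDecodeError: 'cp932' codec can't decode byte 0xef in position 0
cp932 is a Japanese character encoding. On Japanese Windows, open() defaults to cp932 and tries to decode the UTF-8 file as cp932. The bytes cannot be decoded that way, so UnicodeDecodeError is raised.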
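The effect can be reproduced on any OS by passing the encoding explicitly. The sketch below writes Chinese text as UTF-8 to a temporary file (the file name and sample text are arbitrary), reads it back correctly, and then reads it as cp932, which either raises UnicodeDecodeError or returns garbled text.

```python
import os
import tempfile

text = '你好，世界\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'china.txt')

    # Write and read with the same explicit encoding: the text survives
    with open(path, mode='w', encoding='utf-8') as f:
        f.write(text)
    with open(path, encoding='utf-8') as f:
        restored = f.read()

    # Read the same bytes as cp932, as Japanese Windows does by default
    try:
        with open(path, encoding='cp932') as f:
            wrong = f.read()  # if decoding succeeds at all, the result is garbled
    except UnicodeDecodeError as e:
        wrong = None
        print('cp932 failed:', e)

print(restored == text)
```

Always passing encoding='utf-8' on both the writing and the reading side avoids depending on the OS default.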

Note: an API key is required: https://www.deepl.com/ja/account/summary
Note: the free plan covers up to 500,000 characters.
Note: if the authentication key is missing, you get raise ValueError("auth_key must not be empty")

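Google Translate

I gave up on Google Translate because its translation quality was too poor. In addition, text longer than 5,000 characters has to be split into chunks. Without splitting, the following error occurs: TypeError: the JSON object must be str, bytes or bytearray, not NoneType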
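For reference, splitting the text could look like the sketch below. It is an assumption about how one might do it, not the code that was actually used: split_into_chunks is a made-up name, it prefers to cut at line breaks, and it only cuts inside a line when that line alone exceeds the limit.

```python
from typing import List


def split_into_chunks(text: str, limit: int = 5000) -> List[str]:
    """Split text into pieces of at most `limit` characters, cutting at line ends."""
    chunks: List[str] = []
    current = ''
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be cut inside the line
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ''
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when this line would not fit into the current one
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ''
        current += line
    if current:
        chunks.append(current)
    return chunks


sample = 'a' * 3000 + '\n' + 'b' * 3000 + '\n'
for chunk in split_into_chunks(sample):
    print(len(chunk))
```

Each chunk can then be translated separately and the results joined in order.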

Opening multiple files with with open, and converting TextResult to str

How to receive the TextResult returned by DeepL as a str:
https://qiita.com/komeda_coffee_24_sotu/items/a10e4ee34087a5104d2a
https://qiita.com/kenta1984/items/f3c0904d5d0d2ecb8ca6
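Both points fit into one short sketch. A single with statement can open the input and output files together, and the translated string is taken from the result's .text attribute. So that the example runs without an API key, FakeTextResult and fake_translate are hypothetical stand-ins for DeepL's TextResult and translator.translate_text(); only the .text attribute is imitated.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FakeTextResult:
    """Hypothetical stand-in for DeepL's TextResult; only .text is imitated."""
    text: str


def fake_translate(line: str) -> FakeTextResult:
    # Placeholder for translator.translate_text(line, target_lang='JA')
    return FakeTextResult(text=line.strip().upper())


def translate_file(src: str, dst: str) -> None:
    # One with statement opens both files and closes both on exit
    with open(src, encoding='utf-8') as f, open(dst, mode='w', encoding='utf-8') as fw:
        for line in f:
            result = fake_translate(line)
            fw.write(result.text + '\n')  # .text is a plain str


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'china.txt')
    dst = os.path.join(tmp, 'japan.txt')
    with open(src, mode='w', encoding='utf-8') as f:
        f.write('abc\ndef\n')
    translate_file(src, dst)
    with open(dst, encoding='utf-8') as f:
        output = f.read()

print(output)
```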

Creating the SRT and TXT files (Japanese)

with open("C:/Users/wiki1/Desktop/japan.txt", encoding="utf-8") as f:
    dataline = f.read()

# Restore the spaces around the SRT arrow and remove extra blank lines
dataline = dataline.replace("-->", " --> ")
dataline = dataline.replace("0\n\n", "0\n")
dataline = dataline.replace("\n\n0", "\n0")

with open("C:/Users/wiki1/Desktop/japan.txt", mode="w", encoding="utf-8") as f:
    f.write(dataline)

Attaching the SRT to the MP4

AviUtl

Convert the SRT to an EXO file, then load it into AviUtl. How to create the EXO file: https://aketama.work/aviutl-auto-subtitling#:~:text=AviUtl%E3%81%A7%E3%81%AF.srt%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB%E3%82%92,exo2srt_srt2exo%E3%80%8F%E3%82%92%E4%BD%BF%E7%94%A8%E3%81%97%E3%81%BE%E3%81%99%E3%80%82
I tried it, but the EXO file sometimes did not show up on the timeline, so it was hard to use.

PowerDirector

It can put an SRT directly onto an mp4, but it is paid software.

MoviePy

pip install moviepy

pip install --upgrade moviepy

# Note: this code uses the moviepy 1.x API (moviepy.editor, change_settings, set_position, resize)
from moviepy import editor
from moviepy.video.tools.subtitles import SubtitlesClip
from moviepy.video.io.VideoFileClip import VideoFileClip
from moviepy.config import change_settings

# Specify the location of magick.exe
change_settings({"IMAGEMAGICK_BINARY": "C:/Program Files/ImageMagick-7.1.1-Q16-HDRI/magick.exe"})

# Variables
path_video = 'C:/Users/wiki1/Desktop/xxx.mp4'
path_srt = 'C:/Users/wiki1/Desktop/output.srt'
path_complete = 'C:/Users/wiki1/Desktop/complete.mp4'

# Load the files
# Only English is supported by default, so the subtitles do not appear
# unless you download a TTF file for a Japanese font
video = VideoFileClip(path_video)
generator = lambda txt: editor.TextClip(txt, font='C:/Users/wiki1/Desktop/gomarice_mukasi_mukasi.ttf', fontsize=35, color='white')
subs = SubtitlesClip(path_srt, generator)

# Position the subtitles
subs = subs.set_position(('center', 'center'))

# Overlay the subtitles on the video
subsvideo = editor.CompositeVideoClip([video, subs])

# Resize the video
subsvideo = subsvideo.resize(height=video.h)

# Center the video vertically
subsvideo = subsvideo.set_position(('center', 'center'))

# Save as mp4
subsvideo.write_videofile(path_complete)

This error can be due to the fact that ImageMagick is not installed on your computer, or (for Windows users) that you didn't specify the path to the ImageMagick binary in file conf.py, or that the path you specified is incorrect

If the error above appears, install ImageMagick by following this page: https://www.ipentec.com/document/software-install-imagemagick

Errors

Opening a file using with open and mode='w', then trying to read from it, causes the following error:

io.UnsupportedOperation: not readable

'w' is write-only, so the file cannot be read. Removing mode='w' resolves the error.

https://stackoverflow.com/questions/44901806/python-error-message-io-unsupportedoperation-not-readable
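The error is easy to reproduce. The sketch below uses a temporary file and a made-up helper named try_read. Note that opening with mode='w' also empties the file, so the mistake costs you the file contents as well.

```python
import io
import os
import tempfile


def try_read(path: str, mode: str) -> str:
    """Open path in the given mode and try to read it (hypothetical helper)."""
    with open(path, mode=mode, encoding='utf-8') as f:
        try:
            return f.read()
        except io.UnsupportedOperation as e:
            return f'error: {e}'


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('hello')

    read_ok = try_read(path, 'r')      # read mode works
    read_fail = try_read(path, 'w')    # write-only mode: reading is refused
    after = try_read(path, 'r')        # mode='w' truncated the file

print(read_ok)
print(read_fail)
print(repr(after))
```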

Rewriting a txt file in Python

How to rewrite an existing text file in Python:

https://jimaru.blog/programming/python/replace-file-content
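The basic pattern is read everything, replace, and write it back. Below is a small sketch with pathlib. The helper name replace_in_file is made up, and it assumes the file is UTF-8 and small enough to load into memory.

```python
import os
import tempfile
from pathlib import Path


def replace_in_file(path, old: str, new: str) -> int:
    """Replace every occurrence of old with new in a UTF-8 text file.

    Returns the number of replacements (hypothetical helper).
    """
    p = Path(path)
    content = p.read_text(encoding='utf-8')
    count = content.count(old)
    p.write_text(content.replace(old, new), encoding='utf-8')
    return count


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'japan.txt')
    Path(path).write_text('00:00:00,000-->00:00:02,500\n', encoding='utf-8')
    replaced = replace_in_file(path, '-->', ' --> ')
    rewritten = Path(path).read_text(encoding='utf-8')

print(replaced, rewritten)
```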

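# Save the translation result to a txt file
with open(path3, 'w', encoding='utf-8') as f:
    f.write(text_jp)

# Speech synthesis: play "終わりました" ("Finished") when the job is done
gTTS('終わりました', lang='ja').save(path2)
playsound(path2)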

Model	Size	Required VRAM
tiny	under 400MB	~1GB
base	under 400MB	~1GB
small	461MB	~2GB
medium	1.42GB	~5GB
large	2.87GB	~10GB
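The table can be turned into a simple lookup for choosing a model. This is a sketch under assumptions: pick_model is a made-up helper, and it treats the approximate VRAM figures from the table as hard thresholds, which real memory use may not follow exactly.

```python
# (model name, approximate VRAM in GB) from the table above, largest first
MODEL_VRAM_GB = [
    ('large', 10),
    ('medium', 5),
    ('small', 2),
    ('base', 1),
    ('tiny', 1),
]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose listed VRAM fits (hypothetical helper)."""
    for name, needed in MODEL_VRAM_GB:
        if available_vram_gb >= needed:
            return name
    # Nothing fits according to the table; fall back to the smallest model
    return 'tiny'


for vram in (12, 6, 1, 0.5):
    print(vram, pick_model(vram))
```

The returned name can be passed straight to whisper.load_model().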

Extra: Summarization

pip install pysummarization

pip install mecab-python3

pip install unidic-lite

pip install nltk

from pysummarization.nlpbase.auto_abstractor import AutoAbstractor

from pysummarization.tokenizabledoc.mecab_tokenizer import MeCabTokenizer

from pysummarization.abstractabledoc.top_n_rank_abstractor import TopNRankAbstractor

# Japanese sample text ("This is a long text." repeated three times)
text = '長いテキストです。\
長いテキストです。\
長いテキストです。'

aa = AutoAbstractor()
aa.tokenizable_doc = MeCabTokenizer()
aa.delimiter_list = ['。', '\n']
tnra = TopNRankAbstractor()
result = aa.summarize(text, tnra)

# Summary sentences
for x in result['summarize_result']:
    print(x)

# Score for each sentence
for x in result['scoring_data']:
    print(x)
