arupaka-_-arupaka's diary

Japanese sentiment analysis with transformers

Machine Learning, Natural Language Processing

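Introduction

I could not get daigo/bert-base-japanese-sentiment to work, so I looked for another easy way to do Japanese sentiment analysis.

Following the Qiita article "簡単！事前学習済モデルを利用したテキストデータのネガポジ分析" (roughly: "Easy! Negative/positive analysis of text data using a pretrained model"), I used koheiduck/bert-japanese-finetuned-sentiment. Everything here was run on Google Colab.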

Installing the libraries

!pip install transformers
!pip install "transformers[ja]"

!pip install sentencepiece
!pip install ipadic
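
If the tokenizer later fails to load, a quick check of which packages are actually importable can help. The following is only a minimal sketch: the helper name missing_modules is made up for this post, and the list of required modules is my own assumption (as far as I know, BertJapaneseTokenizer with the cl-tohoku model relies on fugashi and ipadic for MeCab tokenization), not an official requirements list.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed requirements for the setup in this post (my reading, not an official list).
required = ["transformers", "fugashi", "ipadic", "sentencepiece"]
print("missing:", missing_modules(required))
```

An empty list means everything in the assumed list is importable. On Colab, restarting the runtime after installing is sometimes needed before new packages are picked up.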

Sentiment analysis

from transformers import pipeline, AutoModelForSequenceClassification, BertJapaneseTokenizer
# Prepare the pipeline: fine-tuned sentiment model plus the Japanese BERT tokenizer
model = AutoModelForSequenceClassification.from_pretrained('koheiduck/bert-japanese-finetuned-sentiment') 
tokenizer = BertJapaneseTokenizer.from_pretrained('cl-tohoku/bert-base-japanese-whole-word-masking')
classifier = pipeline("sentiment-analysis",model=model,tokenizer=tokenizer)
result = classifier("悪いです")[0]
print(f"あ：label: {result['label']}, with score: {round(result['score'], 4)}")

result = classifier("良いです")[0]
print(f"い：label: {result['label']}, with score: {round(result['score'], 4)}")

Output

Downloading (…)lve/main/config.json: 100%
924/924 [00:00<00:00, 23.1kB/s]
Downloading pytorch_model.bin: 100%
443M/443M [00:05<00:00, 81.6MB/s]

あ：label: NEGATIVE, with score: 0.9924
い：label: POSITIVE, with score: 0.9929
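
The code above calls the classifier once per sentence. To score several sentences in one go, something like the following might work. It is a minimal sketch, not part of the referenced article: classify_texts is a hypothetical helper name, and it assumes that the classifier accepts a list of strings and returns one dict with 'label' and 'score' per input, which is how the transformers text-classification pipeline behaves by default as far as I know.

```python
def classify_texts(classifier, texts):
    """Classify several texts in one call and pair each text with its top label.

    `classifier` is assumed to behave like a transformers text-classification
    pipeline: given a list of strings, it returns a list of dicts that each
    contain 'label' and 'score'.
    """
    texts = list(texts)          # allow any iterable, and reuse it below
    results = classifier(texts)  # one result dict per input text
    return [
        {"text": text, "label": res["label"], "score": round(res["score"], 4)}
        for text, res in zip(texts, results)
    ]


# Usage with the pipeline built earlier in this post (needs the model download):
# for row in classify_texts(classifier, ["悪いです", "良いです"]):
#     print(row)
```

Passing the classifier in as an argument keeps the helper independent of any particular model, so the same function can be tried with other sentiment models.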

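Other notes
For fine-tuning and the like, this looks like a good resource: "Hugging Face transformers を使って日本語 BERT モデルをファインチューニングして感情分析 (with google colab) part01" from ハンズオン資料 (hands-on materials; the title is roughly "Fine-tuning a Japanese BERT model with Hugging Face transformers for sentiment analysis").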