Downloading svm_light and a simple usage example

Using svm_light

Reference page
http://www.kazamiya.net/svm/svm-light_install.html

Source
http://download.joachims.org/svm_light/current/svm_light.tar.gz
Compiling

mkdir svm-light
cd svm-light
wget http://download.joachims.org/svm_light/current/svm_light.tar.gz
gzip -d svm_lig*
tar -xvf svm_light.tar
make

Usage (based on the page below):
http://www.eml.ele.cst.nihon-u.ac.jp/~momma/wiki/wiki.cgi/SupportVectorMachine/SVMlight.html

There are two programs: svm_learn for training and svm_classify for classification.

svm_learn [options] data_file model_file　
svm_classify [options] data_file2 model_file output_file

Example of a training file (adapted from the page referenced above):
data_file
<target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>

-1 1:0.43 3:0.12 9284:0.2 # abcdef

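This line is the data for abcdef: a negative example in which feature 1 is 0.43, feature 3 is 0.12, and feature 9284 is 0.2. All other features, 2 and 4 to 9283 (assuming 9284 is the total number of features), are 0.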
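To make the format concrete, here is a minimal Python sketch that splits one such line into its target, its sparse features, and the trailing comment. The function name parse_svmlight_line is my own and is not part of svm_light, and the sketch assumes plain index:value pairs only.

```python
def parse_svmlight_line(line):
    """Parse '<target> <feature>:<value> ... # <info>' into its three parts."""
    # Everything after the first '#' is a free-form comment.
    body, _, info = line.partition("#")
    tokens = body.split()
    target = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return target, features, info.strip()


target, features, info = parse_svmlight_line("-1 1:0.43 3:0.12 9284:0.2 # abcdef")
print(target, features, info)
# Features that are not listed are treated as 0.
print(features.get(2, 0.0))
```

Storing the features in a dict keeps the sparse representation: only the three listed indices take up space, and any other index falls back to 0.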

Example 1: data_file (the label is 1 when feature 2 is negative and -1 when it is positive)
$ cat my_example.txt

1 1:1 2:-1
1 1:2 2:-1
1 1:3 2:-3
1 1:4 2:-4
1 1:5 2:-5
1 1:6 2:-2
1 1:7 2:-3
-1 1:8 2:1
-1 1:9 2:1
-1 1:10 2:1
-1 1:11 2:3
-1 1:4 2:4
-1 1:5 2:1
-1 1:6 2:6
-1 1:7 2:30
-1 1:8 2:1
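As a quick sanity check, the following Python sketch confirms that every line of the training file follows the labeling rule (label 1 when feature 2 is negative, -1 otherwise). The helper names are hypothetical, and the data is copied inline so that the snippet runs without svm_light.

```python
TRAINING_DATA = """\
1 1:1 2:-1
1 1:2 2:-1
1 1:3 2:-3
1 1:4 2:-4
1 1:5 2:-5
1 1:6 2:-2
1 1:7 2:-3
-1 1:8 2:1
-1 1:9 2:1
-1 1:10 2:1
-1 1:11 2:3
-1 1:4 2:4
-1 1:5 2:1
-1 1:6 2:6
-1 1:7 2:30
-1 1:8 2:1
"""


def label_from_rule(feature2):
    # The rule used to build this toy data set.
    return 1 if feature2 < 0 else -1


def find_mismatches(text):
    """Return the lines whose label disagrees with the rule."""
    mismatches = []
    for line in text.splitlines():
        tokens = line.split()
        label = int(tokens[0])
        feats = dict(token.split(":") for token in tokens[1:])
        if label != label_from_rule(float(feats["2"])):
            mismatches.append(line)
    return mismatches


print(len(TRAINING_DATA.splitlines()), "lines,", len(find_mismatches(TRAINING_DATA)), "mismatches")
```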

Training

 ./svm_learn.exe my_example.txt out_model.dat

The output model file looks like this:
$ cat out_model.dat

SVM-light Version V6.02
0 # kernel type
3 # kernel parameter -d
1 # kernel parameter -g
1 # kernel parameter -s
1 # kernel parameter -r
empty# kernel parameter -u
2 # highest feature index
16 # number of training documents
11 # number of support vectors plus 1
-0.8191756 # threshold b, each following line is a SV (starting with alpha*y)
-0.01486310323903746941076597920528 1:4 2:4 #
0.01486310323903746941076597920528 1:1 2:-1 #
-0.01486310323903746941076597920528 1:5 2:1 #
0.01486310323903746941076597920528 1:6 2:-2 #
-0.01486310323903746941076597920528 1:8 2:1 #
0.01486310323903746941076597920528 1:2 2:-1 #
-0.01486310323903746941076597920528 1:8 2:1 #
0.01486310323903746941076597920528 1:7 2:-3 #
-0.0070203585328053545039361793556054 1:9 2:1 #
0.0070203585328053510344892274019912 1:3 2:-3 #
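Since the kernel type is 0 (linear), the model can be collapsed into a single weight vector. The Python sketch below does this by hand from the support-vector lines above. It assumes that each line starts with alpha*y (as the model file comment says) and that the decision value is w·x - b, which is how I understand svm_light's convention. The function names are my own.

```python
SV_LINES = """\
-0.01486310323903746941076597920528 1:4 2:4 #
0.01486310323903746941076597920528 1:1 2:-1 #
-0.01486310323903746941076597920528 1:5 2:1 #
0.01486310323903746941076597920528 1:6 2:-2 #
-0.01486310323903746941076597920528 1:8 2:1 #
0.01486310323903746941076597920528 1:2 2:-1 #
-0.01486310323903746941076597920528 1:8 2:1 #
0.01486310323903746941076597920528 1:7 2:-3 #
-0.0070203585328053545039361793556054 1:9 2:1 #
0.0070203585328053510344892274019912 1:3 2:-3 #
"""
THRESHOLD_B = -0.8191756  # "threshold b" line of the model file


def weight_vector(sv_text):
    """Sum (alpha*y) * x over all support vectors to get w."""
    w = {}
    for line in sv_text.splitlines():
        tokens = line.split("#")[0].split()
        coef = float(tokens[0])
        for token in tokens[1:]:
            index, value = token.split(":")
            w[int(index)] = w.get(int(index), 0.0) + coef * float(value)
    return w


def decision_value(w, b, x):
    # Assumed convention: f(x) = w . x - b
    return sum(w.get(i, 0.0) * v for i, v in x.items()) - b


w = weight_vector(SV_LINES)
print(w)
# First test example "0 1:0 2:-1": roughly 1.0553
print(decision_value(w, THRESHOLD_B, {1: 0.0, 2: -1.0}))
```

Both weights come out negative (about -0.176 for feature 1 and -0.236 for feature 2), so the learned boundary also depends on feature 1 and not only on the sign of feature 2.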

Classification

svm_classify.exe my_test.txt out_model.dat out_test.txt

Input file (set the label to 0 when the correct answer is unknown):
my_test.txt
0 1:0 2:-1
0 1:1 2:-1
0 1:2 2:-1
0 1:3 2:1
0 1:4 2:1
0 1:5 2:5

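Output file. The fourth example is misclassified: feature 2 is positive, so the label should be -1, but the score comes out positive.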

1.0553405
0.8794504
0.70356032
0.055340479
-0.1205496
-1.2410992
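The output file holds raw decision values, not labels. A small Python sketch for turning them into labels and comparing against the rule used to build the data is shown below. It assumes that a positive value means class 1 and anything else means class -1. The values are copied from the output above.

```python
decision_values = [1.0553405, 0.8794504, 0.70356032, 0.055340479, -0.1205496, -1.2410992]
test_feature2 = [-1, -1, -1, 1, 1, 5]  # feature 2 of each line in my_test.txt

# The sign of the decision value gives the predicted class.
predicted = [1 if v > 0 else -1 for v in decision_values]
# Label expected from the rule: 1 when feature 2 is negative, -1 otherwise.
expected = [1 if f2 < 0 else -1 for f2 in test_feature2]

misses = []
for number, (p, e) in enumerate(zip(predicted, expected), start=1):
    if p != e:
        misses.append(number)
    print(number, p, e, "ok" if p == e else "MISS")
```

Only example 4 is flagged, and its decision value (0.055) is very close to the boundary.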

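Unsurprisingly, this is what happens when the training data itself is fed in (when the input has labels, svm_classify also reports the accuracy):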

$ ../svm_classify.exe my_example.txt out_model.dat out_test2.txt
Reading model...OK. (10 support vectors read)
Classifying test examples..done
Runtime (without IO) in cpu-seconds: 0.00
Accuracy on test set: 100.00% (16 correct, 0 incorrect, 16 total)
Precision/recall on test set: 100.00%/100.00%

The accuracy is 100%.

$ cat out_test2.txt
0.8794504
0.70356032
1
1.0602748
1.1205496
0.23616488
0.29643968
-0.82410992
-1
-1.1758901
-1.8241099
-0.82904424
-0.29643968
-1.6531542
-7.4970013
-0.82410992
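The accuracy, precision, and recall reported by svm_classify can be recomputed from this file and the training labels. The Python sketch below is my own reimplementation of the usual definitions, not svm_light's code, and it assumes a positive decision value means class 1.

```python
labels = [1] * 7 + [-1] * 9  # labels of my_example.txt, in file order
values = [
    0.8794504, 0.70356032, 1, 1.0602748, 1.1205496, 0.23616488, 0.29643968,
    -0.82410992, -1, -1.1758901, -1.8241099, -0.82904424, -0.29643968,
    -1.6531542, -7.4970013, -0.82410992,
]


def evaluate(labels, values):
    """Return (correct, accuracy %, precision %, recall %)."""
    predicted = [1 if v > 0 else -1 for v in values]
    correct = sum(p == y for p, y in zip(predicted, labels))
    true_pos = sum(p == 1 and y == 1 for p, y in zip(predicted, labels))
    predicted_pos = sum(p == 1 for p in predicted)
    actual_pos = sum(y == 1 for y in labels)
    accuracy = 100.0 * correct / len(labels)
    precision = 100.0 * true_pos / predicted_pos if predicted_pos else 0.0
    recall = 100.0 * true_pos / actual_pos if actual_pos else 0.0
    return correct, accuracy, precision, recall


correct, accuracy, precision, recall = evaluate(labels, values)
print(f"Accuracy: {accuracy:.2f}% ({correct} correct, {len(labels) - correct} incorrect, {len(labels)} total)")
print(f"Precision/recall: {precision:.2f}%/{recall:.2f}%")
```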

I also referred to http://blog.goo.ne.jp/rominyan/e/af3f3cc9edb46fccc3b675fb7b452d9a

Example: analyzing word meanings