Extracting data from XML with Ruby's hpricot - arupaka-

Extract and print the counts attribute of each td tag in the following document.
test.html

<head>This is a test. </head>
<body>
test
<dagu elements=xxx>
<td counts=3 no=xxx></td>
<td counts=4 no=yyy></td>
<td counts=hello no=zzz></td>
</body>

Ruby source code
hpricot is a library for extracting data from XML and HTML.

gem install hpricot

This command installs it, once rubygems itself has been installed.

Ruby source

require 'rubygems'
require 'hpricot'

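# Character encoding (UTF-8); this setting only has an effect on Ruby 1.8
$KCODE = "u"

# Read the file
filename = "test.html"
f = open(filename, "r")
f123 = f.read

# Parse the document
doc = Hpricot(f123)
# Search for td tags: doc/"td" finds the td elements
a = (doc/"td")
# Put the counts attribute of each td tag into array b
b = a.map { |l| l["counts"] }

# Output
puts b

f.close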

Example output

3
4
hello
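
For readers who cannot install hpricot, the same result can be roughly reproduced with a plain regular expression from Ruby's standard library. This is only a sketch and not how hpricot works internally: the helper name `extract_counts` and the constant `SAMPLE_HTML` are hypothetical, and the pattern assumes simple attribute values (bare or quoted) like the ones in test.html. A regex is far less robust than a real parser on arbitrary HTML.

```ruby
# Sketch only: a regex stand-in for the hpricot lookup, using no gems.
# `extract_counts` and `SAMPLE_HTML` are hypothetical names, not part of hpricot.
SAMPLE_HTML = <<~EOS
  <head>This is a test. </head>
  <body>
  test
  <dagu elements=xxx>
  <td counts=3 no=xxx></td>
  <td counts=4 no=yyy></td>
  <td counts=hello no=zzz></td>
  </body>
EOS

def extract_counts(html)
  # Find each <td ...> opening tag, then pull out the counts value.
  # The value may be bare (counts=3) or wrapped in single/double quotes.
  html.scan(/<td\b[^>]*>/i).map do |tag|
    m = tag.match(/\bcounts\s*=\s*["']?([^"'\s>]+)/i)
    m && m[1] # nil when the tag has no counts attribute
  end
end

puts extract_counts(SAMPLE_HTML)
```

As in the hpricot version, every value comes back as a string, which is why the non-numeric `hello` is printed unchanged next to `3` and `4`.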