AV女優の情報をデータベースに突っ込む

データベースを扱うプログラムの例として, DMM.comからAV女優の
情報を取得して、データベースに突っ込むというコードを書いてみた。

#!/usr/bin/env perl
use strict;
use warnings;
use utf8;

use Web::Scraper;
use URI;
use Time::HiRes;
use DBI;

binmode STDOUT, ":utf8";

my @actresses = get_actress_info();

my $dbh = DBI->connect(
    "dbi:SQLite:dbname=actress.db", undef, undef,
    {
        AutoCommit => 0,
    }
);

die DBI->errstr unless defined $dbh;

my $retval = $dbh->do(q/
CREATE TABLE IF NOT EXISTS actress (
  id INTEGER PRIMARY KEY,
  name TEXT,
  image TEXT
)/);

die "Can't create actress table\n" unless defined $retval;

my $sth = $dbh->prepare(q/
INSERT INTO actress (id, name, image) VALUES(?, ?, ?)
/);
die DBI->errstr unless defined $sth;

foreach my $actress (@actresses) {
    my $r = $sth->execute($actress->{dmm_id}, $actress->{name}, $actress->{image});
    die $sth->errstr unless defined $r;

    print "Insert ", $actress->{name}, "\n";
}
$dbh->commit;
$dbh->disconnect;

sub get_actress_info {
    my @urls = get_dmm_actress_urls();
    my @actresses;
    foreach my $url (@urls) {
        push @actresses, collect_actress_image($url);
    }

    return @actresses;
}

sub collect_actress_image {
    my $url = shift;

    my $image = scraper {
        process '.act-box > ul > li ', 'actresses[]' => scraper {
            process 'a > img',  image => '@src';
            process 'a', name  => 'TEXT';
            process 'a', id    => '@href';
        };
    };

    my $info = $image->scrape( URI->new($url) );

    my @actresses;
    foreach my $info ( @{$info->{'actresses'}}) {
        my ($name, $image, $id) = ($info->{name}, $info->{image}, $info->{id});

        $name  =~ s{[((].+?[))]}{}xms;
        $image =~ s{/(?:thumbnail|medium)}{}xms;
        $id    =~ m{id=(\d+)};
        my $dmm_id = $1;

        my $actress = {
            name  => $name,
            image => $image,
            dmm_id => $dmm_id,
        };

        push @actresses, $actress;
    }

    return @actresses;
}

sub get_dmm_actress_urls {
    my @initials = collect_initial_chars();
    my $url_tmpl = 'http://www.dmm.co.jp/mono/dvd/-/actress/=/keyword=?/';

    my @urls;
    foreach my $initial (@initials) {
        (my $url = $url_tmpl) =~ s{ \? }{$initial}xms;
        push @urls, $url;
    }

    return @urls;
}

sub collect_initial_chars {
    my @initials = qw(a i u e o);

    foreach my $a (qw/k s t n h m y r w/) {
        foreach my $b (qw/a i u e o/) {
            next if $a eq 'y' && $b =~ m/[ie]/;
            next if $a eq 'w' && $b ne 'a';

            push @initials, ($a . $b);
        }
    }

    return @initials;
}


確認。ひらがなの「なお」で終わる人の名前の検索

  % sqlite actress.db
  sqlite > SELECT name FROM actress WHERE name LIKE "%なお";
  津田なお
  鮎川なお
  林なお
  加藤なお


うまくいってますね。
SQLが全然理解できていないので, まずはそこから勉強しないとね。
ORMを理解して使ってみたいけど、それはまだまだ先になりそうですね。


SQLでいい本はないものだろうか?