R.A. Epigonos et al.

[perl] Parsing dates with Time::Piece or DateTime::Format::Strptime: Time::Piece is the faster one

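The de facto standard for parsing date formats in Perl is the conventional DateTime family. When the format is a known one, a DateTime::Format module will do, but when it isn't I always ended up writing my own parser. Writing a parser by hand is tedious, so the fix is to use either DateTime::Format::Strptime or the strptime of Time::Piece. Time::Piece is the faster of the two.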

For example, to parse a string in a ctime-like format (Twitter's created_at field is one such case), Time::Piece's strptime can be used as follows.

$ perl -MTime::Piece -le 'my $t = Time::Piece->strptime("Mon Aug 23 12:44:51 +0000 2010", "%a %b %d %H:%M:%S %z %Y"); print $t->datetime;'
2010-08-23T12:44:51
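
For use inside a script rather than a one-liner, the same call can be wrapped in a small helper. This is only a minimal sketch: parse_created_at and $CREATED_AT_FORMAT are names made up here for illustration, and returning undef on bad input is just one possible policy. It relies on Time::Piece's strptime dying when the string does not match the format.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Format for ctime-like timestamps such as "Mon Aug 23 12:44:51 +0000 2010".
my $CREATED_AT_FORMAT = '%a %b %d %H:%M:%S %z %Y';

# Hypothetical helper (not part of Time::Piece): parse one timestamp string.
# strptime dies on input it cannot parse, so the error is trapped with eval
# and undef is returned instead.
sub parse_created_at {
    my ($text) = @_;
    my $t = eval { Time::Piece->strptime($text, $CREATED_AT_FORMAT) };
    return $t;
}

my $t = parse_created_at('Mon Aug 23 12:44:51 +0000 2010');
if (defined $t) {
    print $t->datetime, "\n";   # ISO 8601 form, same as the one-liner
    print $t->epoch, "\n";      # seconds since the Unix epoch
}
```

The returned object is an ordinary Time::Piece, so accessors such as year, mon and strftime are available on it as well.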

The same thing can be written with DateTime::Format::Strptime as follows.

$ perl -MDateTime::Format::Strptime -le 'my $parser = DateTime::Format::Strptime->new(pattern=>"%a %b %d %H:%M:%S %z %Y"); my $dt = $parser->parse_datetime("Mon Aug 23 12:44:51 +0000 2010"); print $dt->datetime;'
2010-08-23T12:44:51
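
In a longer script the parser object would normally be built once and reused. The following is a rough sketch under that assumption. The on_error => 'croak' option is used so that a failed parse raises an error instead of silently returning undef, which is the default. parse_or_undef is a hypothetical helper name, not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DateTime::Format::Strptime;

# Build the parser once; constructing it is relatively costly.
my $parser = DateTime::Format::Strptime->new(
    pattern  => '%a %b %d %H:%M:%S %z %Y',
    on_error => 'croak',    # die on unparsable input (default is to return undef)
);

# Hypothetical helper: return a DateTime object, or undef if parsing failed.
sub parse_or_undef {
    my ($text) = @_;
    my $dt = eval { $parser->parse_datetime($text) };
    warn "could not parse '$text': $@" unless defined $dt;
    return $dt;
}

my $dt = parse_or_undef('Mon Aug 23 12:44:51 +0000 2010');
print $dt->datetime, "\n" if defined $dt;
```

Unlike Time::Piece, the result here is a full DateTime object, so time zone conversion and date arithmetic are available at the cost of the slower parse.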

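The benchmark results are below. Time::Piece's strptime ran about 48 times as fast as DateTime::Format::Strptime (roughly 15152/s against 314/s). I was surprised that DateTime::Format::Strptime turned out slower, since it does not need to define the parser on every subroutine call. Perhaps Time::Piece is fast because it calls a strptime written in C?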

$ perl test.pl
Benchmark: timing 10000 iterations of TEST1, TEST2...
     TEST1:  0 wallclock secs ( 0.66 usr +  0.00 sys =  0.66 CPU) @ 15151.52/s (n=10000)
     TEST2: 33 wallclock secs (31.83 usr +  0.03 sys = 31.86 CPU) @ 313.87/s (n=10000)
         Rate TEST2 TEST1
TEST2   314/s    --  -98%
TEST1 15152/s 4727%    --
$ cat test.pl
#! /usr/bin/perl -w
use strict;
use warnings;
use Benchmark qw/cmpthese timethese/;
use Time::Piece;
use DateTime::Format::Strptime;

my $count = 10000;
my $date = "Mon Aug 23 12:44:51 +0000 2010";
my $parser = DateTime::Format::Strptime->new(pattern=>"%a %b %d %H:%M:%S %z %Y");

cmpthese(
                timethese($count,
                        {'TEST1' => '&test1;', 'TEST2' => '&test2;', })
        );

exit;

sub test1 {
        my $t = Time::Piece->strptime($date, "%a %b %d %H:%M:%S %z %Y");
        $t->datetime;
}

sub test2 {
        my $dt = $parser->parse_datetime($date);
        $dt->datetime;
}
__END__
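
As a quick check on how to read the comparison table, the percentages that cmpthese prints can be recomputed from the two rates in the output above. This is only an illustrative sketch: the rates are copied by hand from the benchmark output, and percent_faster is a helper name invented here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rates (iterations per second) copied from the benchmark output above.
my %rate = (TEST1 => 15151.52, TEST2 => 313.87);

# Hypothetical helper: how much faster (in percent) the first rate is
# than the second. A negative value means it is slower.
sub percent_faster {
    my ($this_rate, $other_rate) = @_;
    return 100 * ($this_rate / $other_rate - 1);
}

printf "TEST1 / TEST2 = %.1f times\n", $rate{TEST1} / $rate{TEST2};
printf "TEST1 vs TEST2: %.0f%%\n", percent_faster($rate{TEST1}, $rate{TEST2});
printf "TEST2 vs TEST1: %.0f%%\n", percent_faster($rate{TEST2}, $rate{TEST1});
```

The 4727% in the table therefore means "4727% faster", that is, a bit over 48 times the rate, and the -98% is the same comparison seen from the slower side.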

ChangeLog

Posted: 2009-07-30T16:04:53+09:00
Modified: 2009-07-30T16:04:53+09:00
Modified: 2011-01-23T04:39:05+09:00
Generated: 2026-07-20T23:09:26+09:00