    Blogs, 2012-09-12


    Perl, unicode/utf8/gb2312 convert

    Here is a helpful Chinese article that summarizes conversions between Unicode, UTF-8, and GB2312 in Perl. I list its snippets here for quick reference:

    use utf8;
    use Encode;
    use URI::Escape;
    
    $\ = "\n";    # output record separator: every print ends with a newline
    
    # UTF-8 bytes (as hex) from a %uXXXX Unicode escape
    $str = '%u6536';
    $str =~ s/%u([0-9a-fA-F]{4})/pack("U",hex($1))/eg;
    $str = encode( "utf8", $str );
    print uc unpack( "H*", $str );
    
    # GB2312 bytes (as hex) from a %uXXXX Unicode escape
    $str = '%u6536';
    $str =~ s/%u([0-9a-fA-F]{4})/pack("U",hex($1))/eg;
    $str = encode( "gb2312", $str );
    print uc unpack( "H*", $str );
    
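    # Percent-encoded UTF-8 from Chinese text.
    # Under "use utf8" the literal is a character string, so use uri_escape_utf8;
    # plain uri_escape croaks on characters above 0xFF.
    $str = "收";
    print uri_escape_utf8($str);    # %E6%94%B6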
    
    # Chinese text from percent-encoded UTF-8
    $utf8_str = uri_escape_utf8("收");
    print uri_unescape($utf8_str);    # prints the raw UTF-8 bytes
    
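    # Perl Unicode code points from Chinese text.
    # utf8::decode turns UTF-8 bytes into characters in place; a string that
    # already holds wide characters is left unchanged.
    utf8::decode($str);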
    @chars = split //, $str;
    foreach (@chars) {
        printf "%x ", ord($_);
    }
    
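    # Standard \uXXXX Unicode escapes from Chinese text.
    # "use utf8" already makes the literal a character string, so no decode is
    # needed (decoding a string with wide characters would croak).
    $text = "汉语";
    print join "", map { sprintf "\\u%04x", $_ } unpack( "U*", $text );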
    
    # Chinese text from a %uXXXX Unicode escape
    $str = '%u6536';
    $str =~ s/%u([0-9a-fA-F]{4})/pack("U",hex($1))/eg;
    $str = encode( "utf8", $str );
    print $str;
    
    # Chinese text from Perl \x{...} Unicode escapes
    my $unicode = "\x{505c}\x{8f66}";
    print encode( "utf8", $unicode );
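
Most of the snippets above repeat the same two steps: turn an escape into Perl characters, then encode those characters into bytes. As a minimal sketch, assuming only the core Encode module, the two steps can be wrapped in small helpers. The names `unescape_percent_u` and `hex_bytes` are hypothetical and do not come from the original article.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Replace every %uXXXX escape with the character at that code point.
sub unescape_percent_u {
    my ($s) = @_;
    $s =~ s/%u([0-9a-fA-F]{4})/pack("U", hex($1))/eg;
    return $s;
}

# Encode a character string and return its bytes as uppercase hex.
sub hex_bytes {
    my ($chars, $encoding) = @_;
    return uc unpack("H*", encode($encoding, $chars));
}

my $chars = unescape_percent_u('%u6536');
print hex_bytes($chars, 'UTF-8'), "\n";     # E694B6
print hex_bytes($chars, 'gb2312'), "\n";    # the same character as GB2312 bytes
```

Keeping the "characters" and "bytes" stages separate makes it easier to see which encoding is applied where.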

    To convert GB2312 text to Unicode and then insert it into a MySQL table with a Unicode collation (e.g. utf8_general_ci), the following odd-looking approach worked better for me:

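    use Encode;
    # $gb holds GB2312 bytes; $dbh is an already-connected DBI database handle
    $gb      = decode( "euc-cn", $gb );
    $unicode = $dbh->quote($gb);
    # $unicode is now a quoted SQL literal, ready to insert into the MySQL table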

    It looks strange, but it works fine. The alternatives I tried, such as Encode::from_to() and Encode::encode(), did not work.
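
For comparison, here is a small sketch of how decode() and from_to() behave, using only the core Encode module. It does not reproduce the MySQL setup, and the helper names `gb_to_chars` and `gb_to_utf8_bytes` are made up for illustration. One documented difference that is easy to miss: from_to() converts its first argument in place and returns the byte length of the result, not the converted string.

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

# GB2312 bytes -> Perl character string (the approach used above).
sub gb_to_chars {
    my ($bytes) = @_;
    return decode('euc-cn', $bytes);
}

# GB2312 bytes -> UTF-8 bytes via from_to(), which works in place.
sub gb_to_utf8_bytes {
    my ($bytes) = @_;                        # private copy; caller's data is untouched
    my $len = from_to($bytes, 'euc-cn', 'UTF-8');
    die "conversion failed" unless defined $len;
    return $bytes;                           # return the converted buffer, not $len
}

# Build sample GB2312 input from code points so no source-encoding pragma is needed.
my $sample = encode('euc-cn', "\x{505c}\x{8f66}");

printf "%d characters\n", length gb_to_chars($sample);       # 2 characters
printf "%d bytes\n",      length gb_to_utf8_bytes($sample);  # 6 bytes
```

Whether a database layer expects character strings or encoded bytes depends on the driver and connection settings, so which of the two results is appropriate is an assumption to check against your own setup.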