Page: 1
Regexp.new('2.0.0')
New regular expression features in Ruby 2.0.0
西山和広
日本Rubyの会
Powered by Rabbit 2.1.1
Page: 2
Onigmo
Onigmo (Oniguruma-mod)
The NEWS file of Ruby 2.0.0 says only the following:
Merge Onigmo
https://github.com/k-takata/Onigmo
No further details are given.
Page: 3
New feature (1) \K
examples without /\K/
"foobar".sub(/(?<=foo)bar/, "") #=> "foo"
"foobar".sub(/(?<=fo*)bar/, "")
# SyntaxError: invalid pattern in look-behind: /(?<=fo*)bar/
examples with /\K/
"foobar".sub(/foo\Kbar/, "") #=> "foo"
"foobar".sub(/fo*\Kbar/, "") #=> "foo"
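A minimal sketch of what \K does to the match data, using a made-up input string. The text matched before \K must still be present, but the reported match starts at \K. That is why the variable-length prefix fo* works here, while the look-behind version above is rejected.

```ruby
# \K ("keep") resets the start of the reported match: the part before \K
# must match, but it is not included in m[0] ($&).
m = "fooooobar".match(/fo*\Kbar/)

m[0]        #=> "bar"     only the part after \K is reported
m.pre_match #=> "fooooo"  the variable-length prefix lies outside the match
m.begin(0)  #=> 6         offset where the reported match starts

# sub/gsub therefore replace only the part after \K:
"fooooobar".sub(/fo*\Kbar/, "X") #=> "foooooX"
```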
Page: 4
New feature (1) \K
Use case: edit the text after the leading blanks of each line while leaving the indentation untouched.
examples with /\K/
gsub(/^ *\K(\d+)/) { $1.to_i+1 }
examples without /\K/
gsub(/^( *)(\d+)/) { "#{$1}#{$2.to_i+1}" }
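The two snippets above call gsub without a receiver, which is Kernel#gsub and is available only under ruby -p or -n (for example, ruby -pe 'gsub(/^ *\K(\d+)/) { $1.to_i+1 }' file). A minimal sketch applying the same two patterns to an ordinary string (the indented list is made-up input) shows that they produce the same result, with \K avoiding the capture-and-reinsert of the blanks:

```ruby
# Made-up input: an indented list whose leading numbers should be bumped.
text = "  1 apple\n    2 banana\n 10 cherry\n"

# With \K: the leading blanks are matched but excluded from the replacement.
with_k = text.gsub(/^ *\K(\d+)/) { $1.to_i + 1 }

# Without \K: the blanks must be captured and re-inserted by hand.
without_k = text.gsub(/^( *)(\d+)/) { "#{$1}#{$2.to_i + 1}" }

p with_k             # prints "  2 apple\n    3 banana\n 11 cherry\n"
p with_k == without_k # prints true
```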
Page: 5
New feature (2) \R
Linebreak (改行文字)
Unicode:
(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
Not Unicode:
(?>\x0D\x0A|[\x0A-\x0D])
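A short sketch, with a made-up input string, of splitting text whose line endings are mixed. It also checks the atomic group (?>...) in the definition above: once \R has consumed \r\n, it does not backtrack to match only the \r.

```ruby
# Made-up input with mixed line endings: CRLF, LF, CR and U+2028 LINE SEPARATOR.
text = "a\r\nb\nc\rd\u2028e"

text.split(/\R/) #=> ["a", "b", "c", "d", "e"]  (CRLF counts as one break)

# \x0D\x0A is tried first and the group is atomic, so a following \n
# cannot make \R fall back to matching only the \r:
"\r\n" =~ /\R\n/ #=> nil
```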
Page: 6
New feature (3) \X
eXtended grapheme cluster (拡張書記素クラスタ)
Unicode:
(?>\P{M}\p{M}*)
Not Unicode:
(?m:.)
Page: 7
Extended grapheme cluster
example:
"\u{304B 3099}"[/\X/].size #=> 2
U+304B HIRAGANA LETTER KA
U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
See [UAX #29] (Unicode Standard Annex #29) for more detail.
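A minimal sketch of using \X to count user-perceived characters instead of code points. The second string is a made-up example (a decomposed "café"); both inputs are plain base-character-plus-combining-mark sequences, so the (?>\P{M}\p{M}*) definition above groups them as expected.

```ruby
ga = "\u{304B 3099}"  # HIRAGANA LETTER KA + COMBINING VOICED SOUND MARK
ga.length             #=> 2  code points
ga.scan(/\X/).length  #=> 1  grapheme clusters

# A decomposed "café": "e" followed by U+0301 COMBINING ACUTE ACCENT.
cafe = "cafe\u0301"
cafe.length           #=> 5
cafe.scan(/\X/)       #=> ["c", "a", "f", "e\u0301"]
```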
Page: 8
New feature (4)
conditional expression:
(?(cond)yes)
(?(cond)yes|no)
example:
" :f o o "[/:(['"])?(?(1)[\w\s]+\1|\w+)/] #=> ":f"
":'f o o'"[/:(['"])?(?(1)[\w\s]+\1|\w+)/] #=> ":'f o o'"
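Another small sketch of the (?(cond)yes) form, with a hypothetical pattern that accepts a number optionally wrapped in parentheses. The condition (1) is true only when group 1 took part in the match, so the closing parenthesis is required exactly when the opening one was seen.

```ruby
# Hypothetical pattern: a number optionally wrapped in parentheses.
# (\()? records whether "(" was seen; (?(1)\)) then demands ")" only if
# group 1 participated, and matches the empty string otherwise.
PAREN_NUM = /\A(\()?\d+(?(1)\))\z/

PAREN_NUM =~ "(42)" #=> 0
PAREN_NUM =~ "42"   #=> 0
PAREN_NUM =~ "(42"  #=> nil  "(" without ")"
PAREN_NUM =~ "42)"  #=> nil  ")" without "("
```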
Page: 9
(?adu)
character set option (also called character range option; 文字集合オプション / 文字範囲オプション)
d: Default (compatible with Ruby 1.9.3)
a: ASCII
u: Unicode
see doc/RE in Onigmo for more detail
Page: 10
(?adu)
examples:
"\u{3042}"[/\w/]     #=> nil
"\u{3042}"[/(?a)\w/] #=> nil
"\u{3042}"[/(?d)\w/] #=> nil
"\u{3042}"[/(?u)\w/] #=> "あ"

/a\b/     =~ "a\u{3042}" #=> nil
/(?a)a\b/ =~ "a\u{3042}" #=> 0
/(?d)a\b/ =~ "a\u{3042}" #=> nil
/(?u)a\b/ =~ "a\u{3042}" #=> nil
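A minimal sketch of where (?u) makes a practical difference, using made-up Japanese and Arabic-Indic input. In the default mode \w and \d are ASCII-only, so (?u) is needed to treat Japanese text as word characters or to accept non-ASCII decimal digits.

```ruby
text = "Ruby は楽しい"

text.scan(/\w+/)     #=> ["Ruby"]              default: \w is ASCII-only
text.scan(/(?u)\w+/) #=> ["Ruby", "は楽しい"]  Unicode \w

three = "\u0663"     # ARABIC-INDIC DIGIT THREE (General_Category Nd)
three =~ /\d/        #=> nil  default: \d is [0-9]
three =~ /(?u)\d/    #=> 0    Unicode \d
```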
Page: 11
(?adu)
(?-a), (?-d), and (?-u) do not exist, unlike (?-i), (?-m), and (?-x)
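A minimal sketch that probes this with Regexp.new; inline_option_ok? is a hypothetical helper, not part of Ruby. Since (?u) cannot be switched off again with (?-u), the sketch also shows limiting the option to part of the pattern with the scoped form (?u:...).

```ruby
# Hypothetical helper: true if Ruby accepts the given inline option group.
def inline_option_ok?(opt)
  Regexp.new("(?#{opt})x")
  true
rescue RegexpError
  false
end

inline_option_ok?("-i") #=> true
inline_option_ok?("-a") #=> false
inline_option_ok?("-u") #=> false

# To apply (?u) to part of a pattern only, scope it with (?u:...):
"あい" =~ /(?u)\w\w/  #=> 0
"あい" =~ /(?u:\w)\w/ #=> nil  the second \w is ASCII-only again
```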
Page: 12
Character Property
support for Unicode blocks
example:
/\p{InHiragana}/ =~ "\u3042" #=> 0
/\p{InCJKUnifiedIdeographs}/ =~ "\u3042" #=> nil
See tool/enc-unicode.rb in Onigmo for more detail.
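A minimal sketch with made-up input, showing block properties picking out runs of text by code point range. A block is a range of code points, not a script: U+30FC (ー) lies in the Katakana block, but its Script property is Common, so only the block property matches it.

```ruby
text = "ひらがな カタカナ 漢字"

text.scan(/\p{InHiragana}+/)             #=> ["ひらがな"]
text.scan(/\p{InCJKUnifiedIdeographs}+/) #=> ["漢字"]

# U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK: Katakana block, Common script.
"\u30FC" =~ /\p{InKatakana}/ #=> 0
"\u30FC" =~ /\p{Katakana}/   #=> nil
```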