Rabbit Slide Show

PGroonga – Make PostgreSQL fast full text search platform for all languages!

2016-12-03

Description

PostgreSQL has built-in full text search feature. But it supports only limited languages. For example, it doesn't support Japanese. pg_trgm bundled in PostgreSQL supports all languages including Japanese. But it has performance problems for large documents. This talk describes about PGroonga that resolves these problems.

Text

Page: 1

PGroonga
Make PostgreSQL
fast full text search platform
for all languages!
Kouhei Sutou
ClearCode Inc.
PGConf.ASIA 2016
2016-12-03
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 2

PostgreSQL and me
PostgreSQLと私
Some my
patches are
merged
いくつかパッチがマージされている
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 3

Patches
パッチ
#13840: pg_dump generates
unloadable SQL
pg_dumpがリストアできないSQLを出力する
#14160: DROP ACCESS METHOD
IF EXISTS isn't impl.
DROP ACCESS METHOD IF EXISTSが実装されていない
They are found while
developing PGroonga
どちらもPGroonga開発中に見つけた問題
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 4

PGroonga dev style
PGroongaの開発スタイル
When there are problems in
related projects including
PostgreSQL
PostgreSQLを含む関連プロジェクトに問題があった場合
We fix these problems in these
projects instead of choosing
workaround in PGroonga
PGroonga側で回避するのではなく
関連プロジェクトの方で問題を直す
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 5

PostgreSQL and FTS
PostgreSQLと全文検索
PostgreSQL has built-in
full text search feature
PostgreSQLには組込の全文検索機能がある
It has some problems...
ただ、いくつか問題がある
We fixed them by PGroonga
PGroongaを開発することでそれらの問題を修正した
instead of fixing PostgreSQL 😓
PostgreSQLを修正するのではなくて…
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 6

Because...
理由は…
1. Our approach is different
from PostgreSQL's approach
PGroongaのやり方はPostgreSQLのやり方と違う
2. PG provides plugin system
PostgreSQLはプラグインの仕組みを提供している
Implementing as a plugin is
PostgreSQL way!
プラグインでの実装はPostgreSQLらしいやり方!
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 7

PG FTS problem
PostgreSQLの全文検索の問題
Many langs aren't supported
サポートしていない言語がたくさんある
e.g.: Asian languages
例:アジアの言語
Japanese, Chinese and more
日本語や中国語など
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 8

FTS for Japanese1
日本語の全文検索1
SELECT
to_tsvector('japanese',
'こんにちは');
-- ERROR: text search configuration
--
"japanese" does not exist
-- LINE 2: to_tsvector('japanese',
--
^
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 9

FTS for Japanese2
日本語の全文検索2
CREATE EXTENSION pg_trgm;
SELECT show_trgm('こんにちは');
-- show_trgm
-- -----------
-- {} ← Must not empty!
-- (1 row)
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 10

Existing solution
既存の解決策
pg_bigm
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 11

pg_bigm
An extension
拡張機能
Similar to pg_trgm
pg_trgmと似ている
Operator class for GIN
GIN用の演算子クラス
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 12

pg_bigm: Usage
pg_bigm:使い方
CREATE INDEX index ON table
USING GIN (column gin_bigm_ops);
--
↑Use GIN ↑Specify op class
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 13

pg_bigm: Demerit
pg_bigm:デメリット
Slow for large document
文書が長いと遅い
(Normally, we want to use FTS for large document)
(普通は長い文書に対して全文検索したい)
Because it needs "recheck"
「recheck」が必要だから
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 14

"recheck"
"Exact" seq. search after
"loose" index search
「ゆるい」インデックス検索の後に実行する
「正確な」シーケンシャルサーチ
The larger text, the slower
対象テキストが大きければ大きいほど遅くなる
text = doc size * N docs
対象テキスト = 文書サイズ * 文書数
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 15

Benchmark
ベンチマーク
pg_bigm
Elapsed time (sec)
(Lower is better)
3
Data: Japanese Wikipedia
(Many records and large documents)
2.5
N records: About 0.9millions
Average text size: 6.7KiB
Slow
2
1.5
Slow
1
0.5
0
311
14706
20389
N hits
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 16

New solution
新しい解決策
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 17

PGroonga
Pronunciation: píːzí:lúnɡά
読み方:ぴーじーるんが
An extension
拡張機能
Index and operator classes
インデックスと演算子クラス
Not operator classes for GIN
GINの演算子クラスではない
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 18

PGroonga layer
Operator
classtextsearch
pg_trgm
pg_bigmPGroonga
IndexGINPGroonga
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 19

Benchmark
ベンチマーク
PGroonga
pg_bigm
Data: Japanese Wikipedia
(Many records and large documents)
2.5
N records: About 0.9millions
Average text size: 6.7KiB
Elapsed time (sec)
(Lower is better)
3
2
1.5
1
0.5
0
311
FastFast
1470620389
N hits
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 20

Wrap up1
まとめ1
PostgreSQL doesn't support
Asian languages
PostgreSQLはアジアの言語をサポートしていない
pg_bigm and PGroonga
support all languages
pg_bigmとPGroongaはすべての言語をサポートしている
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 21

Wrap up2
まとめ2
Many hits case:
ヒット数が多い場合
pg_bigm is slow
pg_bigmは遅い
PGroonga is fast
PGroongaは速い
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 22

Why is PGroonga fast?
PGroongaはどうして速いのか
Doesn't need "recheck"
「recheck」が必要ないから
Is "recheck" really slow?
本当に「recheck」が遅いの?
See one more benchmark result
もう一つベンチマーク結果を見てみましょう
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 23

Benchmark
ベンチマーク
Elapsed time (sec)
(Lower is better)
3
PGroonga
pg_bigm
Data: Japanese Wikipedia
(Many records and large documents)
N records: About 0.9millions
Average text size: 6.7KiB
Slow
2.5
2
1.5
Slow
1
0.5
Fast for many hits!
0
0
100000
200000
300000
N hits
PGroonga - Make PostgreSQL fast full text search platform for all languages!
400000
500000
Query: "日本"
Powered by Rabbit 2.2.0

Page: 24

Why is pg_bigm fast?
pg_bigmはどうして速いのか
Query is "日本"
クエリーは「日本」
Point: 2 characters
ポイント:2文字
pg_bigm doesn't need
"recheck" for 2 chars query
pg_bigmは2文字のクエリーに「recheck」の必要がない
It means that "recheck" is slow
つまり「recheck」が遅いということ
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 25

N-gram and "recheck"
N-gramと「recheck」
N-gram approach needs
"phrase search" when query
has N+1 or more characters
N+1文字以上のクエリーには「フレーズ検索」が必要
N=2 for pg_bigm, N=3 for pg_trgm
pg_bigmはN=2でpg_trgmはN=3
GIN needs "recheck" for
"phrase search"
GINは「フレーズ検索」には「recheck」が必要
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 26

Phrase search
フレーズ検索
Phrase search is "token
search" and "position check"
フレーズ検索は「トークン検索」と「位置チェック」
Tokens must exist and be ordered
トークンは同じ順序で出現していないといけない
OK: "car at" for "car at" query
NG: "at car" for "car at" query
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 27

N-gram and phrase search
1. Split text to tokens
テキストをトークンに分割
"cat"→"ca","at"
2. Search all tokens
すべてのトークンを検索
"ca" & "at" exist: Candidate!
3. Check appearance pos.
出現位置をチェック
"ca" then "at": Found!
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 28

N-gram and GIN: Create
N-gramとGIN:作成
Documents
ID Text
10 cat
20 at car
Tokenize
GIN
Token
"ca"
"at"
"t "
" c"
"ar"
Posting list
10,20
10,20
20
20
20
"ca","at"
"at","t "," c","ca","ar"
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 29

N-gram and GIN: Search
N-gramとGIN:検索
cat Query
Tokenize
"ca","at"
Documents
ID Text
10 cat
20 at car
Search
GIN
Token
"ca"
"at"
"t "
" c"
"ar"
Posting list
10,20 AND
10,20
10,20
20
20 Candidates
20
Appearance position check
(Point: Out of GIN)
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 30

GIN and phrase search
GINとフレーズ検索
Phrase search needs
position check
フレーズ検索では出現位置チェックが必要
GIN doesn't support
position check
GINは出現位置チェックをサポートしていない
→GIN needs "recheck"→Slow!
GINでは「recheck」が必要だから遅い
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 31

Why is PGroonga fast?
PGroongaはどうして速いのか
PGroonga uses N-gram
by default
PGroongaはデフォルトでN-gramを使っている
But doesn't need "recheck"
PGroongaは「recheck」の必要がない
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 32

Why no "recheck"?
どうして「recheck」が必要ないのか
PGroonga uses
full
inverted index
PGroongaは完全転置インデックスを使っているから
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 33

Full inverted index
完全転置インデックス
Including
position
位置情報を含む
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 34

Inverted index diff
転置インデックスの違い
"ca","at"
Documents
2
ID Text Tokenize 1
"at","t "," c","ca","ar"
10 cat
1
3
4
5
2
20 at car
Not full: Only doc ID Full: Doc ID + pos
Token Posting list
Token Posting list
"ca" 10:1,20:4
"ca" 10,20
10,20
"at" 10:2,20:1
"at"
"t " 20:2
"t " 20
" c" 20:3
" c" 20
"ar" 20:5
"ar" 20
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 35

N-gram/PGroonga: Search
N-gramとPGroonga:検索
cat Query
Tokenize
"ca","at" Search
Documents
ID Text
10 cat
20 at car
PGroonga
Token Posting list
"ca" 10:1,20:4 AND
"at" 10:2,20:1
"t " 20:2
10
" c" 20:3
Result
"ar" 20:5
Appearance position check
(Point: In PGroonga)
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 36

Wrap up
まとめ
N-gram needs phrase search
N-gramの場合はフレーズ検索が必要
Full inverted index
provides fast phrase search
完全転置インデックスを使うと高速にフレーズ検索でき
る
GIN isn't full inverted index
GINは完全転置インデックスではない
PGroonga uses full inverted
index
PGroongaは完全転置インデックスを使っている
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 37

FTS and English(*)
全文検索と英語
Normally, N-gram isn't used
for English FTS
普通は英語の全文検索にN-gramを使わない
N-gram is slower than word
based approach (textsearch
approach)
N-gramは単語ベースのやり方(textsearchのやり方)
より遅め
Stemming/stop word can't be used
N-gramではステミングとストップワードを使えない
(*) English≒Alphabet based languages
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 38

PGroonga and English
PGroongaと英語
PGroonga uses N-gram by default
PGroongaはデフォルトではN-gramを使う
Is PGroonga slow for English?
ではPGroongaは英語では遅いのか?
No. Similar to textsearch
遅くない。textsearchと同じくらい
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 39

PGroonga: Search
PGroonga:検索
PGroonga
textsearch
Data: English Wikipedia
(Many records and large docs)
1.2
N records: About 5.3millions
Average text size: 6.4KiB
1
Elapsed time (ms)
(Shorter is better)
1.4
0.8
0.6
0.4
0.2
0
PostgreSQL OR MySQL
database
Query
PGroonga - Make PostgreSQL fast full text search platform for all languages!
America
Powered by Rabbit 2.2.0

Page: 40

PGroonga's N-gram
Variable size N-gram
可変長サイズのN-gram
Continuous alphabets are 1 token
(= word based approach)
連続したアルファベットは1トークン(=単語ベース)
Hello→"Hello" not "He","el",…
No alphabet is 2-gram
非アルファベットは2-gram
こんにちは→"こん","んに",…
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 41

Wrap up1
まとめ1
PGroonga's search is fast
for all languages
PGroongaの検索はすべての言語で速い
Including alphabet based
languages and Asian languages
mixed case
アルファベットベースの言語とアジアの言語が混ざっ
た場合でも速い
(textsearch doesn't support mixed case)
(textsearchは言語を混ぜることはできない)
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 42

Wrap up2
まとめ2
PGroonga makes PostgreSQL
fast full text search platform
for all languages!
PGroongaでPostgreSQLが
全言語対応高速全文検索プラットフォームになる!
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 43

More about PGroonga
PGroongaについてもっと
Performance
性能
Japanese specific feature
日本語向けの機能
JSON support
JSONサポート
Replication
レプリケーション
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 44

Performance
性能
Search and update
検索と更新
Index only scan
インデックスオンリースキャン
Direct Groonga search
直接Groongaで検索
Index creation
インデックス作成
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 45

Search and update
検索と更新
Doesn't decrease search
performance while updating
更新中も検索性能が落ちない
It's good characteristics for
chat application
チャットアプリでうれしい傾向
Zulip supports PGroonga
Zulip: OSS chat app by Dropbox
ZulipはPGroongaをサポートしている
ZulipはDropboxが開発しているOSSのチャットアプリ
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 46

Characteristics
傾向
Keep
search performance
while many updates
GIN
Search throughput
Search throughput
PGroonga
Update throughput
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Decrease
search performance
while updating
Update throughput
Powered by Rabbit 2.2.0

Page: 47

Update and lock
更新とロック
Update without read locks
参照ロックなしで更新
Write locks are required
書き込みロックは必要
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 48

GIN: Read/Write
GIN:読み書き
GIN
INSERT
start
INSERT
finish
Conn1
Conn2
SELECT Blocked
start Slow down!
PGroonga - Make PostgreSQL fast full text search platform for all languages!
SELECT
finish
Powered by Rabbit 2.2.0

Page: 49

PGroonga: Read/Write
PGroonga:読み書き
INSERT
start
PGroonga
INSERT
finish
Conn1
Conn2
SELECT SELECT
No slow down!
start finish
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 50

Fast stably
安定して速い
GIN has intermittent
performance decrements
GINは間欠的な性能劣化がある
For details:🔎"GIN pending list"
詳細は「GIN pending list」で検索
PGroonga keeps fast search
PGroongaは高速な検索を維持
PGroonga keeps index updated
PGroongaのインデックスは常に最新状態
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 51

Index only scan
インデックスオンリースキャン
GIN: Not supported
GIN:未サポート
PGroonga: Supported
PGroonga:サポート
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 52

More faster search
より高速な検索
Direct Groonga search is
more faster
直接Groongaで検索するとさらに高速
Groonga: Full text search
engine PGroonga uses
Groonga:PGroongaが使っている全文検索エンジン
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 53

Direct Groonga search
直接Groongaで検索
PGroonga
Groonga
Data: English Wikipedia
(Many records and large docs)
1.2
N records: About 5.3millions
Average text size: 6.4KiB
1
Elapsed time (ms)
(Shorter is better)
1.4
textsearch
Groonga is 30x faster than others
0.8
0.6
0.4
0.2
0
PostgreSQL OR MySQL
database
Query
PGroonga - Make PostgreSQL fast full text search platform for all languages!
America
Powered by Rabbit 2.2.0

Page: 54

Index creation time
インデックス作成時間
PGroonga
textsearch
Index creation
pg_trgm
Elapsed time (hour)
(Shorter is better)
3
Data: English Wikipedia
Size: About 33GiB
2.5
Max text size: 1MiB
2
1.5
2x faster
than textsearch
1
0.5
0
Module
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 55

Performance: Wrap up
性能:まとめ
Keep fast search w/ update
更新しながらでも高速検索を維持
Support index only scan
インデックスオンリースキャンをサポート
Direct Groonga search is
more faster
Groonga直接検索はもっと速い
Fast index creation
インデックス作成も速い
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 56

Japanese specific feature
日本語向けの機能
Completion by Romaji
ローマ字による入力補完
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 57

Completion: Table
入力補完:テーブル
CREATE TABLE stations (
name text,
readings text[]
-- ↑Support N readings
);
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 58

Completion: Data
入力補完:データ
INSERT INTO stations VALUES
('Tokyo',
ARRAY['トウキョウ']),
-- ↑In Katakana
-- (...),
('Akihabara',
ARRAY['アキハバラ', 'アキバ']);
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 59

Completion: Index
入力補完:インデックス
CREATE INDEX pgroonga_index
ON stations
USING pgroonga (
-- ↓For prefix and prefix Romaji/Katakana search
name
pgroonga.text_term_search_ops_v2,
-- ↓For prefix and prefix Romaji/Katakana search
-- against array
readings pgroonga.text_array_term_search_ops_v2);
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 60

Completion: Search
入力補完:検索
SELECT name, readings
FROM stations
WHERE name &^ 'tou' OR
-- ↑Prefix search
readings &^~> 'tou'
-- ↑Prefix Romaji/Katakana search
ORDER BY name LIMIT 10;
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 61

Completion: Result
入力補完:結果
Hit by
prefix Romaji/Katakana search
"tou"(Romaji)→"トウ"(Katakana)
前方一致RK検索でヒット
name | readings
-------+--------------
Tokyo | {トウキョウ}
(1 row)
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 62

For Japanese: Wrap up
日本語向け機能:まとめ
Support prefix Romaji/Kana
search
前方一致RK検索をサポート
Useful for implementing auto
complete feature in search box
検索欄にオートコンプリート機能を実装する時に便利
Users don't need to convert
Romaji to Kanji
ユーザーはローマ字を漢字に変換する必要がない
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 63

JSON support
JSONサポート
Support full text search
全文検索対応
Target: All texts in JSON
JSON内のすべてのテキスト
Not only a text in a path
特定のパスのテキストだけではない
(GIN supports only this style)
(GINはこのやり方だけサポート)
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 64

JSON: FTS: Data
JSON:全文検索:Data
CREATE TABLE logs (
record jsonb
);
INSERT INTO logs (record)
VALUES ('{"host": "app1"}'),
('{"message": "app is down"}');
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 65

JSON: FTS: Index
JSON:全文検索:インデックス
CREATE INDEX message_index ON logs
USING GIN
((record->>'message') gin_trgm_ops);
-- {"message": "HERE IS ONLY SEARCHABLE"}
CREATE INDEX record_index ON logs
USING pgroonga (record);
-- All string values are searchable
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 66

JSON: FTS: GIN
JSON:全文検索:GIN
SELECT * FROM logs
WHERE record->>'message' LIKE '%app%';
-- ↑ {"host": "app1"} isn't target
--
record
-- ----------------------------
-- {"message": "app is down"}
-- (1 row)
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 67

JSON: FTS: PGroonga
JSON:全文検索:PGroonga
SELECT * FROM logs
WHERE record @@ 'string @ "app"';
-- ↑ All string values are target
--
record
-- ----------------------------
-- {"host": "app1"}
-- {"message": "app is down"}
-- (2 rows)
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 68

JSON: Wrap up
JSON:まとめ
Support full text search
against all texts in JSON
JSON内の全テキスト対象の全文検索をサポート
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 69

Replication
レプリケーション
Support with PG 9.6!
PostgreSQL 9.6で使う場合はサポート!
PostgreSQL 9.6 ships
"generic WAL"
PostgreSQL 9.6で「generic WAL」機能が追加
Third party index can support
WAL generation
サードパーティーのインデックスもWALを生成できる
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 70

Implementation
実装
1. Master: Encode action
logs as MessagePack
マスター:アクションログをMessagePack形式に変換
2. Master: Write the action
logs to WAL
マスター:アクションログをWALに書き込み
3. Slaves: Read the action
logs and apply them
スレーブ:アクションログを読み込んで適用
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 71

Overview
概要
INSERT
SELECT
Master
Slave
PGroonga
PostgreSQL
PGroonga DB
Update
Index file
Append action logs
via generic WAL API
Action log
WAL
Apply pending action logs
on SELECT
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 72

Action log: "action"
アクションログ:「アクション」
{
"_action": ACTION_ID
}
# ACTION_ID: 0: INSERT
# ACTION_ID: 1: CREATE_TABLE
# ACTION_ID: 2: CREATE_COLUMN
# ACTION_ID: 3: SET_SOURCES
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 73

Action log: INSERT
アクションログ:INSERT
{
}
"_action": 0,
"_table": "TABLE_NAME",
"ctid":
PACKED_CTID_VALUE,
"column1": COLUMN1_VALUE,
...
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 74

Action log: Logs
アクションログ:複数ログ
{"_action": ACTION_ID, ...}
{"_action": ACTION_ID, ...}
{"_action": ACTION_ID, ...}
...
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 75

Write action logs
アクションログの書き込み
Page
Index file
Header
Action
logs
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 76

Apply action logs
アクションログの適用
PGroonga DB
Index file
1
3
10
50
2
4
Apply
Before
(2,10)
After
(2,50)
Applied offset
(Block#+Offset)
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 77

Action log: Why msgpack?
アクションログ:どうしてmsgpack?
Because MessagePack
supports streaming unpack
MessagePackはストリーミングで展開できるから
It's useful to stop applying
action logs when WAL is applied
partially on slaves
スレーブでWALが途中までしか書き込まれていないと
きにアクションログの適用を中断できるので便利
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 78

Replication: Wrap up
レプリケーション:まとめ
Support with PG 9.6!
PostgreSQL 9.6で使う場合はサポート!
Concept: Action logs on WAL
コンセプト:WAL上にアクションログを書く
It'll be an useful pattern for
out of PostgreSQL storage index
PostgreSQL管理外のストレージを使うインデックスで
はこのパターンが使えるはず
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 79

Wrap up1
まとめ1
PostgreSQL doesn't support
FTS for all languages
PostgreSQLの全文検索は一部の言語のみ対応
PGroonga supports FTS for
all languages
PGroongaの全文検索は全言語対応
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 80

Wrap up2
まとめ2
PGroonga is fast stably
PGroongaは安定して速い
PGroonga supports FTS for
all texts in JSON
PGroongaはJSON中の全テキストに対する全文検索に対応
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 81

Wrap up3
まとめ3
PGroonga supports
replication
PGroongaはレプリケーション対応
PostgreSQL 9.6 is required
ただしPostgreSQL 9.6が必要
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 82

Wrap up4
まとめ4
PGroonga makes PostgreSQL
fast full text search platform
for all languages!
PGroongaでPostgreSQLが
全言語対応高速全文検索プラットフォームになる!
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Page: 83

See also
https://pgroonga.github.io/
Tutorial: /tutorial/
Install: /install/
Reference: /reference/
Includes replication doc and
benchmark docs
Community: /community/
PGroonga - Make PostgreSQL fast full text search platform for all languages!
Powered by Rabbit 2.2.0

Other slides

Apache Arrow Apache Arrow
2018-12-08
Apache Arrow Apache Arrow
2018-11-17
Apache Arrow Apache Arrow
2017-06-13
Apache Arrow Apache Arrow
2017-05-28
Mroonga! Mroonga!
2015-10-30