Rabbit Slide Show

PGroonga & Zulip

2017-09-06

Description

PGroonga is fast and flexible full text search extension for PostgreSQL. Zulip is a chat tool that uses PostgreSQL and PGroonga. This talk describes why PGroonga is suitable for Zulip.

Text

Page: 1

PGroonga
&
Zulip
Kouhei Sutou
ClearCode Inc.
Zulip & PGroonga Night
2017-09-06
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 2

PGroonga
Pronunciation: píːzí:lúnɡά
読み方:ぴーじーるんが
PostgreSQL extension
PostgreSQLの拡張機能
Fast full text search
高速全文検索機能
All languages are supported!
全言語対応!
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 3

Fast? (高速?)
Need to measure to confirm
確認するには測定しないと
Targets (測定対象)
textsearch (built-in) (組み込み)
pg_bigm (third party) (外部プロダク
ト)
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 4

PGroona and textsearch
1.4
1.2
1
PGroonga
textsearch
Data: English Wikipedia
(Many records and large docs)
N records: About 5.3millions
Average text size: 6.4KiB
0.8
0.6
0.4
0.2
0
PGroonga & Zulip
PostgreSQL OR MySQL
database
Query
America
Powered by Rabbit 2.2.1

Page: 5

As fast as textsearch
textsearchと同じくらいの速さ
textsearch uses word based
full text search
textsearchは単語ベースの全文検索実装
PostgreSQL has enough
performance for the approach
PostgreSQLはこの方法では十分な性能を出せる
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 6

textsearch and Japanese
textsearchと日本語
Asian languages including
Japanese aren't supported
日本語を含むアジア圏の言語は非サポート
Need plugin (プラグインが必要)
Plugin exists but isn't
maintained
プラグインはあるがメンテナンスされていない
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 7

Japanese support
日本語対応
Need one of them (どちらかが必要)
N-gram approach support
N-gramというやり方のサポート
Japanese specific word based
approach support
日本語を考慮した単語ベースのやり方のサポート
PGroonga supports both of them
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 8

PostgreSQL and N-gram
PostgreSQLとN-gram
PostgreSQL is slow with N-
gram approach
PostgreSQLでN-gramというやり方を使うと遅い
N-gram approach:
pg_trgm (contrib)
Japanese isn't supported by default
デフォルトでは日本語に対応していない
pg_bigm (third-party)
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 9

PGroona and pg_bigm
3
2.5
2
PGroonga
pg_bigm
Data: Japanese Wikipedia
(Many records and large documents)
N records: About 0.9millions
Average text size: 6.7KiB
1.5
1
0.5
0
311
Fast Fast
14706 20389
N hits
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 10

PGroonga is fast stably
PGroongaは安定して速い
PostgreSQL needs "recheck"
for N-gram approach
PostgreSQLはN-gramのときは「recheck」が必要
Seq search after index search
インデックスサーチのあとにシーケンシャルサーチ
PGroonga doesn't need
PGroongaでは必要ない
Only index search
インデックスサーチだけでOK
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 11

Wrap up
まとめ
textsearch is fast but
Asian langs aren't supported
textsearchは速いけどアジア圏の言語を未サポート
pg_bigm supports Japanese
but is slow for large hits
pg_bigmは日本語対応だがヒット数が多くなると遅い
PGroonga is fast and
supports all languages
PGroongaは速くて全言語対応
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 12

FYI: textsearch, PGroonga
and Groonga
1.4
1.2
1
textsearch
Groonga is 30x faster than others
0.8
0.6
0.4
0.2
0
PGroonga & Zulip
PGroonga
Groonga
Data: English Wikipedia
(Many records and large docs)
N records: About 5.3millions
Average text size: 6.4KiB
PostgreSQL OR MySQL
database
Query
America
Powered by Rabbit 2.2.1

Page: 13

Zulip and PGroonga
Zulip uses textsearch by
default
Zulipはデフォルトでtextsearchを使用
Japanese isn't supported
日本語非対応
Zulip supports PGroonga as
option
ZulipでPGroongaも使うこともできる
Implemented by me
私が実装
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 14

Zulip: full text search
Zulipと全文検索
Zulip is chat tool
Zulipはチャットツール
Latency is important for UX
UX的にレイテンシーは大事
Index update is heavy
インデックスの更新は重い
Delay index update
インデックスの更新を後回しにしている
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 15

Delay index update
インデックス更新を後回し
CREATE TABLE zerver_message (
rendered_content text,
-- ... ↓Column for full text search
search_tsvector tsvector
); -- ↓Index for full text search
CREATE INDEX zerver_message_search_tsvector
ON zerver_message
USING gin (search_tsvector);
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 16

Delay index update
インデックス更新を後回し
-- Execute append_to_fts_update_log() on change
CREATE TRIGGER
zerver_message_update_search_tsvector_async
BEFORE INSERT OR UPDATE OF rendered_content
ON zerver_message
FOR EACH ROW
EXECUTE PROCEDURE append_to_fts_update_log();
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 17

Delay index update
インデックス更新を後回し
-- Insert ID to fts_update_log table
CREATE FUNCTION append_to_fts_update_log()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO fts_update_log (message_id)
VALUES (NEW.id);
RETURN NEW;
END
$$;
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 18

Delay index update
インデックス更新を後回し
-- Keep ID to be updated
CREATE TABLE fts_update_log (
id SERIAL PRIMARY KEY,
message_id INTEGER NOT NULL
);
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 19

Delay index update
インデックス更新を後回し
-- Execute do_notify_fts_update_log()
-- on INSERT
CREATE TRIGGER fts_update_log_notify
AFTER INSERT ON fts_update_log
FOR EACH STATEMENT
EXECUTE PROCEDURE
do_notify_fts_update_log();
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 20

Delay index update
インデックス更新を後回し
-- NOTIFY to fts_update_log channel!
CREATE FUNCTION do_notify_fts_update_log()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
NOTIFY fts_update_log;
RETURN NEW;
END
$$;
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 21

Delay index update
インデックス更新を後回し
cursor.execute("LISTEN ftp_update_log") # Wait
cursor.execute("SELECT id, message_id FROM fts_update_log")
ids = []
for (id, message_id) in cursor.fetchall():
cursor.execute("UPDATE zerver_message SET search_tsvector = "
"to_tsvector('zulip.english_us_search', "
"rendered_content) "
"WHERE id = %s", (message_id,))
ids.append(id)
cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)",
(ids,))
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 22

PGroonga: index update
PGroongaとインデックス更新
PGroonga's index update is
fast too
PGroongaはインデックス更新も速い
PGroonga's search while
index update is still fast
PGroongaはインデックス更新中の検索も速い
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 23

Perf characteristics
性能の傾向
Keep
search performance
while many updates
Update throughput
PGroonga & Zulip
GIN
Search
PGroonga
Decrease
search performance
while updating
Update throughput
Powered by Rabbit 2.2.1

Page: 24

Update and lock
更新とロック
Update without read locks
参照ロックなしで更新
Write locks are required
書き込みロックは必要
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 25

GIN: Read/Write
GIN:読み書き
GIN
INSERT
start
INSERT
finish
Conn1
Conn2
SELECT Blocked
start Slow down!
PGroonga & Zulip
SELECT
finish
Powered by Rabbit 2.2.1

Page: 26

PGroonga: Read/Write
PGroonga:読み書き
INSERT
start
PGroonga
INSERT
finish
Conn1
Conn2
SELECT SELECT
No slow down!
start finish
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 27

Wrap up
まとめ
Zulip: Low latency for UX
ZulipはUXのために低レイテンシーをがんばっている
Delay index update
インデックスの更新は後回し
PGroonga: Keeps fast search
with update
PGroongaは更新しながらでも高速検索を維持
Chat friendly characteristics
チャット向きの特性
PGroonga & Zulip
Powered by Rabbit 2.2.1

Page: 28

More PGroonga features
PGroongaの機能いろいろ
Query expansion (クエリー展開)
Support synonyms (同義語検索をサポート)
Similar search (類似文書検索)
Find similar messages
類似メッセージ検索
Fuzzy search (あいまい検索)
Stemming (ステミング)
PGroonga & Zulip
Powered by Rabbit 2.2.1

Other slides

Apache Arrow Apache Arrow
2018-12-08
Apache Arrow Apache Arrow
2018-11-17
Apache Arrow Apache Arrow
2017-06-13
Apache Arrow Apache Arrow
2017-05-28
Mroonga! Mroonga!
2015-10-30