Rabbit Slide Show

PGroonga 2 – Make PostgreSQL rich full text search system backend!

2017-12-05

Description

PGroonga 2.0 has been released with 2 years development since PGroonga 1.0.0. PGroonga 1.0.0 just provides fast full text search with all languages support. It's important because it's a lacked feature in PostgreSQL. PGroonga 2.0 provides more useful features to implement rich full text search system with PostgreSQL. This session shows how to implement rich full text search system with PostgreSQL! This talk describes about PGroonga that resolves these problems.

Text

Page: 1

PGroonga 2
Make PostgreSQL rich full text
search system backend!
Kouhei Sutou
ClearCode Inc.
PGConf.ASIA 2017
2017-12-05
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 2

Targets
対象者
Want to implement full text
search with PostgreSQL
PostgreSQLで全文検索したい
Not good at full text search
全文検索はよく知らない
PGroonga 1.0.0 users
PGroonga 1.0.0は使ったことがある
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 3

Abbreviations
略語
PG: PostgreSQL
ポスグレ: PostgreSQL
FTS: Full text search
FTS: 全文検索
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 4

FTS system: Targets
全文検索システム:対象
Many
tests
大量のテキスト
e.g.: Text data in office
docs in file servers
例:ファイルサーバー内のオフィス文書内のテキスト
e.g.: Item descriptions,
chat logs, Wiki data, ...
例:商品説明やチャットログ、Wikiのデータなど
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 5

FTS system: Goal
全文検索システム:目的
Provide
needed info
when you need
必要な情報を必要なときに提供すること
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 6

Provide needed info
必要な情報を提供
Not found
探している情報が見つからない
Found
探している情報が見つかる
Found unconscious needed
info too!
意識していなかったけど実は欲しかった情報も見つか
る!
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 7

When you need
必要なときに活用
Need many times to find
なかなか見つからない
Find in no time
すぐに見つかる
Already found
すでに見つかっていた
e.g.: Recommendation
例:レコメンデーション
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 8

How to impl.: Options
実装方法:選択肢
Use FTS server
全文検索サーバーを使う
Use PostgreSQL
PostgreSQLを使う
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 9

FTS server: Pros
全文検索サーバー案:メリット
Provides all basic features
必要な機能が揃っている
Provides advanced features
+αの機能もある
Fast
速い
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 10

FTS server: Cons1
全文検索サーバー案:デメリット1
Large implementation cost
実装コスト大
Learn how to use from scratch
使い方を1から学ぶ必要がある
How to implement data sync?
マスターデータの同期はどうする?
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 11

FTS server: Cons2
全文検索サーバー案:デメリット2
Large maintenance cost
メンテナンスコスト大
Learn how to operate from
scratch
運用方法を1から学ぶ必要がある
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 12

PostgreSQL: Pros1
PostgreSQL案:メリット1
Less implementation cost
実装コスト小
Less things to be learned
新しく覚えることが少ない
Can manage data at the same
place
データの一元管理
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 13

PostgreSQL: Pros2
PostgreSQL案:メリット2
Less operation cost
メンテナンスコスト小
The current operation knowledge
is reusable
既存の運用ノウハウを使える
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 14

PostgreSQL: Cons
PostgreSQL案:デメリット
Built-in features aren't
enough
組込機能では機能不足
SQL limits efficiency
SQLの表現力不足
e.g.: SQL needs multiple
queries for a process that can
be done by 1 query by FTS server
例:全文検索サーバーなら1クエリーで実現できる処
理にSQLだと複数クエリー必要なことがある
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 15

The 3rd option
第3の選択肢
Use FTS engine via
PostgreSQL (SQL)
PostgreSQL経由(SQL)で全文検索エンジンを使う
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 16

Pros
メリット
Fast and rich features
高速で豊富な機能
Less implementation cost
実装コスト小
Less operation cost
メンテナンスコスト小
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 17

Cons
デメリット
Need PostgreSQL extension
PostgreSQLに拡張機能が必要
Not available on DBaaS
DBaaSで使えない
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 18

Option: No FTS knowledge
オススメの選択肢:全文検索の知識ナシ
Need only simple features
まだ単純な機能で十分
Less data: LIKE with PostgreSQL
データ少:PostgreSQLでLIKE
Need up-to-date FTS features
いまどきの全文検索機能が必要
FTS engine via PostgreSQL
PostgreSQL経由で全文検索エンジン
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 19

Option: With FTS knowledge
オススメの選択肢:全文検索の知識アリ
Need tuned FTS feature
カリカリにチューニングしたい
PostgreSQL + FTS server
PostgreSQL+全文検索サーバー
Others
それ以外
FTS engine via PostgreSQL
PostgreSQL経由で全文検索エンジン
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 20

Described option
説明する選択肢
FTS engine via
PostgreSQL
PostgreSQL経由で全文検索エンジン
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 21

FTS engine: Groonga
全文検索エンジン:Groonga(ぐるんが)
Embeddable FTS engine
組込可能な全文検索エンジン
PGroonga: Groonga in PostgreSQL
PGroonga:PostgreSQLに組込
Usable as FTS server
全文検索サーバーとして単独でも使用可能
PostgreSQL + FTS server
architecture is also available
PostgreSQL+全文検索サーバー構成もできる
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 22

Groonga's hobby: data update
Groongaの得意な事:データの追加・更新
Make fresh data searchable!
新鮮な情報をすぐ検索可能!
Batch update is needless
バッチで更新しなくてもよい
Can use as chat backend
チャットくらいの頻度でもOK
e.g.: Zulip uses PGroonga
例:ZulipはPGroongaを採用
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 23

Groonga's hobby: data update
Groongaの得意な事:データの追加・更新
Keep search performance
while updating!
更新中も検索性能が落ちない!
Updatable when there are many
search users
利用ユーザーが多い時でも更新可能
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 24

PGroonga
PGroonga(ぴーじーるんが)
PostgreSQL index
PostgreSQLのインデックス
Alternative of GIN, RUM, ...
GIN・RUMなどと同じレイヤー
Usage
使用方法
CREATE INDEX ...
USING PGroonga ...
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 25

PostgreSQL and FTS
PostgreSQLと全文検索
LIKE: Built-in (組込機能)
textsearch: Built-in (組込機能)
pg_trgm: Contrib (標準添付)
Bundled in the archive
アーカイブには含まれている
Need to install separately
別途インストールすれば使える
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 26

LIKE and performance
LIKEと速度
Small data
少ないデータ
Enough performance
十分実用的
Not small data
少なくないデータ
Need to tune
性能問題アリ
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 27

LIKE and FTS system
LIKEと全文検索システム
Enough performance
in most case
速度が実用的なことも多い
Data are small in many case
少ないデータなら
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 28

LIKE and FTS system
LIKEと全文検索システム
Unable to sort
それっぽい順のソート不可
Sort is important in FTS
全文検索ではソート順が重要
Users check only
the first N entries
ユーザーは先頭N件しか見ない
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 29

textsearch
Fast search by index
インデックスを作るので速い
Need module for each lang
言語毎にモジュールが必要
Modules for English,
French, ... are built-in
英語やフランス語などは組込
Modules for languages in Asia
aren't maintained
アジア圏の言語用のモジュールはメンテされていない
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 30

pg_trgm
Fast search by index
インデックスを作るので速い
Asian languages aren't
enough supported
アジア圏の言語のサポートは十分ではない
Unable to sort
それっぽい順のソート不可
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 31

RUM
RUM = GIN + position
RUMは位置情報付きのGIN
https://github.com/postgrespro/rum
pg_trgm/pg_bigm are slow
for much matches case
pg_trgmとpg_bigmはマッチ数が多いと遅くなる
RUM will solve it
GINの代わりにRUMを使うことで解決できるかも!
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 32

PGroonga
Fast search by index
インデックスを作るので速い
Sortable
それっぽい順のソート可
Support all languages
全言語対応
Need to install
separately
別途インストールする必要アリ
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 33

FTS system with PostgreSQL
PostgreSQLで全文検索システム
PGroonga is the best!
PGroongaがベスト!
PGroonga
Fast (高速)
Support all langs (全言語対応)
Sortable (それっぽい順でソート可)
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 34

FTS system: Basic features
全文検索システム:基本機能
Fast FTS + sort
高速全文検索+ソート
Show texts around keyword
キーワード周辺テキスト表示
Highlight keyword
検索キーワードハイライト
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 35

FTS system: Adv. features
全文検索システム:高度な機能
Auto complete
オートコンプリート
Similar search
類似文書検索
Synonym expansion
同義語展開
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 36

PGroonga 1.0.0
↓ are only supported
以下の機能のみ対応
Fast FTS + sort
高速全文検索+ソート
Show texts around keyword
キーワード周辺テキスト表示
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 37

PGroonga 2
All features
are supported!
全機能対応!
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 38

PGroonga 1.0.0 → 2
Many new features
たくさんの新機能
Improve performance
性能改善
API is changed
APIが変わった
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 39

API change
API変更
Operator is changed
演算子変更
@@ → &@~
%% → &@
...
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 40

API change
API変更
pgroonga schema is deprecated
pgroongaスキーマを非推奨に
pgroonga.score → pgroonga_score
pgroonga.flush → pgroonga_flush
...
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 41

App for PGroonga 1.0.0
PGroonga 1.0.0用アプリ
Broken with PGroonga 2?
PGroonga 2では動かない?
No! Work without any changes!
何も変更しなくても動くよ!
Great! But why?
いいじゃん!でもなんで動くの?
↓
"Painless upgrade" technique
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 42

Painless upgrade
PGroonga 2 provides
both 1 API and 2 API
PGroonga 2は1用のAPIも2用のAPIも両方提供
Can use PGroonga 2 with 1 API
PGroonga 1のAPIでPGroonga 2を使える
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 43

Painless upgrade
The last PGroonga 1.X
provides both 1 API and
partially 2 API
PGroonga 1系の最終版は1用のAPIも2用のAPIの一部も提
供
Can use PGroonga 1 with 2 API
PGroonga 2のAPIでPGroonga 1を使える
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 44

Painless upgrade
PGroonga 2 keeps 1 API
PGroonga 2の間は1のAPIを維持
PGroonga 3 will drop 1 API
PGroonga 3で1のAPIを削除予定
Just need to upgrade API until 3
PGroonga 3までにAPIをアップグレードすればよい
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 45

Painless upgrade
App for PGroonga 1.0.0
doesn't work with PGroonga 2
PGroonga 1.0.0用のアプリがPGroonga 2で動かない
It's a bug. Please report it!
バグなので報告してね!
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 46

FTS system: Basic features
全文検索システム:基本機能
Fast FTS + sort
高速全文検索+ソート
Show texts around keyword
キーワード周辺テキスト表示
Highlight keyword
検索キーワードハイライト
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 47

Fast FTS + sort
高速全文検索+ソート
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 48

Table definition
CREATE TABLE entries (
-- Need primary key
-- It's needed for sort
id integer PRIMARY KEY,
title text,
content text
);
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 49

Index definition
-- For FTS.
-- The default is good enough!
CREATE INDEX entries_full_text_search
ON entries
-- "USING PGroonga" is important!
-- Primary key is for sort!
USING PGroonga (id, title, content);
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 50

Insert data
-- Normal INSERT.
INSERT INTO entries
VALUES (1,
'Fast FTS with Groonga!',
'Fast FTS is needed!');
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 51

FTS
SELECT title FROM entries
WHERE
-- &@~ is for FTS
-- AND search with "search" and "fast"
title &@~ 'search fast' OR
content &@~ 'search fast';
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 52

FTS: LIKE
SELECT title FROM entries
WHERE
-- Index search for LIKE is supported
-- = Improve app perf without any changes
-- NOTE: &@~ is faster than LIKE
title LIKE '%search%' OR
content LIKE '%search%';
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 53

Sort
SELECT
title,
-- pgroonga_score(TABLE_NAME) returns
-- precision as number
pgroonga_score(entries) AS score
FROM entries
WHERE -- ...
-- Sort by precision
ORDER BY score DESC LIMIT 10;
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 54

Highlight keyword
キーワードハイライト
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 55

Hightlight for HTML
SELECT
pgroonga_highlight_html(
title,
-- Extract keywords from query
pgroonga_query_extract_keywords('search fast'))
FROM entries
WHERE title &@~ 'search fast' OR
content &@~ 'search fast';
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 56

Highlight for HTML: Example
Fast search with <Groonga>!
↓
<span class="keyword">Fast</span>
↑↓ Keywords are marked up with "class"
<span class="keyword">search</span>!
with &lt;Groonga&gt;! ← Escape tag
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 57

Texts around keyword
キーワード周辺テキスト
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 58

Texts around keyword for HTML
SELECT
pgroonga_snippet_html(
content,
-- Extract keywords from query
pgroonga_query_extract_keywords('search fast'))
FROM entries
WHERE title &@~ 'search fast' OR
content &@~ 'search fast';
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 59

Example
...fast search with <Groonga>!...
↓
ARRAY[
↓ First
'<span class="keyword">fast</span>
↑↓ Keywords are marked up with "class"
<span class="keyword">search/span>!
with &lt;Groonga&gt;!', ← Escape tag
'...' ← Second
]
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 60

FTS system: Adv. features
全文検索システム:高度な機能
Auto complete
オートコンプリート
Similar search
類似文書検索
Synonym expansion
同義語展開
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 61

Auto complete
オートコンプリート
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 62

Auto complete: Preparation
オートコンプリート:準備
Master table
マスターテーブル
Candidate
候補:(例:牛乳)
Readings in Katakana
(Only for Japanese)
ヨミ(日本語の場合。カタカナ。複数登録可。)
例:ギュウニュウ・ミルク
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 63

Auto complete: Implementation
オートコンプリート:実装方法
OR search with ...
Prefix search against readings
(Only for Japanese)
ヨミを前方一致検索(日本語の場合。)
Loose FTS against candidate
候補をゆるく全文検索
Sort by candidate
候補でソート
https://pgroonga.github.io/how-to/auto-complete.html
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 64

Table definition
CREATE TABLE terms (
term text, -- Candidate
readings text[], -- Readings
);
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 65

Data example
INSERT INTO terms VALUES (
'milk', -- Candidate
ARRAY[
-- Reading in Katakana
'ギュウニュウ', -- "milk" in Japanese
-- Multiple readings
'ミルク'
-- "milk" in Katakana
]
);
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 66

Data management
データ管理
Easy to maintain because
it's a normal table
普通のテーブルなので管理が楽
Easy to insert/delete/update
追加・削除・更新が楽
Normal backup and replication
ダンプ・リストアもレプリケーションもいつも通り
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 67

Index for prefix search
前方一致用インデックス
CREATE INDEX prefix_search ON terms
USING PGroonga
-- ...text_array_term_search...
(readings
pgroonga_text_array_term_search_ops_v2);
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 68

Index for loose FTS
緩い全文検索用インデックス
CREATE INDEX loose_search ON terms
USING PGroonga (term)
-- Tokenizer for loose full text search
WITH (tokenizer='TokenBigramSplitSymbolAlphaDigit');
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 69

How to search
検索方法
SELECT term FROM terms
-- Prefix search against readings
WHERE readings &^~ '${INPUT}' OR
-- Loose full text search
term &@ '${INPUT}'
ORDER BY term LIMIT 10; -- Sort
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 70

Search example: Candidate
検索例:候補
-- User inputs "il"
SELECT term FROM terms
-- Prefix search against readings
WHERE readings &^~ 'il' OR
-- Loose full text search (Hit)
term &@ 'il'
ORDER BY term LIMIT 10; -- Sort
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 71

Search example: Katakana
検索例:カタカナ
-- User inputs "ギュウ"
SELECT term FROM terms
-- Prefix search against readings (Hit)
WHERE readings &^~ 'ギュウ' OR
-- Loose full text search
term &@ 'ギュウ'
ORDER BY term LIMIT 10; -- Sort
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 72

Search example: Hiragana
検索例:ひらがな
-- User inputs "ぎゅう"
SELECT term FROM terms
-- Prefix search against readings (Hit)
WHERE readings &^~ 'ぎゅう' OR
-- Loose full text search
term &@ 'ぎゅう'
ORDER BY term LIMIT 10; -- Sort
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 73

Search example: Romaji
検索例:ローマ字
-- User inputs "gyu"
SELECT term FROM terms
-- Prefix search against readings (Hit)
WHERE readings &^~ 'gyu' OR
-- Loose full text search
term &@ 'gyu'
ORDER BY term LIMIT 10; -- Sort
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 74

Synonym expansion
同義語展開
Synonym
同義語
Same mean but different notation
同じ意味だが表記が異なる語
e.g.: "PostgreSQL" and "PG"
例:「PostgreSQL」と「ポスグレ」
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 75

Synonym expansion
同義語展開
Users don't want to care
ユーザーは気にしたくない
Synonym expansion
同義語展開
OR search with all synonyms
同義語すべてでOR検索
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 76

Implementation
実装方法
Create synonym table
同義語管理テーブルを作成
Expand synonyms in query
クエリー内の同義語を展開
Search by expanded query
展開後のクエリーで検索
https://pgroonga.github.io/reference/functions/
pgroonga-query-expand.html
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 77

Table definition
CREATE TABLE synonyms (
-- Term to be expanded
term text,
-- Synonym list.
-- Including the "term" itself.
-- If you don't input the "term",
-- the "term" is unsearchable term.
terms text[]
);
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 78

Data example
INSERT INTO synonyms
VALUES ('PostgreSQL', -- Expand "PostgreSQL"
ARRAY['PostgreSQL', 'PG']),
('PG', -- Expand "PG"
ARRAY['PG', 'PostgreSQL']);
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 79

Data management
データ管理
Easy to maintain because
it's a normal table
普通のテーブルなので管理が楽
Easy to insert/delete/update
追加・削除・更新が楽
Normal backup and replication
ダンプ・リストアもレプリケーションもいつも通り
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 80

Index definition
CREATE INDEX synonym_search ON synonyms
USING PGroonga
-- ...text_term_search...
-- For equal search
(term pgroonga_text_term_search_ops_v2);
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 81

Confirm
確認方法
SELECT pgroonga_query_expand(
'synonyms', -- Table name
'term', -- Column name to be expanded
'terms', -- Column name for synonyms
'PostgreSQL' -- Query
);
-- '((PostgreSQL) OR (PG))'
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 82

Search
検索方法
SELECT title FROM entries
WHERE
-- title &@~ 'DB ((PostgreSQL) OR (PG))'
title &@~
pgroonga_query_expand('synonyms',
'term',
'terms',
'DB PostgreSQL');
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 83

Similar search
類似文書検索
Query is document itself
検索クエリーは文書そのもの
Not keyword
キーワードではない
Use case
利用例
Show related entries
関連エントリーの提示に使える
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 84

Implementation
実現方法
Create dedicated index
類似検索用のインデックスを作る
Use tokenizer for target
language
対象の言語に合わせた処理で精度向上
e.g.: MeCab based tokenizer for
Japanese
例:日本語ならMeCabベースのトークナイザーを活用
Use dedicated operator
類似検索用の演算子を使う
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 85

Index definition
CREATE INDEX entries_similar_search
ON entries
-- Target: Both title and content
-- Reason: Title is important
USING PGroonga (id, (title || ' ' || content))
-- TokenMecab is good for Japanese
WITH (tokenizer='TokenMecab');
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 86

Search
SELECT title,
pgroonga_score(entries) AS score
FROM entries
WHERE
-- &@* is operator for similar search.
-- Search with existing document.
(title || ' ' || content) &@*
'...fast search with Groonga!...'
ORDER BY score DESC LIMIT 3;
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 87

Result example
結果例
Query:
...search with Groonga!...
Hit example:
...search with PGroonga!...
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 88

Wrap up: Basic features
全文検索システム:基本機能
Fast FTS + sort
高速全文検索+ソート
Show texts around keyword
キーワード周辺テキスト表示
Highlight keyword
検索キーワードハイライト
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 89

Wrap up: Adv. features
全文検索システム:高度な機能
Auto complete
オートコンプリート
Similar search
類似文書検索
Synonym expansion
同義語展開
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 90

FTS system: Next step
全文検索システム:次の一歩
Support structured data
構造化データ対応
Office document, HTML, ...
オフィス文書・HTMLなど
Needed features
対応に必要な処理
Text/metadata extraction
テキスト・メタデータ抽出
Create screenshot
スクリーンショット作成
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 91

Extraction tool
抽出ツール
Apache Tika
Apache Lucene's subproject
Many supported formats
対応フォーマットが多い
ChupaText
Groonga's subproject
Screenshot support
スクリーンショット作成対応
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 92

ChupaText
Supported formats (対応フォーマット)
Word/Excel/PowerPoint
ODT/ODS/ODP (OpenDocument)
PDF/HTML/XML/CSV/...
Interface (インターフェイス)
HTTP and command line
HTTPとコマンドライン
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 93

Install
インストール
Use Docker or Vagrant
DockerかVagrantを使うのが楽
https://github.com/ranguba/chupa-text-docker
https://github.com/ranguba/chupa-text-vagrant
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 94

ChupaText:Docker
% GITHUB=https://github.com
% git clone \
${GITHUB}/ranguba/chupa-text-docker.git
% cd chupa-text-docker
% docker-compose up --build
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 95

Usage
使い方
% curl \
--form data=@XXX.pdf \
http://localhost:20080/extraction.json
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 96

Result example
結果例
{
}
"mime-type": "application/pdf", # MIME type for the original data
"size": 147159, # Metadata
...,
"texts": [ # Extracted texts
{
"mime-type": "text/plain", # MIME type for the extracted data
...,
"creator": "Adobe Illustrator CS3", # Metadata
"body": "This is sample PDF. ...", # Extracted text
"screenshot": {
"mime-type": "image/png", # MIME type for screenshot
"data": "iVBORw...", # Base64-ed image data
"encoding": "base64"
}
}
]
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 97

Web UI
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 98

Web UI: Extraction example
Web UI:抽出例
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 99

Web UI: Extraction example
Web UI:抽出例
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 100

ChupaText:Vagrant
% GITHUB=https://github.com
% git clone \
${GITHUB}/ranguba/chupa-text-vagrant.git
% cd chupa-text-vagrant
% vagrant up
Usage is the same as Docker's
使い方はDocker版と同じ
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 101

Use cases (活用例)
Extracted text
Insert into PGroonga
Extracted metadata
Insert into PGroonga
Use for condition (絞り込みに活用)
Created screenshot
Show in search result (検索結果で表
示)
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 102

Wrap up
まとめ
FTS engine via PostgreSQL
PostgreSQL経由で全文検索エンジン
Provide decision info
採用の判断材料を提供
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 103

Wrap up
まとめ
Show how to impl. FTS system
全文検索システム実装例を紹介
PGroonga
PGroonga 1.0.0 and 2
PGroonga 1.0.0と2
Painless upgrade
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 104

Wrap up
まとめ
Show how to support
structured data
構造化データの対応方法を紹介
ChupaText
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Page: 105

Support service
サポートサービス紹介
Install support
(導入支援)
設計支援・性能検証・移行支援・…
Development support
(開発支援)
サンプルコード提供・問い合わせ対応・…
Operation support (運用支援)
障害対応・チューニング支援・…
Contact (問い合わせ先)
https://www.clear-code.com/contact/?
type=groonga
PGroonga 2 - Make PostgreSQL rich full text search system backend!
Powered by Rabbit 2.2.2

Other slides

Apache Arrow
2017-06-13
Apache Arrow
2017-05-28
Mroonga!
2015-10-30