๐Ÿ‘ฉ๐Ÿป‍๐Ÿ’ป/database

[MySQL] ๊ฒ€์ƒ‰ ์ฟผ๋ฆฌ ์„ฑ๋Šฅ ๋น„๊ต

๋ฐ”์ฟ„๋ฆฌ 2025. 5. 28. 16:59

๊ฐœ์š”

- ner์„ ํฌํ•จํ•˜๋Š” ๊ธฐ์‚ฌ๋“ค์„ ์ถœ๋ ฅํ•  ๋•Œ, ํšจ์œจ์ ์œผ๋กœ ๊ฒ€์ƒ‰ํ•˜๊ธฐ ์œ„ํ•œ ์ฟผ๋ฆฌ ์„ฑ๋Šฅ ๋น„๊ต

    1) JSON_CONTAIN

    2) MATCH … AGAINST

    3) MATCH … AGAINST + REGEXP

- 100,000๊ฐœ์˜ article๋กœ ์ง„ํ–‰

 

 

ํ™•์ธ

EXPLAIN ANALYZE ๋กœ ๋ถ„์„์„ ์ง„ํ–‰ํ–ˆ๋‹ค.

 

1. JSON_CONTAIN

EXPLAIN ANALYZE
  SELECT * FROM articles WHERE JSON_CONTAINS(ner, '"์‚ผ์„ฑ"');
-> Filter: json_contains(articles.ner,<cache>('"์‚ผ์„ฑ"'))
   (cost=10185.55 rows=98728)
   (actual time=0.148..102.554 rows=27555 loops=1)

-> Table scan on articles
   (cost=10185.55 rows=98728)
   (actual time=0.099..43.564 rows=100000 loops=1)
  • Table scan ๋ฐœ์ƒ: ์ „์ฒด articles ํ…Œ์ด๋ธ”์„ ์ฒ˜์Œ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ํ›‘์Œ (100,000 rows)
  • json_contains()๋Š” ์ธ๋ฑ์Šค๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ๋ชปํ•˜๋ฏ€๋กœ ๋ชจ๋“  ํ–‰์„ ์ง์ ‘ ๊ฒ€์‚ฌ
  • actual time: ์ „์ฒด ์†Œ์š” ์‹œ๊ฐ„์€ ์•ฝ 103ms ์ •๋„ ๊ฑธ๋ฆผ

 

2. MATCH … AGAINST

EXPLAIN ANALYZE
  SELECT * FROM articles
  WHERE MATCH(ner_text) AGAINST('์‚ผ์„ฑ' IN BOOLEAN MODE);
-> Filter: (match articles.ner_text against ('+์‚ผ์„ฑ' in boolean mode))
   (cost=0.35 rows=1)
   (actual time=13.105..77.032 rows=49819 loops=1)

-> Full-text index search on articles using idx_ner_text_ngram (ner_text='+์‚ผ์„ฑ')
   (cost=0.35 rows=1)
   (actual time=13.104..74.931 rows=49819 loops=1)
  • FULLTEXT ์ธ๋ฑ์Šค ์‚ฌ์šฉ: idx_ner_text๊ฐ€ ์‚ฌ์šฉ๋จ
  • ์˜ˆ์ƒ ๋น„์šฉ (cost)์ด ๋งค์šฐ ๋‚ฎ์Œ (0.35) → ๋งค์šฐ ๊ฐ€๋ฒผ์šด ์ž‘์—…์œผ๋กœ ์ธ์‹
  • actual time: 13.105ms ~ 77.032ms
    • 13.105ms : ์ฒซ๋ฒˆ์žฌ row ๋‚˜์˜จ ์‹œ๊ฐ„
    • 77.032ms : ์ „์ฒด 49819๊ฑด ๋‚˜์˜จ ์‹œ๊ฐ„
    • ์ด๊ฑด ์ธ๋ฑ์Šค๋ฅผ ํ†ตํ•ด 50,126๊ฐœ์˜ ๋งค์นญ row๋ฅผ ์ฐพ์Œ

 

โœ… JSON_CONTAIN, MATCH … AGAINST ์˜ ์ถœ๋ ฅ ๊ฐœ์ˆ˜๊ฐ€ ๋‹ค๋ฅด๋‹ค.

  • JSON_CONTAIN: 27555
  • MATCH … AGAINST: 49819
  • MATCH … AGAINST์œผ๋กœ ‘์‚ผ์„ฑ’์ด๋ผ๋Š” ํ‚ค์›Œ๋“œ๋Š” ๊ฒ€์ƒ‰ํ–ˆ์„ ๋•Œ, ‘์‚ผ์„ฑ’์„ ํฌํ•จํ•˜๋Š” ner์„ ๋ชจ๋‘ ๊ฒ€์ƒ‰ํ•œ๋‹ค.
    • ์˜ˆ: ์‚ผ์„ฑ, ์‚ผ์„ฑ์ „์ž ๋“ฑ..
  • ๋งŒ์•ฝ ‘์‚ผ์„ฑ’์„ ํฌํ•จํ•˜๋Š” ๋‹จ์–ด๊ฐ€ ์•„๋‹Œ, ์˜ค์ง ‘์‚ผ์„ฑ’๋งŒ ๊ฒ€์ƒ‰ํ•˜๋ ค๋ฉด ์•„๋ž˜์˜ ์ฟผ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•ด์•ผํ•œ๋‹ค.

 

3. MATCH … AGAINST + REGEXP

EXPLAIN ANALYZE
  SELECT COUNT(*) FROM articles
  WHERE MATCH(ner_text) AGAINST('+์‚ผ์„ฑ' IN BOOLEAN MODE)
  AND ner_text REGEXP '\\b์‚ผ์„ฑ\\b';
-> Filter: ((match articles.ner_text against ('+์‚ผ์„ฑ' in boolean mode)) and regexp_like(articles.ner_text,'\\b์‚ผ์„ฑ\\b'))
   (cost=0.35 rows=1)
   (actual time=9.632..87.606 rows=27555 loops=1)

-> Full-text index search on articles using idx_ner_text_ngram
   (ner_text='+์‚ผ์„ฑ')  (cost=0.35 rows=1)
   (actual time=9.528..73.840 rows=49819 loops=1)
  • JSON_CONTAIN์œผ๋กœ ๊ฒ€์ƒ‰ํ–ˆ์„ ๋•Œ์™€ 27555์œผ๋กœ ๋™์ผํ•จ
  • REGEXP๋Š” ์ธ๋ฑ์Šค๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ๋ชปํ•˜๋ฏ€๋กœ ํ›„์ฒ˜๋ฆฌ ๋‹จ๊ณ„์—์„œ ์‹œ๊ฐ„์ด ์ถ”๊ฐ€๋จ
  • ๊ทธ๋ž˜๋„ Fulltext ์ธ๋ฑ์Šค๋กœ ๋จผ์ € ํ•„ํ„ฐ๋ฅผ ๊ฑธ์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ „์ฒด ํ…Œ์ด๋ธ” ์Šค์บ”๋ณด๋‹ค๋Š” ํ›จ์”ฌ ๋น ๋ฆ„

 

 

์ •๋ฆฌ

  JSON_CONTAINS MATCH ... AGAINST
์ธ๋ฑ์Šค ์‚ฌ์šฉ ๋ถˆ๊ฐ€ (ํ…Œ์ด๋ธ” ์ „์ฒด ์Šค์บ”) FULLTEXT ์ธ๋ฑ์Šค ์‚ฌ์šฉ
์†๋„ (์‹คํ–‰ ์‹œ๊ฐ„) ๋น ๋ฅด๊ฒŒ ์‹œ์ž‘ํ•˜์ง€๋งŒ ์˜ค๋ž˜ ๊ฑธ๋ฆผ ์•ฝ๊ฐ„ ๋А๋ฆฌ๊ฒŒ ์‹œ์ž‘ํ•˜์ง€๋งŒ ๋” ์ตœ์ ํ™”๋จ
์˜ˆ์ธก ๋น„์šฉ ๋†’์Œ ๋งค์šฐ ๋‚ฎ์Œ
ํ•„ํ„ฐ๋ง JSON ๋‚ด๋ถ€๊ฐ’ ํฌํ•จ ์—ฌ๋ถ€ ๊ฒ€์‚ฌ ํ…์ŠคํŠธ ๊ธฐ๋ฐ˜ ๋‹จ์–ด ๋งค์นญ

 

โžก๏ธ ๋ฐ์ดํ„ฐ๊ฐ€ ๋งŽ์ง€ ์•Š์„ ๋•Œ๋Š” JSON_CONTAINS ๊ฐ€ ๋” ๋น ๋ฅด์ง€๋งŒ

ํฌ๋กค๋ง์„ ์ง„ํ–‰ํ•˜๋ฉฐ, ๋ฐ์ดํ„ฐ๋Š” ๊ณ„์† ๋ˆ„์ ๋˜๋ฉด์„œ ์–‘์ด ๋งŽ์•„์งˆ ๊ฒƒ ์ด๊ณ 

์šด์˜ ๋น„์šฉ๋„ ์ค„์ด๊ธฐ ์œ„ํ•ด์„œ๋Š” MATCH ... AGAINST๊ฐ€ ๋” ํšจ์œจ์ ์ด๋ผ๊ณ  ํŒ๋‹จ๋จ

 

โžก๏ธ ํ‚ค์›Œ๋“œ ๊ฒ€์ƒ‰ ์‹œ, ๊ทธ ํ‚ค์›Œ๋“œ๋Š” ํฌํ•จํ•˜๋Š” ๊ฒƒ ๋ชจ๋‘ ์ถœ๋ ฅํ•  ๊ฒƒ์ธ์ง€ / ๊ทธ ํ‚ค์›Œ๋“œ๋งŒ ์ถœ๋ ฅํ•  ๊ฒƒ์ธ์ง€๋Š” ํ”„๋กœ์ ํŠธ ๋ฐฉํ–ฅ์— ๋งž์ถœ์˜ˆ์ •