意外な落とし穴！アプリケーション⇒DBデータ型によるパフォーマンス影響

当記事は EnterpriseDB 社のナレッジベース記事の翻訳の紹介です。

Jakub Wartak
2024年8月13日
https://knowledge.enterprisedb.com/hc/en-us/articles/15387800147740-SQL-Datatype-mismatch-preventing-index-use-with-JDBC-example

はじめに

PostgreSQLのオプティマイザがインデックスを適切に使用できない理由は様々です。

・インデックスが存在しない
・インデックスが無効（INVALID状態）
・例）CREATE CONCURRENT INDEX句が不正停止した
・式に対するインデックスで、対応する関数や演算が定義されていない
・データ型の不一致
・PostgreSQLの統計情報に基づくと、インデックスによる高速化が実施されない
・その種類のインデックスが、SQL内に記述された演算子をサポートしていなかった

ここでは、JDBC⇔PostgreSQL間でデータ型の不一致がインデックスの使用にどういった悪影響を及ぼすかを見ていきます。

問題の再現

①テーブルの作成

　※ここでは「bigint」データ型を利用します。

  
  create table t (id bigint primary key, val text);
  insert into t select g, 'sampledata' || g from generate_series(1, 2000000) g(i); 
  analyze t;
  alter table t owner to app;

　このように、bigint列にプライマリーキーとなるインデックスが存在するように、表を作成します。

②実行計画確認のため、auto_explainエクステンションを適切に導入します。

   
  alter system set auto_explain.log_min_duration = '30ms';
  alter system set auto_explain.log_analyze = on;
  alter system set auto_explain.log_buffers = on;
  alter system set auto_explain.log_verbose = on;
  alter system set auto_explain.log_timing = on;

③問題を再現するためのアプリケーションを用意します。

  
  import java.sql.*;
  import java.math.BigDecimal;
  public class JdbcDataTypeMismatch
  {
     public static void main(String[] args)
     {
       try
       {
         String url      = "jdbc:postgresql://127.0.0.1:4444/edb";
         String user     = "app";
         String password = "abc123";
         BigDecimal max  = new BigDecimal(10000);
         Class.forName("org.postgresql.Driver");
         Connection c  = DriverManager.getConnection(url, user, password);
         c.setAutoCommit(false);
         PreparedStatement ps = c.prepareStatement("select val from t where id = ?");
         for(BigDecimal i = new BigDecimal(0); i.compareTo(max) < 0; i = i.add(new BigDecimal(1))) {
           ps.setBigDecimal(1, i);
           ResultSet rs = ps.executeQuery();
           while (rs.next()) {
           }
           rs.close();
           c.commit();
           if(i.remainder(new BigDecimal(1000)).compareTo(BigDecimal.ZERO) == 0) {
                  System.out.println(i + "/" + max);
           }
         }
       }
       catch(ClassNotFoundException e)
       {
         System.out.println("Class Not Found : " + e.getMessage());
       }
       catch(SQLException e)
       {
         System.out.println("SQL Exception: " + e.getMessage());
       }
     }
  }

④アプリケーションをコンパイルし、実行します。

  
  javac JdbcDataTypeMismatch.java
  java -cp postgresql-42.6.0.jar:. JdbcDataTypeMismatch

⑤PostgreSQLログからインデックスが使用されなかった旨の出力を確認します。

  
  2023-09-20 11:47:16 CEST LOG:  duration: 410.446 ms  plan:
          Query Text: select val from t where id = $1
          Seq Scan on public.t  (cost=0.00..44608.00 rows=10000 width=17) (actual time=0.121..410.438 rows=1 loops=1)
            Output: val
            Filter: ((t.id)::numeric = '4'::numeric)
            Rows Removed by Filter: 1999999
            Buffers: shared hit=14608
  2023-09-20 11:47:17 CEST LOG:  duration: 469.929 ms  plan:
          Query Text: select val from t where id = $1
          Seq Scan on public.t  (cost=0.00..44608.00 rows=10000 width=17) (actual time=0.019..469.920 rows=1 loops=1)
            Output: val
            Filter: ((t.id)::numeric = '5'::numeric)
            Rows Removed by Filter: 1999999
            Buffers: shared hit=14608
  2023-09-20 11:47:17 CEST LOG:  duration: 459.444 ms  plan:
          Query Text: select val from t where id = $1
          Seq Scan on public.t  (cost=0.00..44608.00 rows=10000 width=17) (actual time=0.018..459.438 rows=1 loops=1)
            Output: val
            Filter: ((t.id)::numeric = '6'::numeric)
            Rows Removed by Filter: 1999999
            Buffers: shared hit=14608
  [..]

問題の修正

上の例を見ると分かる通り、PostgreSQLオプティマイザはインプットデータをnumeric型と判断し、検索条件「Filter: ((t.id)::numeric = '6'::numeric)」を適用しています。これは、JDBCクライアントが提供するBigDecimal型がデータベースにnumericとして送信されることによって起きます。そのため、データベース内部でこの値は暗黙的に変換されており、インデックスが使用されない事象を引き起こします。

JDBCでBigDecimal型を、PostgreSQLのbigint/int(8)型に合致するLong型に修正すれば、インデックスが適切に使用され、PreparedStatementを利用した平均実行時間を大幅に削減することができます。こちらの例では、実行時間が0.01msを下回ることが確認できました。

  
  edb=# select query, calls, mean_exec_time, stddev_exec_time from pg_stat_statements
  where query like 'select val from t where id = %';
                query              | calls |    mean_exec_time    |   stddev_exec_time
  ---------------------------------+-------+----------------------+----------------------
   select val from t where id = $1 | 10000 | 0.016391058100000012 | 0.014163808369450812

そして同様の問題はpsqlからでも容易に再現できます。

  
  edb=# explain analyze select val from t where id = 1;
                                                  QUERY PLAN
  -----------------------------------------------------------------------------------------------------------
   Index Scan using t_pkey on t  (cost=0.43..8.45 rows=1 width=17) (actual time=0.028..0.029 rows=1 loops=1)
     Index Cond: (id = 1)
   Planning Time: 0.203 ms
   Execution Time: 0.051 ms
  (4 rows)
  edb=# explain analyze select val from t where id = 1::numeric;
                                                QUERY PLAN
  ------------------------------------------------------------------------------------------------------
   Seq Scan on t  (cost=0.00..44608.00 rows=10000 width=17) (actual time=0.013..558.057 rows=1 loops=1)
     Filter: ((id)::numeric = '1'::numeric)
     Rows Removed by Filter: 1999999
   Planning Time: 0.132 ms
   Execution Time: 558.685 ms
  (5 rows)
  edb=#

プログラムコード改修例

   
  import java.sql.*;
  public class JdbcDataTypeMismatch
  {
     public static void main(String[] args)
     {
       try
       {
         String url      = "jdbc:postgresql://127.0.0.1:4444/edb";
         String user     = "app";
         String password = "abc123";
         Long max        = new Long(10000);
         Class.forName("org.postgresql.Driver");
         Connection c  = DriverManager.getConnection(url, user, password);
         c.setAutoCommit(false);
         PreparedStatement ps = c.prepareStatement("select val from t where id = ?");
         for(Long i = new Long(0); i < max; i++) {
           ps.setLong(1, i);
           ResultSet rs = ps.executeQuery();
           while (rs.next()) {
           }
           rs.close();
           c.commit();
           if(i % 1000 == 0) {
                  System.out.println(i + "/" + max);
           }
         }
       }
       catch(ClassNotFoundException e)
       {
         System.out.println("Class Not Found : " + e.getMessage());
       }
       catch(SQLException e)
       {
         System.out.println("SQL Exception: " + e.getMessage());
       }
     }
  }

なお、データ型マッピングについての詳細はJDBC™ 4.3仕様文書の「Appendix B / Data Type Conversion Tables」をご参照ください。

本再現ケースで利用した部分について、以下の通り互換表から引用します。

PostgreSQLデータ型	JDBCデータ型	Java言語型
bigint	BIGINT	long(Long)
numeric	NUMERIC	java.math.BigDecimal

（以上、EDB社ブログ記事より翻訳）

最後に

既にPostgreSQLやEDB Advanced Serverをご利用のお客様、あるいはご検討中のお客様にとって、アプリケーションからSQLを実行した際にパフォーマンスが劣化しないことは、重要なポイントの一つです。

今回ご紹介した記事ではデータ型の不一致がインデックスに及ぼす悪影響に絞り、アプリケーションからSQLを実施する際の気を付けるべきポイントを記載しております。

アシストではこうしたSQL劣化に関する調査や、DB診断の有償支援もご提供しておりますので、PostgreSQLやEDB Advanced Serverご検討の場合は、お気軽にご相談ください。

＼　PostgreSQLとEDBの違いについて知りたい方はこちらへ！　／

＼　EDBとは？について知りたい方はこちらへ！　／

執筆者情報

程　ゆき

2018年に中途入社。Oracle Database、EDB Postgres／PostgreSQLの支援経験を積み、2024年からEDB/PostgreSQLサポートを担当。趣味は競馬・麻雀。...show more

■本記事の内容について
　本記事に記載されている製品およびサービス、定義及び条件は、特段の記載のない限り本記事執筆時点のものであり、予告なく変更になる可能性があります。あらかじめご了承ください。

■商標に関して
　・Oracle®、Java及びMySQLは、Oracle、その子会社及び関連会社の米国及びその他の国における登録商標です。
　・Amazon Web Services、AWS、Powered by AWS ロゴ、[およびかかる資料で使用されるその他の AWS 商標] は、Amazon.com, Inc. またはその関連会社の商標です。
　　文中の社名、商品名等は各社の商標または登録商標である場合があります。

今さら聞けない!?WALアーカイブのベストプラクティス

PostgreSQL開発に多く貢献しているEnterpriseDB社による WALアーカイブ設定に関するベストプラクティスをご紹介します。

EDB
PostgreSQL

2025.04.16

PostgreSQLの拡張機能「system_stats」のご紹介

EDB社が提供するPostgreSQLの拡張機能「system_stats」はPostgreSQL ユーザーがパフォーマンス問題に取り組む際の非常に強力なツールになります。SQLクエリでOS情報を取得できるため、DBエンジニアにとってはパフォーマンスの監視が格段に簡単になります。テストした結果をご紹介します。

EDB
PostgreSQL

2024.09.12

新ツール Postgres Workload Report によるパフォーマンス診断～データベース管理の未来を共に創る！～

EDB Postgres Workload Reportsは、Postgresデータベースのパフォーマンス診断とトラブルシューティングを強化する新しいツールです。OracleのAWRに似た詳細なレポートを提供し、データベースの問題を迅速に特定・解決できるようサポートします。本記事では概要と利用手順をご紹介します。