PostgreSQL统计信息中相关的数据字典表有哪些

网友投稿 478 2024-01-04 15:54:14

PostgreSQL统计信息中相关的数据字典表有哪些

本篇内容介绍了“PostgreSQL统计信息中相关的数据字典表有哪些”的有关知识,在实际案例的操作过程中,不少人都会遇到这样的困境,接下来就让小编带领大家学习一下如何处理这些情况吧!希望大家仔细阅读,能够学有所成!

一、pg_class

在pg_class数据字典表中,存储了Relation(包括Index)的两个统计信息:页面占用总数relpages和元组总数reltuples,在优化器执行物理优化时用于估算访问路径的启动成本和总成本.

测试脚本如下,在t_grxx表中插入10w数据,创建2个索引:drop table if existst_grxx;create table t_grxx(dwbh varchar(10),grbh varchar(10),xm varchar(20),xb varchar(10),nl int); insert intot_grxx(dwbh,grbh,xm,xb,nl)select generate_series(1,100000)/10||,generate_series(1,100000),XM||generate_series(1,100000), (case when (floor(random()*2)=0then  else  end),floor(random() * 100 + 1)::int; create index idx_t_grxx_grbh on t_grxx(grbh); create index idx_t_grxx_dwbh on t_grxx(dwbh);

基础关系t_grxx中的统计信息

testdb=#select relpages,reltuples from pg_class where relname = t_grxx;  relpages | reltuples  ----------+-----------       726 |    100000 --> 页面数726,元组数10,000 (1 row)

索引idx_t_grxx_grbh的统计信息

testdb=# selectrelpages,reltuples from pg_class where relname =idx_t_grxx_grbh;  relpages | reltuples  ----------+-----------       276 |    100000 --> 页面数276,元组数10,000 (1 row)二、pg_statistic

pg_statistic是PG中存储统计信息的主要数据字典表.通过命令ANALYZE生成统计数据,这些统计数据在计划阶段提供给优化器使用,是成本估算的基础.

pg_statistic的表结构如下:testdb=# \d pg_statistic              Table "pg_catalog.pg_statistic"    Column    |   Type   | Collation | Nullable |Default  -------------+----------+-----------+----------+---------  starelid| oid      |           not null |   staattnum   | smallint |           not null |   stainherit  | boolean  |           not null |   stanullfrac | real     |           not null |   stawidth    | integer  |           not null |   stadistinct | real     |           not null |   stakind1    | smallint |           not null |   stakind2    | smallint |           not null |   stakind3    | smallint |           not null |   stakind4    | smallint |           not null |   stakind5    | smallint |           not null |   staop1      | oid      |           not null |staop2| oid      |           not null |   staop3      | oid      |           not null |   staop4      | oid      |           not null |   staop5      | oid      |           not null |   stanumbers1 | real[]   |           |          |   stanumbers2 | real[]   |           |          |   stanumbers3 | real[]   |           |          |   stanumbers4 | real[]   |           |          |   stanumbers5 | real[]   |           |          |   stavalues1  | anyarray |           |          |   stavalues2  | anyarray |           |          |   stavalues3  | anyarray |           |          |   stavalues4  | anyarray |           |          |   stavalues5  | anyarray |           |          |  Indexes:     "pg_statistic_relid_att_inh_index"UNIQUE, btree (starelid, staattnum, stainherit)

其中:

starelid:数据列所属的关系或索引Oid

staattnum:数据列编号

stainherit:是否继承表(分区表)

stanullfrac:NULL值所占的比例

stawidth:平均行大小,以字节为单位

stadistinct:非NULL值的唯一值信息.>0,表示多少个唯一值;=0,表示未知;<0,根据重复值出现次数求得:stadistinct = (-1)x(1-stanullfrac)/times(重复次数)

stakindN:在N号slot槽中存储的是哪种类型的统计信息

staopN:在N号槽中所使用的操作符,如"=","<",">"等

stanumbersN:在N号槽中存储的数值类型统计信息

stavaluesN*:在N号槽中存储的数据值(使用anyarray类型)

统计信息类型,在src/include/catalog/pg_statistic.h文件中有相关解释,定义如下:

#define STATISTIC_KIND_MCV  1 #define STATISTIC_KIND_HISTOGRAM  2 #define STATISTIC_KIND_CORRELATION  3 #define STATISTIC_KIND_MCELEM  4 #define STATISTIC_KIND_DECHIST  5 #define STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM  6 #define STATISTIC_KIND_BOUNDS_HISTOGRAM  7

MCV=most common values,最常见值,即高频值

HISTOGRAM=HISTOGRAM,数据分布直方图

CORRELATION=CORRELATION,相关系数,该列未排序的数据分布与排序后的数据分布的相关性

MCELEM=most common elements,与MCV类似,存储的是最常见非NULL值

DECHIST=distinct elements count histogram,描述数组类型列的每行中不同值的分布。

RANGE_LENGTH_HISTOGRAM=length histogram,以行为单位的距离类型列的长度分布。

BOUNDS_HISTOGRAM=与HISTOGRAM类似,面向range-type column

仍以t_grxx表为例说明该表的统计信息

表结构:testdb=# \d t_grxx                       Table "public.t_grxx"  Column |         Type          | Collation | Nullable |Default  --------+-----------------------+-----------+----------+---------  dwbh| character varying(10) |           |          |   grbh   | character varying(10) |           |          |xm| character varying(20) |           |          |   xb     | character varying(10) |           |          |   nl     | integer               |           |          |  Indexes:     "idx_t_grxx_dwbh" btree (dwbh)     "idx_t_grxx_grbh" btree (grbh)

数据列信息:

testdb=# select attrelid,attname,attnum from pg_attribute where attrelid = 16742 order by attnum;  attrelid | attname  | attnum  ----------+----------+--------     16742 | tableoid |     -7     16742 | cmax     |     -6     16742 | xmax     |     -5     16742 | cmin     |     -4     16742 | xmin     |     -3     16742 | ctid     |     -1     16742 | dwbh     |      1     16742 | grbh     |      2     16742 | xm       |      3     16742 | xb       |      4     16742 | nl       |      5 (11 rows)

dwbh/grbh/xm/xb/nl这5列的编号分别是1/2/3/4/5.

查询常规的统计信息testdb=# select starelid,staattnum,stainherit,stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 16742;  starelid | staattnum | stainherit | stanullfrac | stawidth | stadistinct  ----------+-----------+------------+-------------+----------+-------------     16742 |         1 | f          |           0 |        4 |        9984     16742 |         2 | f          |           0 |        5 |          -1     16742 |         3 | f          |           0 |        7 |          -1     16742 |         4 | f          |           0 |        4 |           2     16742 |         5 | f          |           0 |        4 |         100 (5 rows)

stanullfrac为0表示没有NULL值

stawidth表示各列的平均行大小,比如dwbh平均行大小为4个字节

stadistinct表示唯一值信息,比如dwbh,10000行中有9984个唯一值,而xb只有2个,grbh和xm则不存在重复值.

查询其他统计信息

testdb=# select starelid,staattnum,stakind1,stakind2,stakind3,stakind4,stakind5 from pg_statistic where starelid = 16742 order by staattnum;  starelid | staattnum | stakind1 | stakind2 | stakind3 | stakind4 | stakind5  ----------+-----------+----------+----------+----------+----------+----------     16742 |         1 |        1 |        2 |        3 |        0 |        0     16742 |         2 |        2 |        3 |        0 |        0 |        0     16742 |         3 |        2 |        3 |        0 |        0 |        0     16742 |         4 |        1 |        3 |        0 |        0 |        0     16742 |         5 |        1 |        3 |        0 |        0 |        0 (5 rows)

第1列(dwbh),在第1/2/3个槽中分别存储了类型为第1/2/3种类型的统计信息,即STATISTIC_KIND_MCV/STATISTIC_KIND_HISTOGRAM/STATISTIC_KIND_CORRELATION,其他槽没有内容

第2列(grbh),在第1/2个槽中分别存储了类型为第2/3种类型的统计信息,即STATISTIC_KIND_HISTOGRAM/STATISTIC_KIND_CORRELATION,其他槽没有内容

第5列(nl),在第1/2个槽中分别存储了类型为第1/3种类型的统计信息,即STATISTIC_KIND_MCV/STATISTIC_KIND_CORRELATION,其他槽没有内容

第一列统计信息testdb=# \x Expanded display is on. testdb=# select staattnum,stakind1,staop1,stanumbers1,stavalues1, testdb-#                  stakind2,staop2,stanumbers2,stavalues2, testdb-#                  stakind3,staop3,stanumbers3,stavalues3 testdb-# from pg_statistic  testdb-#where starelid = 16742  testdb-#       and staattnum = 1; -[ RECORD 1]--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- staattnum   |1         -->dwbh stakind1    | 1-->STATISTIC_KIND_MCV,高频值统计信息 staop1      |98        -->"=",查询pg_operator,select * from pg_operator whereoid=98; stanumbers1 | {0.0003}  -->高频值比例,即3/10000 stavalues1  | {24}      -->表示24这个值 stakind2    |2         -->STATISTIC_KIND_HISTOGRAM,直方图 staop2      | 664       -->"<",左闭右开的区间 stanumbers2 |           -->无 stavalues2  | {0,1084,1172,1262,1356,1441,1533,1622,1703,1790,1885,198,207,2165,2259,2351,2447,2530,262,2708,2799,2895,2985,3087,3172,3262,3359,3459,3541,3629,3716,3805,3900,3995,4078,417,4257,4345,4432,4515,461,47,479,489,4985,5069,5154,5244,533,542,5510,5596,5686,5786,587,5963,6053,6145,6232,6323,6412,650,6599,6686,6779,6868,6957,7038,7127,7218,7305,7403,7495,7588,7679,7767,7857,7942,8037,8125,8209,8307,8392,8481,8575,8664,8755,8844,8935,9022,9115,9202,9296,9378,9464,9561,965,9731,982,9906,9999}                          -->剔除高频值后的数据分布,假定平均分布,0为MIN值,9999为MAX值                         -->[0,1084),[1084,1172),...[9906,9999) stakind3    | 3-->STATISTIC_KIND_CORRELATION,相关系数 staop3      |664       -->"<" stanumbers3 | {0.817163}-->未排序和已排序的数据分布,有81.7%的相关性,值越高,顺序扫描结果越近似于排序 stavalues3  |           -->无

第二列统计信息

testdb=# \x Expanded display is on. testdb=# select staattnum,stakind1,staop1,stanumbers1,stavalues1,                  stakind2,staop2,stanumbers2,stavalues2,                  stakind3,staop3,stanumbers3,stavalues3 from pg_statisticwhere starelid = 16742        and staattnum = 2; -[ RECORD1]--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- staattnum   |2         -->grbh stakind1    | 2         -->STATISTIC_KIND_HISTOGRAM,直方图 staop1      | 664       -->"<" stanumbers1 |           -->NULL stavalues1  | {1,1088,11753,12659,1358,14448,15354,16253,17071,17939,18872,19821,20730,21676,22630,23540,24479,25302,26198,27069,27974,2894,29836,30851,31725,32619,33587,34584,35427,36280,37164,3806,39000,39942,40786,41695,42571,43452,44330,4516,46102,46997,47900,48899,49852,50697,5157,52452,533,54204,5512,55963,5686,57860,5871,59633,60544,61468,62331,63252,64134,65016,65993,66870,67798,68687,69568,70391,71285,72205,73066,74045,74962,75881,7679,77675,78573,79434,8038,81276,82125,83078,83939,84822,85764,86654,87557,88450,89352,90249,91171,92030,92979,93797,94665,95628,96510,97328,98212,99074,99998}                         -->参照第一列 stakind2    |3         -->STATISTIC_KIND_CORRELATION,相关系数 staop2      | 664       -->"<" stanumbers2 | {0.816172}-->未排序和已排序的数据分布 stavalues2  |           -->NULL stakind3    | 0         -->第3个槽无统计信息 staop3      |0 stanumbers3 |  stavalues3  |

第五列统计信息

testdb=# select staattnum,stakind1,staop1,stanumbers1,stavalues1, testdb-#                  stakind2,staop2,stanumbers2,stavalues2, testdb-#                  stakind3,staop3,stanumbers3,stavalues3 testdb-# from pg_statistic  testdb-#where starelid = 16742  testdb-#       and staattnum = 5; -[ RECORD 1]------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- staattnum   |5             -->nl stakind1    | 1-->STATISTIC_KIND_MCV,高频值统计信息 staop1      |96            -->"=" stanumbers1 | {0.0117667,0.0117333,0.0113,0.0112667,0.0112333,0.0110333,0.0109333,0.0109333,0.0109333,0.0108333,0.0108,0.0108,0.0107667,0.0107667,0.0107333,0.0107333,0.0107333,0.0107,0.0106333,0.0106333,0.0106333,0.0105667,0.0105333,0.0105333,0.0105,0.0104667,0.0104667,0.0104333,0.0104333,0.0103667,0.0103,0.0102667,0.0102333,0.0102333,0.0102333,0.0102333,0.0102,0.0102,0.0102,0.0101667,0.0101667,0.0101333,0.0101,0.0101,0.0100667,0.0100333,0.0100333,0.01,0.00996667,0.00996667,0.00996667,0.00996667,0.00993333,0.00993333,0.00993333,0.00993333,0.0099,0.00986667,0.00983333,0.00983333,0.0098,0.0098,0.0098,0.00973333,0.0097,0.00966667,0.00966667,0.00963333,0.00963333,0.00963333,0.00963333,0.0096,0.0096,0.00956667,0.00953333,0.0095,0.0095,0.0095,0.0095,0.00943333,0.0094,0.00936667,0.00936667,0.00933333,0.00933333,0.0093,0.00916667,0.00916667,0.00916667,0.00913333,0.0091,0.0091,0.0091,0.00906667,0.009,0.00896667,0.00893333,0.00876667,0.00876667,0.0079}                             -->每个高频值的出现频率 stavalues1  | {73,40,7,51,15,8,14,74,100,53,77,89,33,42,26,29,78,11,16,38,58,18,27,95,49,34,92,65,87,30,32,28,23,62,82,96,1,50,60,36,70,52,9,37,84,54,61,5,59,72,80,93,35,66,79,81,55,68,88,98,13,47,56,22,24,39,43,3,21,75,85,25,44,71,86,4,20,83,97,91,99,63,94,17,46,57,48,67,69,90,2,6,45,19,10,64,76,31,41,12}                             -->高频值 stakind2    |3             -->STATISTIC_KIND_CORRELATION,相关系数 staop2      | 97-->"<",整型比较,664是字符型比较 stanumbers2 | {0.00562935}  -->0.5%的相关性,相关性较低 stavalues2  |               -->NULL stakind3    |0             -->无相关信息 staop3      | 0stanumbers3 |  stavalues3  |三、pg_statistic_ext

pg_statistic_ext数据字典表用于存储多列统计信息,需使用CREATE STATISTICS命令创建统计信息,在执行ANALYZE命令时统计.

仍以t_grxx为例,统计dwbh和grbh这两列的信息testdb=# create statistics sta_t_grxx_dwbh_grbh on dwbh,grbh from t_grxx; CREATE STATISTICS testdb=# analyze t_grxx(dwbh,grbh); ANALYZE

查询多列统计信息

testdb=# \x Expanded display is on. testdb=#select * from pg_statistic_ext; -[ RECORD 1 ]---+----------------------------------------- stxrelid        | 16742                 -->数据表Oid stxname         | sta_t_grxx_dwbh_grbh  -->统计信息名称 stxnamespace    | 2200                  -->表空间 stxowner        | 10                    -->统计信息的Owner stxkeys         | 1 2                   -->列编号,1 2表示dwbh grbh stxkind         | {d,f}                 -->STATS_EXT_NDISTINCT/STATS_EXT_DEPENDENCIES stxndistinct    | {"1, 2"100000}      -->STATS_EXT_NDISTINCT统计信息 stxdependencies | {"1 => 2"0.040900"2 => 1"1.000000-->STATS_EXT_DEPENDENCIES统计信息

“PostgreSQL统计信息中相关的数据字典表有哪些”的内容就介绍到这里了,感谢大家的阅读。如果想了解更多行业相关的知识可以关注网站,小编将为大家输出更多高质量的实用文章!

版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。

上一篇:kettle spoon如何连接hbase
下一篇:oracle基础知识有哪些
相关文章