0
点赞
收藏
分享

微信扫一扫

postgresql toast解析

先postgresql的页结构、页物理布局

postgresql toast解析_postgresql

​​postgresql 记录(元祖)物理布局​​

​​

postgresql toast解析_st表_02

​​

 

 源码定义如下:

struct HeapTupleHeaderData
{
union
{
HeapTupleFields t_heap;
DatumTupleFields t_datum;
}t_choice;
ItemPointerData t_ctid;//
uint16 t_infomask2; /* number of attributes + various flags */
uint16 t_infomask; /* various flag bits, see below */
uint8 t_hoff; /* sizeof header incl. bitmap, padding */
bits8 t_bits[FLEXIBLE_ARRAY_MEMBER]; /* bitmap of NULLs */
};

铺开来看的话:

Field

Type

Length

Description

t_xmin

TransactionId

4 bytes

insert XID stamp

t_xmax

TransactionId

4 bytes

delete XID stamp

t_cid

CommandId

4 bytes

insert and/or delete CID stamp (overlays with t_xvac)

t_xvac

TransactionId

4 bytes

XID for VACUUM operation moving a row version

t_ctid

ItemPointerData

6 bytes

current TID of this or newer row version

t_infomask2

uint16

2 bytes

number of attributes, plus various flag bits

t_infomask

uint16

2 bytes

various flag bits

t_hoff

uint8

1 byte

offset to user data

所有字段的详情定义在src/include/access/htup_details.h中(这里面的信息对开发非常重要)。

 

postgresql toast解析_postgresql_03

TOAST块在shared_buffer中也是压缩状态,只有在访问时才会被解压。

es_test=# \d+ big_search_doc_new_ic 
Table "public.big_search_doc_new_ic"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------------+-------------------------+-----------+----------+---------+----------+--------------+-------------
summary | character varying(1024) | | | | extended | |
sentiment | character varying(100) | | | | extended | |
industryname | character varying(100) | | | | extended | |
secucode | character varying(16) | | | | extended | |
filename | character varying(100) | | | | extended | |
industrycode | character varying(16) | | | | extended | |
format | character varying(100) | | | | extended | |
reporttype | character varying(16) | | | | extended | |
fstoregroup | character varying(16) | | | | extended | |
title | character varying(100) | | | | extended | |
docid | character varying(36) | | | | extended | |
content | text | | | | extended | |
industrychains | character varying[] | | | | extended | |
investranking | character varying(16) | | | | extended | |
industrychainnodes | character varying[] | | | | extended | |
docsource | character varying(16) | | | | extended | |
secuname | character varying(16) | | | | extended | |
fstorepath | character varying(64) | | | | extended | |
riskranking | character varying(16) | | | | extended | |
orgname | character varying(32) | | | | extended | |
publishdate | character varying(32) | | | | extended | |
pagecount | integer | | | | plain | |
authors | character varying[] | | | | extended | |
Access method: heap

es_test=# select oid,relname,reltoastrelid,relpages,reltuples from pg_class where relname='big_search_doc_new_ic';
oid | relname | reltoastrelid | relpages | reltuples
--------+-----------------------+---------------+----------+--------------
335878 | big_search_doc_new_ic | 335881 | 618489 | 3.210117e+06

es_test=# select oid,relname,reltoastrelid,relpages,reltuples from pg_class where oid=335881; -- TOAST比表大是正常的
oid | relname | reltoastrelid | relpages | reltuples
--------+-----------------+---------------+----------+--------------
335881 | pg_toast_335878 | 0 | 18619610 | 6.538884e+07

select
a.oid,
a.relname,
a.reltoastrelid,
a.relpages,
a.reltuples,
     a.relfilenode,
b.oid,
b.relname,
b.reltoastrelid,
b.relpages,
     b.relfilenode,
b.reltuples
from
pg_class a,
pg_class b
where
a.relname = 'data'
and a.reltoastrelid = b.oid;

es_test=# select * from pg_toast_335878 limit 1;    # TOAST表不能直接访问。
ERROR: relation "pg_toast_335878" does not exist
LINE 1: select * from pg_toast_335878 limit 1;
^

es_test=#                           
SELECT tuple_data_split('big_search_doc_new_ic'::regclass, t_data, t_infomask, t_infomask2, t_bits) FROM heap_page_items(get_raw_page('big_search_doc_new_ic', 0)) limit 1;






tuple_data_split






-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------
{"\\x840b0000e4bcb4e99a8fe79d80e4b8ade59bbde4bc81e4b89ae695b0e5ad97e58c96e58d87e7baa7e79a84e5a4a7e8b68be58abfefbc8c5361615320e8a18ce4b89ae5b086e698afe995bfe69c9fe58f97e79b8ae79a84e4bc98e8b4a
8e8b59be98193e38082e4b88ee7be8ee59bbd205361615320e8a18ce4b89ae79a84e7bea4e6989fe997aae88080e79bb8e6af94efbc8ce4b8ade59bbd205361615320e5b882e59cbae8bf98e5a484e4ba8ee58f91e5b195e697a9e69c9fefbc
8ce4bd86e99a8fe79d80e68891e59bbde4ba91e8aea1e7ae97e59fbae7a180e8aebee696bde98090e6b890e68890e7869fefbc8ce4b8ade59bbd2053616153e4b99fe5b086e8bf8ee69da5e799bee88ab1e9bd90e694bee38082e79bb8e5afb
9e4ba8ee2809ce88a82e6b581e59e8be2809d53616153efbc8ce68891e4bbace69bb4e79c8be5a5bde2809ce5bc80e6ba90e59e8be2809d5361615320e79a84e58f91e5b195e380825361615320e8a18ce4b89ae794b1e9809ae794a8e7b1bb
e98090e6b890e59091e59e82e79bb4e8a18ce4b89ae7b1bbe6b897e9808fefbc8ce69bb4e5a49ae7bb86e58886e8a18ce4b89ae58685e68896e5b086e6b68ce78eb0e9be99e5a4b4e38082e68891e4bbace9a696e6aca1e8a686e79b96e4b8a
de59bbd205361615320e8a18ce4b89ae5928ce585ade5aeb6e4bc98e8b4a8e585ace58fb8e6a087e79a84efbc8ce59d87e7bb99e4ba88e2809ce4b9b0e585a5e2809de8af84e7baa7efbc8ce68ea8e88d90e9a1bae5ba8fe4be9de6aca1e4b8
bae4b8ade59bbde69c89e8b59eefbc88383038332e484befbc89e38081e5beaee79b9fe99b86e59ba2efbc88323031332e484befbc89e38081e98791e5b1b1e8bdafe4bbb6efbc88333838382e484befbc89e38081e794a8e58f8be7bd91e7b
b9cefbc883630303538382e5348efbc89e38081e98791e89db6e59bbde99985efbc883236382e484befbc89e5928ce6988ee6ba90e4ba91efbc883930392e484befbc89e380820a",NULL,NULL,NULL,"\\x49e4b8ade59bbd53616153e8a18
ce4b89aefbc9ae9a38ee887b3e4ba91e8b5b72e706466",NULL,NULL,"\\x1be585b6e4bb96e7a094e68aa5","\\x0f67726f757031","\\x41e4b8ade59bbd53616153e8a18ce4b89aefbc9ae9a38ee887b3e4ba91e8b5b7","\\x43666232
6564633232643931623462303439343466636464366137393833666631","\\x01123d470500cd1402000e20050009200500","\\x1b000000000000000013040000","\\x0fe8b685e9858d","\\x1b000000000000000013040000","\\x0
f6673746f7265","\\x03","\\x5b4d30302f30302f30302f4368516775315f354f3265414c5161484146354e72562d5a4639343031352e706466",NULL,"\\x27e6b5a6e993b6e59bbde99985e8af81e588b8","\\x33323032302d31322d3
2315430303a30303a30302b30383030","\\x88000000","\\x7b010000000000000013040000030000000100000034000000e69da8e5ad90e8b68500000028000000e69e97e790b0000028000000e8b5b5e4b8b90000","\\x0112564a0100
524a01000d20050009200500"}
(1

​​https://www.postgresql.org/docs/current/pageinspect.html#id-1.11.7.31.5​​

​​http://www.mytecdb.com/blogDetail.php?id=223​​

TOAST字段(可变长度类型字段,含varchar、bytea、jsonb等,不一定非得超过2000字节)的存储选项

  默认情况下,当字段超过2KB的时候TOAST就会尝试压缩宽列,如果压缩之后仍然超过2KB,就会将宽字段分块,源表通过指针指向分块后的TOAST,如果低于2KB,则直接存储。所以有些宽列压缩后存储在行内,有些在TOAST是可能的。

​ALTER TABLE ... SET (toast_tuple_target = N)​

​​ALTER TABLE ... ​​​​ALTER [ COLUMN ] ​column_name

  • ​PLAIN​​ prevents either compression or out-of-line storage; furthermore it disables use of single-byte headers for varlena types. This is the only possible strategy for columns of non-TOAST-able data types.
  • ​EXTENDED​​ allows both compression and out-of-line storage. This is the default for most TOAST-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big.
  • ​EXTERNAL​​​ allows out-of-line storage but not compression. Use of ​​EXTERNAL​​ will make substring operations on wide ​​text​​ and ​​bytea​​ columns faster (at the penalty of increased storage space) because these operations are optimized to fetch only the required parts of the out-of-line value when it is not compressed.
  • ​MAIN​​ allows compression but not out-of-line storage. (Actually, out-of-line storage will still be performed for such columns, but only as a last resort when there is no other way to make the row small enough to fit on a page.

alter table data add column big_data_plain text;
alter table data alter column big_data_plain set storage plain;
alter table data add column big_data_external text;
alter table data alter column big_data_external set storage external;
alter table data add column big_data_main text;
alter table data alter column big_data_main set

  update data set big_data_plain = big_column,big_data_external = big_column,big_data_main = big_column;

  analyze data;

  可通过select attname,attstorage from pg_catalog.pg_attribute where attrelid = 24594;查询列存储格式,可知已经是指定格式。

id               |p         |
data |x |
big_column |x |
big_data_plain |p |
big_data_external|e |
big_data_main |m |

  但是查询toast关系大小,仍然是0页?难道在主表中?

oid  |relname|reltoastrelid|relpages|reltuples|relfilenode|oid  |relname       |reltoastrelid|relfilenode|relpages|reltuples|
-----+-------+-------------+--------+---------+-----------+-----+--------------+-------------+-----------+--------+---------+
24594|data | 24598| 1| 16.0| 57388|24598|pg_toast_24594| 0| 57389| 0| 0.0|

  

postgresql toast解析_st表_04

 

   查看对应的57389文件,大小不为空(应该算是缺陷)。

TOAST压缩

  在PG中,针对TOAST,默认使用的是LZ压缩算法,实现为​​PGLZ​​。压缩得通常用法是在写磁盘和网络之前进行压缩,在应用接收和读取之后解压。

  ​​https://medium.com/@lk.snatch/postgresql-compression-854a4647ee43​​

  ​​https://github.com/lz4/lz4​​

pg 14为什么要增加LZ4算法选择

  论压缩率,LZ4和PGLZ差异是不大的,但是LZ4速度相比PGLZ要更快,所以这在没有结果集缓存时非常重要,因为每次访问都要重复解压,单笔体现不出来。

  ​​https://stackoverflow.com/questions/67537111/how-do-i-decide-between-lz4-and-snappy-compression​​

postgres=# 
postgres=# insert into tab_compression_1 select x,lpad('abc',4000,'xfe') from generate_series(1,100000) as x;
INSERT 0 100000
Time: 1274.374 ms (00:01.274)
postgres=#
postgres=#
postgres=#
postgres=# insert into tab_compression_1 select x,lpad('abc',4000,'xfe') from generate_series(1,100000) as x;
INSERT 0 100000
Time: 1264.036 ms (00:01.264)
postgres=#
postgres=#
postgres=#
postgres=# insert into tab_compression_1 select x,lpad('abc',4000,'xfe') from generate_series(1,100000) as x;
INSERT 0 100000
Time: 1270.027 ms (00:01.270)
postgres=#
postgres=#
postgres=# insert into tab_compression_2 select x,lpad('abc',4000,'xfe') from generate_series(1,100000) as x;
INSERT 0 100000
Time: 178.854 ms
postgres=# insert into tab_compression_2 select x,lpad('abc',4000,'xfe') from generate_series(1,100000) as x;
INSERT 0 100000
Time: 178.418 ms
postgres=# insert into tab_compression_2 select x,lpad('abc',4000,'xfe') from generate_series(1,100000) as x;
INSERT 0 100000
Time: 179.050 ms
postgres=# select count(distinct substr(data,3,10)) as x from tab_compression_1 ;
x
---
1
(1 row)

Time: 140.865 ms
postgres=# select count(distinct substr(data,3,10)) as x from tab_compression_1 ;
x
---
1
(1 row)

Time: 136.520 ms
postgres=# select count(distinct substr(data,3,10)) as x from tab_compression_1 ;
x
---
1
(1 row)

Time: 136.078 ms
postgres=# select count(distinct substr(data,3,10)) as x from tab_compression_2 ;
x
---
1
(1 row)

Time: 109.513 ms
postgres=# select count(distinct substr(data,3,10)) as x from tab_compression_2 ;
x
---
1
(1 row)

Time: 110.300 ms
postgres=# select count(distinct substr(data,3,10)) as x from tab_compression_2 ;
x
---
1
(1 row)

Time: 111.693

查询时间降低了20%。插入时间降低了86%。

  那为什么默认不使用lz4呢?因为升级问题,如果直接从13及之前版本升级上来,数据就会破坏。

最后

  最后,还需要一提的是,postgresql还支持真正的大对象large object,并且包含对应的API和数据字典,虽然现在一般都用TOAST代替了,但是其比TOAST更加灵活、当然也更加原始,可参见​​https://www.hs.net/lightdb/docs/html/largeobjects.html​​。

  java保存图片到postgresql:​​https://www.enterprisedb.com/postgres-tutorials/postgresql-toast-and-working-blobsclobs-explained​​ 

现代LZ算法

  ​​https://glinscott.github.io/lz/index.html​​

参考 

​​https://wiki.postgresql.org/wiki/TOAST​​

​​https://www.postgresql.eu/events/pgconfeu2019/sessions/session/2671/slides/263/Data_Compression_in_PostgreSQL_and_its_future_noscript.pdf​​

​​https://www.postgresql.org/docs/current/storage-toast.html​​



举报

相关推荐

0 条评论