上节已经讲过,Postgres-XL是多个组件的集合。初次安装需要一些工作。在本节中,我们将展示如何从空配置文件开始,使用pgxc_ctl从头创建Postgres-XL集群。

安装之前,需要先确保机器满足一些先决条件。

  • 要运行pgxc_ctl的节点需要支持无密码ssh访问。

  • 在所有机器上,正确设置PATH环境变量包含Postgres-XL数据文件,特别是在通过ssh运行命令时。

  • 必须配置pg_hba.conf允许远程访问。 pgxc_ctl.conf配置文件中诸如coordPgHbaEntries和datanodePgHbaEntries都可能需要适当的更改。

  • 配置防火墙和iptables使某些端口可以正常访问。

如果没有安装pgxc_ctl,可以从源代码编译并安装。

$ cd $XLSRC/contrib/pgxc_ctl
$ make install

我们现在开始准备模板配置文件。 pgxc_ctl提供三种类型的配置。我们选择空配置,从头创建我们的Postgres-XL设置。请注意,为正确调用pgxc_ctl,需设置dataDirRoot环境变量。

$ export dataDirRoot = $HOME/DATA/pgxl/nodes
$ mkdir $HOME/pgxc_ctl
$ pgxc_ctl
Installing pgxc_ctl_bash script as /Users/postgres/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /Users/postgres/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /Users/postgres/pgxc_ctl/pgxc_ctl_bash --home
/Users/postgres/pgxc_ctl --configuration
/Users/postgres/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************

   Current directory: /Users/postgres/pgxc_ctl

PGXC$ prepare config empty
PGXC$ exit

空配置文件现在已经准备就绪。你应该配置下pgxc_ctl.conf。至少应该设置pgxcOwner。配置文件包含USERi和HOME环境变量,以允许当前用户的默认访问。

下一步来安装GTM。

$ pgxc_ctl
PGXC$ add gtm master gtm localhost 20001 $dataDirRoot/gtm

使用“monitor”命令检查集群的状态。

$ pgxc_ctl
PGXC$  monitor all
Running: gtm master

现在让我们添加几个coordinators。当第一个coordinators被添加时,它刚启动。当添加另一个时,它会连接到现有的coordinators节点获取有关对象的元数据信息。

PGXC$  add coordinator master coord1 localhost 30001 30011 $dataDirRoot/coord_master.1 none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
PGXC$  add coordinator master coord2 localhost 30002 30012 $dataDirRoot/coord_master.2 none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2

继续添加几个datanode,当添加第一个datanode时,它会连接coordinator以获取全局元数据。当继续添加其它的datanode时,它连接到现有的datanode节点获取元数据信息。

PGXC$  add datanode master dn1 localhost 40001 40011 $dataDirRoot/dn_master.1 none none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
PGXC$  add datanode master dn2 localhost 40002 40012 $dataDirRoot/dn_master.2 none none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2

现在Postgres-XL已经安装完成,你可以跳过下面部分继续阅读下一章节

如果你想快速学习下Postgres-XL的各种命令,那么继续阅读。强烈建议您阅读整个文档,了解下面介绍的每个命令的更多详细信息。

连接到一个coordinator并创建一个测试数据库。

$psql -p 30001 postgres

postgres=# CREATE DATABASE testdb;
CREATE DATABASE
postgres=# \q

查看pgxc_node目录。它会显示所有配置的节点。有负节点ID值是正常的。这将很快被修复。

$  psql -p 30001 testdb
testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
(4 rows)

创建一个分发表,按照第一列hash进行分发

testdb=# CREATE TABLE disttab(col1 int, col2 int, col3 text) DISTRIBUTE BY HASH(col1);
CREATE TABLE
testdb=# \d+ disttab
                        Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              | 
 col2   | integer |           | plain    |              | 
 col3   | text    |           | extended |              | 
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: ALL DATANODES

创建一个复制表

testdb=# CREATE TABLE repltab (col1 int, col2 int) DISTRIBUTE BY
REPLICATION;
CREATE TABLE
testdb=# \d+ repltab
                       Table "public.repltab"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 col1   | integer |           | plain   |              | 
 col2   | integer |           | plain   |              | 
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: ALL DATANODES

往表中插入一些数据

testdb=# INSERT INTO disttab SELECT generate_series(1,100), generate_series(101, 200), 'foo';
INSERT 0 100
testdb=# INSERT INTO repltab SELECT generate_series(1,100), generate_series(101, 200);
INSERT 0 100

现在,分发表中应该包含100行记录

testdb=# SELECT count(*) FROM disttab;
 count 
-------
   100
(1 row)

这些数据不会都分布在同一个datanode上,xc_node_id是datanode节点标识,请注意,由于HASH功能,分布可能略有不均匀

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)

对于复制表,我们希望所有的数据行都来自同一个datanode(虽然在其它节点会有相同的副本)

testdb=# SELECT count(*) FROM repltab;
 count 
-------
   100
(1 row)

testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |   100
(1 row)

新添加一个datanode

PGXC$  add datanode master dn3 localhost 40003 40013 $dataDirRoot/dn_master.3 none none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3

请注意,在集群重新配置期间,所有未完成的事务将中止,会话将重置。 所以你可能会看到这样的错误

testdb=# SELECT * FROM pgxc_node;
ERROR:  canceling statement due to user request             <==== pgxc_pool_reload() resets all sessions and aborts all open transactions

testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
 dn3       | D         |     40003 | localhost | f              | f                |  -700122826
(5 rows)

请注意,添加了新的datanode,现有表不受影响。 分发信息仍只存在较旧的datanode中

testdb=# \d+ disttab
                        Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              | 
 col2   | integer |           | plain    |              | 
 col3   | text    |           | extended |              | 
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: dn1, dn2

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)

testdb=# \d+ repltab
                       Table "public.repltab"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 col1   | integer |           | plain   |              | 
 col2   | integer |           | plain   |              | 
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: dn1, dn2

为了充分利用新的datanode,我们重新分发数据表

testdb=# ALTER TABLE disttab ADD NODE (dn3);
ALTER TABLE
testdb=# \d+ disttab
                        Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              | 
 col2   | integer |           | plain    |              | 
 col3   | text    |           | extended |              | 
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: ALL DATANODES

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -700122826 |    32
  352366662 |    32
 -560021589 |    36
(3 rows)

添加一个coordinator.

PGXC$  add coordinator master coord3 localhost 30003 30013 $dataDirRoot/coord_master.3 none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: coordinator master coord3
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3

testdb=# SELECT * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
 dn3       | D         |     40003 | localhost | f              | f                |  -700122826
 coord3    | C         |     30003 | localhost | f              | f                |  1638403545
(6 rows)

我们可以用ALTER TABLE操作删除分发表的datanode节点,之后再添加回来

testdb=# ALTER TABLE disttab DELETE NODE (dn1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
  352366662 |    42
 -700122826 |    58
(2 rows)

testdb=# ALTER TABLE disttab ADD NODE (dn1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -700122826 |    32
  352366662 |    32
 -560021589 |    36
(3 rows)

你甚至可以将一个复制表修改为分发表,请注意,即使集群现在已经有3个datanodes,表将继续仅使用最初复制表的两个datanode。

testdb=# ALTER TABLE repltab DISTRIBUTE BY HASH(col1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)

testdb=# ALTER TABLE repltab DISTRIBUTE BY REPLICATION;
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |   100
(1 row)

移除之前添加的coordinator。 你也可以使用“clean”选项删除相应的数据目录。

PGXC$  remove coordinator master coord3 clean
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3

testdb=# SELECT oid, * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16385 | dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
 16397 | dn3       | D         |     40003 | localhost | f              | f                |  -700122826
(5 rows)

现在让我们尝试删除一个datanode。 注意:Postgres-XL不会检查要删除的datanode是否包好复制/分发表的数据。 用户有责任保证删除datanode的操作是安全的。 您可以使用以下查询是否要删除的datanode有任何数据。 请注意,这仅显示来自当前datanode的表。 在执行datanode删除之前,您可能希望确保所有数据库都相同。 你可以用要删除的datanode的OID作如下查询:

testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
 pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets |     nodeoids      
---------+---------------+----------+-----------------+---------------+-------------------
   16388 | H             |        1 |               1 |          4096 | 16385 16386 16397
(1 row)


testdb=# ALTER TABLE disttab DELETE NODE (dn3);
ALTER TABLE
testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
 pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets | nodeoids 
---------+---------------+----------+-----------------+---------------+----------
(0 rows)

好,现在可以安全的删除datanode "dn3"

PGXC$  remove datanode master dn3 clean
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2

testdb=# SELECT oid, * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16385 | dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
(4 rows)

pgxc_ctl还可以为datanodes和coordinators配置备份。 让我们为datanode设置一个备份,看看在主datanode失败的情况下,slave datanode是如何应对的。

PGXC$  add datanode slave dn1 localhost 40101 40111 $dataDirRoot/dn_slave.1 none $dataDirRoot/datanode_archlog.1
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode slave dn1
Running: datanode master dn2

testdb=# EXECUTE DIRECT ON(dn1) 'SELECT client_hostname, state, sync_state FROM pg_stat_replication';
 client_hostname |   state   | sync_state 
-----------------+-----------+------------
                 | streaming | async
(1 row)

添加一些行以便测试。

testdb=# INSERT INTO disttab SELECT generate_series(1001,1100), generate_series(1101, 1200), 'foo';
INSERT 0 100
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    94
  352366662 |   106
(2 rows)

现在开始模拟。 我们首先停止上面配置的master datanode “dn1”。 注意,由于slave连接到master,我们使用“immediate”模式来停止它。

PGXC$  stop -m immediate datanode master dn1

由于datanode关闭,查询将失败。 取决于表的数据分布和所使用的WHERE子句(关闭的datanode不需要执行查询操作),有些查询仍然可以工作。

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR:  Failed to get pooled connections

testdb=# SELECT xc_node_id, * FROM disttab WHERE col1 = 3;
 xc_node_id | col1 | col2 | col3 
------------+------+------+------
  352366662 |    3 |  103 | foo
(1 row)

我们现在操作故障切换确保一切都能正常工作。

PGXC$  failover datanode dn1

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR:  canceling statement due to user request
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    94
  352366662 |   106
(2 rows)

pgxc_node应该包含更新的条目。 特别是,失败的datanode node_host和node_port应该被替换为slave的主机和端口值。

testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
 16385 | dn1       | D         |     40101 | localhost | t              | t                |  -560021589
(4 rows)

PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2

results matching ""

    No results matching ""