Greenplum Database is a massively parallel processing (MPP) database server whose architecture is specifically designed for managing large-scale analytical data warehouses and business-intelligence workloads.

MPP (also known as a shared-nothing architecture) refers to a system in which two or more processors cooperate to carry out an operation, with each processor having its own memory, operating system, and disks.

Components

highlevel_arch

Master

  • The Greenplum Master is the entry point to the entire Greenplum system: it accepts client connections and SQL queries and distributes the work across the Segment instances
  • The Master holds no user data; data lives only on the Segments
  • The Master authenticates client connections, processes incoming SQL commands, distributes the workload among the Segments, coordinates the results each Segment returns, and presents the final result to the client program
  • Collects the execution results from the Segments

Segment

  • Greenplum Segment instances are independent PostgreSQL databases; each stores a portion of the data and performs the bulk of the query processing
  • When a user connects through the Master and issues a query, processes are created on every Segment database to handle that query's work
  • A Segment host typically runs 2 to 8 Greenplum Segment instances
  • Executes the SQL statements dispatched by the Master
  • Stores and computes over its share of the data

Interconnect

  • The Interconnect is the networking layer of the Greenplum database architecture
  • It refers to the inter-process communication between Segments and the network infrastructure this communication relies on
  • The Interconnect uses a standard Ethernet switching fabric
  • By default, the Interconnect sends messages over UDP

Cluster Deployment

Versions

Versions: CentOS 7 + greenplum-db-6.19.1-rhel7-x86_64.rpm

Download: https://github.com/greenplum-db/gpdb/releases

Host List

The cluster consists of 4 machines: 1 Master node, 1 Standby node (a backup for the Master), and 2 Segment nodes.

192.168.3.31   mdw   (Master node)
192.168.3.32   smdw  (Standby node)
192.168.3.33   sdw1  (Segment node)
192.168.3.34   sdw2  (Segment node)

Environment Initialization

Perform all of the following configuration steps on every server.

  • Configure hostnames and IP addresses

greenplum-ip

  • Disable the firewall

While configuring, make sure every machine can reach every other machine over the network and that each machine's firewall is disabled, so that no connectivity problems remain.

[root@mdw ~]# systemctl stop firewalld    # stop the firewall
[root@mdw ~]# systemctl disable firewalld # disable it at boot
[root@mdw ~]# systemctl status firewalld  # check its status
  • Disable SELinux
[root@mdw ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
[root@mdw ~]# setenforce 0
  • Install dependencies
[root@mdw ~]# yum install apr apr-util bash bzip2 curl krb5 libcurl libevent libxml2 libyaml zlib openldap openssh openssl openssl-libs perl readline rsync R sed tar zip -y
  • Configure open-file limits
[root@mdw ~]# vim /etc/security/limits.conf

* soft nofile 524288
* hard nofile 524288
* soft nproc 131072
* hard nproc 131072
  • Configure process limits
[root@mdw ~]# vim /etc/security/limits.d/20-nproc.conf

*          soft    nproc     131072
root       soft    nproc     unlimited
  • Tune kernel parameters
[root@mdw ~]# vim /etc/sysctl.conf

kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 500 2048000 200 40960
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 10000 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.swappiness = 10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 0
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736
vm.dirty_bytes = 4294967296

After editing, run sysctl -p to apply the settings.
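The kernel.shmall / kernel.shmmax values shown above are fixed numbers sized for a small VM. The Greenplum 6 documentation recommends deriving them from half of physical memory instead (shmmax in bytes, shmall in pages). A sketch of that calculation:

```shell
# Derive suggested kernel.shmmax / kernel.shmall from half of physical RAM,
# following the sizing rule in the Greenplum 6 installation guide.
page_size=$(getconf PAGE_SIZE)
mem_total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
mem_total_bytes=$((mem_total_kb * 1024))

shmmax=$((mem_total_bytes / 2))              # half of RAM, in bytes
shmall=$((mem_total_bytes / 2 / page_size))  # half of RAM, in pages

echo "kernel.shmmax = $shmmax"
echo "kernel.shmall = $shmall"
```

The two echoed lines can be pasted straight into /etc/sysctl.conf in place of the static values.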

  • Add hosts entries

This prepares for communication between the Greenplum nodes later on.

[root@mdw ~]# vim /etc/hosts

192.168.3.31   mdw
192.168.3.32   smdw
192.168.3.33   sdw1
192.168.3.34   sdw2
  • Configure /etc/sysconfig/network

Note: the hostname entry differs on every node.

[root@mdw ~]# vim /etc/sysconfig/network

# Created by anaconda
NETWORKING=yes
HOSTNAME=mdw
  • Disable transparent huge pages (THP)
[root@mdw ~]# grubby --update-kernel=ALL --args="transparent_hugepage=never"
  • Disable RemoveIPC
[root@mdw ~]# vim /etc/systemd/logind.conf

RemoveIPC=no
  • Reboot all servers
[root@mdw ~]# init 6
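With this many file edits per node, it is easy to miss one before rebooting. A hypothetical helper can confirm each edit actually landed (check_config is not a Greenplum tool, just a local grep wrapper written for this guide):

```shell
# Hypothetical pre-reboot sanity check: verify that an edited file really
# contains the expected setting before running init 6.
check_config() {
    local file=$1 pattern=$2
    if grep -q "^${pattern}" "$file"; then
        echo "OK ${file}: ${pattern}"
    else
        echo "FAIL ${file}: ${pattern} not found"
        return 1
    fi
}

# On a real node, before rebooting:
#   check_config /etc/selinux/config 'SELINUX=disabled'
#   check_config /etc/systemd/logind.conf 'RemoveIPC=no'
```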

Greenplum Database Installation

  • Add the group and user (all nodes)

Create the gpadmin user and group; this operating-system user will own and run the Greenplum installation.

[root@mdw ~]# groupadd gpadmin
[root@mdw ~]# useradd -r -m -g gpadmin gpadmin
[root@mdw ~]# passwd gpadmin       # gpadmin.123
  • Set up passwordless SSH among the cluster nodes (Master node)
[root@mdw ~]# su - gpadmin

[gpadmin@mdw ~]$ cd /home/gpadmin/

# Generate a key pair (just press Enter at the prompts)
[gpadmin@mdw ~]$ ssh-keygen -t rsa

# Enter the password of the target host when prompted
[gpadmin@mdw ~]$ ssh-copy-id smdw

[gpadmin@mdw ~]$ ssh-copy-id sdw1

[gpadmin@mdw ~]$ ssh-copy-id sdw2

greenplum-ssh2

Verify: on the Master node, as the gpadmin user, running ssh sdw1 should log in to the sdw1 server without prompting for a password.

[gpadmin@mdw ~]$ ssh sdw1
Last login: Sun Jul 31 19:21:57 2022 from mdw

[gpadmin@sdw1 ~]$
  • Install the Greenplum rpm package (Master node)
[root@mdw ~]# cd /usr/local/

[root@mdw local]# sudo yum install ./open-source-greenplum-db-6.19.1-rhel7-x86_64.rpm

# Give the gpadmin user ownership of the installation
[root@mdw local]# chown -R gpadmin:gpadmin /usr/local/greenplum*

greenplum-install

  • Create hostlist containing the hostnames of all nodes (Master node)
[root@mdw ~]# su - gpadmin
[gpadmin@mdw ~]$ mkdir /home/gpadmin/conf
[gpadmin@mdw ~]$ cd /home/gpadmin/conf

[gpadmin@mdw conf]$ vim hostlist

mdw
smdw
sdw1
sdw2
  • Create seg_hosts containing the hostnames of all Segment hosts (Master node)
[gpadmin@mdw conf]$ vim seg_hosts

sdw1
sdw2
  • Use gpssh-exkeys to exchange keys among all servers (Master node)
[root@mdw ~]# su - gpadmin

[gpadmin@mdw ~]$ cd /usr/local/greenplum-db-6.19.1/

[gpadmin@mdw greenplum-db-6.19.1]$ source /usr/local/greenplum-db-6.19.1/greenplum_path.sh

[gpadmin@mdw greenplum-db-6.19.1]$ cd /home/gpadmin/conf

[gpadmin@mdw conf]$ gpssh-exkeys -f hostlist
[STEP 1 of 5] create local ID and authorize on local host
  ... /home/gpadmin/.ssh/id_rsa file exists ... key generation skipped

[STEP 2 of 5] keyscan all hosts and update known_hosts file

[STEP 3 of 5] retrieving credentials from remote hosts
  ... send to smdw
  ... send to sdw1
  ... send to sdw2

[STEP 4 of 5] determine common authentication file content

[STEP 5 of 5] copy authentication files to all remote hosts
  ... finished key exchange with smdw
  ... finished key exchange with sdw1
  ... finished key exchange with sdw2

[INFO] completed successfully
  • Install Greenplum on the Standby node
[root@smdw ~]# hostname
smdw

[root@smdw ~]# cd /usr/local/

[root@smdw local]# sudo yum install ./open-source-greenplum-db-6.19.1-rhel7-x86_64.rpm

# Give the gpadmin user ownership of the installation
[root@smdw local]# chown -R gpadmin:gpadmin /usr/local/greenplum*
  • Install Greenplum on the Segment nodes (2 nodes)
[root@sdw1 ~]# hostname
sdw1

[root@sdw1 ~]# cd /usr/local/

[root@sdw1 local]# sudo yum install ./open-source-greenplum-db-6.19.1-rhel7-x86_64.rpm

# Give the gpadmin user ownership of the installation
[root@sdw1 local]# chown -R gpadmin:gpadmin /usr/local/greenplum*
  • Configure environment variables in .bash_profile (all nodes)
[root@mdw ~]# su - gpadmin

[gpadmin@mdw ~]$ vim /home/gpadmin/.bash_profile

source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/home/gpadmin/data/master/gpseg-1
export PGPORT=5432
export PGUSER=gpadmin
export PGDATABASE=gp_sydb
export LD_PRELOAD=/lib64/libz.so.1

[gpadmin@mdw ~]$ source /home/gpadmin/.bash_profile
  • Configure the GPHOME environment variable (all nodes)
[gpadmin@mdw ~]$ vim /usr/local/greenplum-db/greenplum_path.sh

# Add the following path
GPHOME=/usr/local/greenplum-db

greenplum-gphome

  • Create the data directories (Master node)

Create the data directories for every node: after this step, each machine listed in hostlist has a data directory containing master, primary, and mirror subdirectories.

[gpadmin@mdw ~]$ cd /usr/local/greenplum-db

[gpadmin@mdw greenplum-db]$ gpssh -f /home/gpadmin/conf/hostlist

=> mkdir data
=> cd data
=> mkdir master
=> mkdir primary
=> mkdir mirror

greenplum-mkdir
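The interactive gpssh session above boils down to one directory layout per host. A local sketch of that layout (data_root defaults to a throwaway /tmp path so it is safe to run anywhere; on the real hosts it is /home/gpadmin/data), with the equivalent one-shot gpssh command shown in a comment:

```shell
# Create the master/primary/mirror layout locally. DATA_ROOT is an
# illustrative variable; on real hosts the root is /home/gpadmin/data.
data_root=${DATA_ROOT:-/tmp/gpdata-demo}
mkdir -p "$data_root"/{master,primary,mirror}
ls "$data_root"

# On the Master node, the same layout can be pushed to every host at once:
#   gpssh -f /home/gpadmin/conf/hostlist -e 'mkdir -p /home/gpadmin/data/master /home/gpadmin/data/primary /home/gpadmin/data/mirror'
```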

  • Connectivity check
[gpadmin@mdw greenplum-db]$ gpcheckperf -f /home/gpadmin/conf/hostlist -r N -d /tmp

/usr/local/greenplum-db-6.19.1/bin/gpcheckperf -f /home/gpadmin/conf/hostlist -r N -d /tmp

-------------------
--  NETPERF TEST
-------------------


====================
==  RESULT 2022-07-31T20:03:03.635228
====================
Netperf bisection bandwidth test
mdw -> smdw = 500.910000
sdw1 -> sdw2 = 500.540000
smdw -> mdw = 494.230000
sdw2 -> sdw1 = 492.510000

Summary:
sum = 1988.19 MB/sec
min = 492.51 MB/sec
max = 500.91 MB/sec
avg = 497.05 MB/sec
median = 500.54 MB/sec

Output like the above confirms that the nodes can reach one another.

  • Edit the initialization configuration file (Master node)

To build the initialization file, first copy the template gpinitsystem_config, then edit it.

[gpadmin@mdw greenplum-db]$ mkdir /home/gpadmin/gpconfigs

[gpadmin@mdw greenplum-db]$ cp /usr/local/greenplum-db/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfigs/gpinitsystem_config

[gpadmin@mdw ~]$ vim /home/gpadmin/gpconfigs/gpinitsystem_config

# Primary data locations; the number of entries in the parentheses determines how many PostgreSQL instances are created on each Segment host
declare -a DATA_DIRECTORY=(/home/gpadmin/data/primary /home/gpadmin/data/primary)

# Hostname of the Master node
MASTER_HOSTNAME=mdw

# Data directory of the Master node
MASTER_DIRECTORY=/home/gpadmin/data/master

# Mirror data locations; same rules as DATA_DIRECTORY, and the number of entries must match DATA_DIRECTORY
declare -a MIRROR_DATA_DIRECTORY=(/home/gpadmin/data/mirror /home/gpadmin/data/mirror)

# Default database name; it must match the PGDATABASE environment variable, otherwise initialization fails
DATABASE_NAME=gp_sydb
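One easy way to break this file is to leave DATA_DIRECTORY and MIRROR_DATA_DIRECTORY with different entry counts. Since the config is plain bash, it can be sourced and checked directly; a sketch (GPINIT_CFG and the generated sample file are only for demonstration, not part of Greenplum):

```shell
# Pre-flight check (sketch): compare the entry counts of the two arrays.
cfg=${GPINIT_CFG:-/tmp/gpinitsystem_config.sample}
if [ ! -f "$cfg" ]; then
    cat > "$cfg" <<'EOF'
declare -a DATA_DIRECTORY=(/home/gpadmin/data/primary /home/gpadmin/data/primary)
declare -a MIRROR_DATA_DIRECTORY=(/home/gpadmin/data/mirror /home/gpadmin/data/mirror)
EOF
fi

source "$cfg"   # the config file is a bash fragment, so sourcing it is valid
if [ "${#DATA_DIRECTORY[@]}" -eq "${#MIRROR_DATA_DIRECTORY[@]}" ]; then
    echo "OK: ${#DATA_DIRECTORY[@]} primary instance(s) per Segment host"
else
    echo "MISMATCH: ${#DATA_DIRECTORY[@]} primary vs ${#MIRROR_DATA_DIRECTORY[@]} mirror entries"
fi
```

On the real Master node, point it at the edited file: GPINIT_CFG=/home/gpadmin/gpconfigs/gpinitsystem_config.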

Add a configuration file hostfile_gpinitsystem under /home/gpadmin/gpconfigs:

[gpadmin@mdw ~]$ cd /home/gpadmin/gpconfigs

[gpadmin@mdw gpconfigs]$ vim hostfile_gpinitsystem

# Add the following
sdw1
sdw2
  • Initialize the database
[gpadmin@mdw ~]$ gpinitsystem -c gpconfigs/gpinitsystem_config -h gpconfigs/hostfile_gpinitsystem

20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Checking new segment hosts, Completed
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Greenplum Database Creation Parameters
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:---------------------------------------
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master Configuration
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:---------------------------------------
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master instance name       = Greenplum Data Platform
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master hostname            = mdw
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master port                = 5432
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master instance dir        = /home/gpadmin/data/master/gpseg-1
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master LOCALE              = en_US.utf8
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Greenplum segment prefix   = gpseg
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master Database            = gp_sydb
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master connections         = 250
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master buffers             = 128000kB
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Segment connections        = 750
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Segment buffers            = 128000kB
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Checkpoint segments        = 8
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Encoding                   = UNICODE
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Postgres param file        = Off
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Initdb to be used          = /usr/local/greenplum-db-6.19.1/bin/initdb
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-GP_LIBRARY_PATH is         = /usr/local/greenplum-db-6.19.1/lib
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-HEAP_CHECKSUM is           = on
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-HBA_HOSTNAMES is           = 0
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Ulimit check               = Passed
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Array host connect type    = Single hostname per node
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master IP address [1]      = ::1
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master IP address [2]      = 192.168.122.1
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master IP address [3]      = 192.168.3.31
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Master IP address [4]      = fe80::2e1e:864b:fbd:fa96
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Standby Master             = Not Configured
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Number of primary segments = 2
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Total Database segments    = 4
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Trusted shell              = ssh
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Number segment hosts       = 2
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Mirroring config           = OFF
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:----------------------------------------
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Greenplum Primary Segment Configuration
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:----------------------------------------
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-sdw1  6000    sdw1    /home/gpadmin/data/primary/gpseg0       2
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-sdw1  6001    sdw1    /home/gpadmin/data/primary/gpseg1       3
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-sdw2  6000    sdw2    /home/gpadmin/data/primary/gpseg2       4
20220731:20:18:41:008655 gpinitsystem:mdw:gpadmin-[INFO]:-sdw2  6001    sdw2    /home/gpadmin/data/primary/gpseg3       5

Continue with Greenplum creation Yy|Nn (default=N):
> y
20220731:20:18:45:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Building the Master instance database, please wait...
20220731:20:18:49:008655 gpinitsystem:mdw:gpadmin-[INFO]:-Starting the Master in admin mode

Output like the following indicates that the installation succeeded:

greenplum-initial

Note: if initialization fails, delete the data directories under /home/gpadmin/data before initializing again.

  • Enable remote login (Master node)

For client tools such as Navicat or DBeaver to connect, configure the following:

[gpadmin@mdw ~]$ echo "host all gpadmin 0.0.0.0/0 trust" >> /home/gpadmin/data/master/gpseg-1/pg_hba.conf

# Reload the database configuration
gpstop -u 

Note: gpstop -u reloads pg_hba.conf and the runtime parameters of the Master's postgresql.conf without interrupting service. Sessions that are already connected keep the old settings and pick up the new ones only after reconnecting. Here it simply makes the pg_hba.conf change above take effect.

  • Configure the Standby node (Master node)
[gpadmin@mdw ~]$ gpinitstandby -s smdw
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Checking for data directory /home/gpadmin/data/master/gpseg-1 on smdw
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master hostname               = mdw
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master data directory         = /home/gpadmin/data/master/gpseg-1
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master port                   = 5432
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master hostname       = smdw
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master port           = 5432
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master data directory = /home/gpadmin/data/master/gpseg-1
20220731:21:15:16:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum update system catalog         = On
Do you want to continue with standby master initialization? Yy|Nn (default=N):
> y
20220731:21:15:24:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
20220731:21:15:25:013189 gpinitstandby:mdw:gpadmin-[INFO]:-The packages on smdw are consistent.
20220731:21:15:25:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Adding standby master to catalog...
20220731:21:15:25:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Database catalog updated successfully.
20220731:21:15:25:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Updating pg_hba.conf file...
20220731:21:15:26:013189 gpinitstandby:mdw:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
20220731:21:15:28:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Starting standby master
20220731:21:15:28:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Checking if standby master is running on host: smdw  in directory: /home/gpadmin/data/master/gpseg-1
20220731:21:15:31:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
20220731:21:15:32:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
20220731:21:15:32:013189 gpinitstandby:mdw:gpadmin-[INFO]:-Successfully created standby master on smdw

To verify the Standby configuration, check on the Standby server that the corresponding processes are running:

[gpadmin@smdw ~]$ hostname
smdw

[gpadmin@smdw ~]$ ps -ef|grep postgres
gpadmin   56777      1  1 21:15 ?        00:00:01 /usr/local/greenplum-db-6.19.1/bin/postgres -D /home/gpadmin/data/master/gpseg-1 -p 5432 -E
gpadmin   56845  56777  0 21:15 ?        00:00:00 postgres:  5432, master logger process
gpadmin   56846  56777  0 21:15 ?        00:00:00 postgres:  5432, startup process   waiting for 000000010000000000000003
gpadmin   56847  56777  0 21:15 ?        00:00:00 postgres:  5432, checkpointer process
gpadmin   56848  56777  0 21:15 ?        00:00:00 postgres:  5432, writer process
gpadmin   56849  56777  0 21:15 ?        00:00:00 postgres:  5432, wal receiver process
gpadmin   56951  54578  0 21:16 pts/0    00:00:00 grep --color=auto postgres

At this point the Greenplum cluster is installed and the Standby is in place.

Greenplum Database Verification

  • Log in and verify (Master node)
[gpadmin@mdw ~]$ psql -d postgres
psql (9.4.26)
Type "help" for help.

# List the databases
postgres=# \l
                               List of databases
   Name    |  Owner  | Encoding |  Collate   |   Ctype    |  Access privileges
-----------+---------+----------+------------+------------+---------------------
 gp_sydb   | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres  | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/gpadmin         +
           |         |          |            |            | gpadmin=CTc/gpadmin
 template1 | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/gpadmin         +
           |         |          |            |            | gpadmin=CTc/gpadmin
(4 rows)

# Switch to another database
postgres=# \c dbname

Database Operations

Stopping and starting the cluster (Master node)

  • Interactive shutdown; waits for all database connections to disconnect before stopping
[gpadmin@mdw ~]$ gpstop
  • Interactive fast shutdown; forcibly terminates all database connections and stops immediately
[gpadmin@mdw ~]$ gpstop -M fast
  • Interactive start
[gpadmin@mdw ~]$ gpstart
  • Restart
[gpadmin@mdw ~]$ gpstop -ar

Cluster status (Master node)

  • Show the status of the whole cluster (details for the Master and all Segments)
[gpadmin@mdw ~]$ gpstate -s
  • Show details of the standby master
[gpadmin@mdw ~]$ gpstate -f
  • Show the primary-to-mirror mapping
[gpadmin@mdw ~]$ gpstate -c
  • Show the mirror synchronization status
[gpadmin@mdw ~]$ gpstate -m
  • Show the Greenplum version
[gpadmin@mdw ~]$ gpstate -i

Adding Mirrors

A freshly installed Greenplum cluster has only primary segments and no mirrors, so high availability is not yet guaranteed. Mirror segments therefore need to be added.

Note: adding mirrors when the cluster already holds a lot of data puts considerable pressure on the disks, because the data is synchronized continuously in the background.

  • Check the system configuration table; only primary segments exist so far (Master node)
[gpadmin@mdw ~]$ psql -d postgres
psql (9.4.26)
Type "help" for help.

postgres=# SELECT * from gp_segment_configuration ;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |              datadir
------+---------+------+----------------+------+--------+------+----------+---------+-----------------------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | mdw      | mdw     | /home/gpadmin/data/master/gpseg-1
    2 |       0 | p    | p              | n    | u      | 6000 | sdw1     | sdw1    | /home/gpadmin/data/primary/gpseg0
    4 |       2 | p    | p              | n    | u      | 6000 | sdw2     | sdw2    | /home/gpadmin/data/primary/gpseg2
    3 |       1 | p    | p              | n    | u      | 6001 | sdw1     | sdw1    | /home/gpadmin/data/primary/gpseg1
    5 |       3 | p    | p              | n    | u      | 6001 | sdw2     | sdw2    | /home/gpadmin/data/primary/gpseg3
    6 |      -1 | m    | m              | s    | u      | 5432 | smdw     | smdw    | /home/gpadmin/data/master/gpseg-1
(6 rows)

Create the data directories

  • On every host that will hold mirrors, create the mirror data directory. This was already done in the steps above (checked here on a Segment node):
[gpadmin@sdw1 ~]$ hostname
sdw1

[gpadmin@sdw1 ~]$ ll /home/gpadmin/data/
total 0
drwxrwxr-x 2 gpadmin gpadmin  6 Aug  6 11:02 master
drwxrwxr-x 2 gpadmin gpadmin  6 Aug  6 11:02 mirror
drwxrwxr-x 4 gpadmin gpadmin 34 Aug  6 11:29 primary

Generate the addmirror file

  • Run gpaddmirrors, enter the mirror data directory locations when prompted, and write out the configuration file (Master node):
[gpadmin@mdw ~]$ gpaddmirrors -o ./addmirror

20220806:12:28:59:060581 gpaddmirrors:mdw:gpadmin-[INFO]:-Starting gpaddmirrors with args: -o ./addmirror
20220806:12:28:59:060581 gpaddmirrors:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.19.1 build commit:0e314744a460630073b46cea7b7cf20a81e3da63 Open Source'
20220806:12:28:59:060581 gpaddmirrors:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.19.1 build commit:0e314744a460630073b46cea7b7cf20a81e3da63 Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Jan 18 2022 13:51:20'
20220806:12:28:59:060581 gpaddmirrors:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20220806:12:29:00:060581 gpaddmirrors:mdw:gpadmin-[INFO]:-Heap checksum setting consistent across cluster
Enter mirror segment data directory location 1 of 2 >
/home/gpadmin/data/mirror
Enter mirror segment data directory location 2 of 2 >
/home/gpadmin/data/mirror
20220806:12:29:25:060581 gpaddmirrors:mdw:gpadmin-[INFO]:-Configuration file output to ./addmirror successfully.
  • Inspect the file contents (Master node):
[gpadmin@mdw ~]$ cat addmirror

0|sdw2|7000|/home/gpadmin/data/mirror/gpseg0
1|sdw2|7001|/home/gpadmin/data/mirror/gpseg1
2|sdw1|7000|/home/gpadmin/data/mirror/gpseg2
3|sdw1|7001|/home/gpadmin/data/mirror/gpseg3
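Each line of the generated file has the form content-ID|host|port|data-directory. A small awk pass makes the planned layout easier to read (a sketch; the sample file below just mirrors the output above so the command can be tried anywhere):

```shell
# Pretty-print an addmirror plan. ADDMIRROR_FILE and the sample content are
# illustrative; on the Master node point it at the real ./addmirror file.
addmirror_file=${ADDMIRROR_FILE:-/tmp/addmirror.sample}
if [ ! -f "$addmirror_file" ]; then
    cat > "$addmirror_file" <<'EOF'
0|sdw2|7000|/home/gpadmin/data/mirror/gpseg0
1|sdw2|7001|/home/gpadmin/data/mirror/gpseg1
2|sdw1|7000|/home/gpadmin/data/mirror/gpseg2
3|sdw1|7001|/home/gpadmin/data/mirror/gpseg3
EOF
fi

awk -F'|' '{ printf "content %s -> mirror on %s:%s (%s)\n", $1, $2, $3, $4 }' "$addmirror_file"
```

Note how each content ID lands on the opposite host from its primary, so losing one Segment host never takes a primary and its mirror down together.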

Run the add command

[gpadmin@mdw ~]$ gpaddmirrors -i addmirror

20220810:16:22:04:009304 gpaddmirrors:mdw:gpadmin-[INFO]:-Starting to create new pg_hba.conf on primary segments
20220810:16:22:05:009304 gpaddmirrors:mdw:gpadmin-[INFO]:-Successfully modified pg_hba.conf on primary segments to allow replication connections
20220810:16:22:05:009304 gpaddmirrors:mdw:gpadmin-[INFO]:-4 segment(s) to add
20220810:16:22:05:009304 gpaddmirrors:mdw:gpadmin-[INFO]:-Validating remote directories
20220810:16:22:06:009304 gpaddmirrors:mdw:gpadmin-[INFO]:-Configuring new segments

Check segment status

  • gpstate -m now shows Synchronized in the Data Status column:
[gpadmin@mdw primary]$ gpstate -m

20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.19.1 build commit:0e314744a460630073b46cea7b7cf20a81e3da63 Open Source'
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.19.1 build commit:0e314744a460630073b46cea7b7cf20a81e3da63 Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Jan 18 2022 13:51:20'
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:--Type = Group
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:-   Mirror   Datadir                            Port   Status    Data Status
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /home/gpadmin/data/mirror/gpseg0   7000   Passive   Synchronized
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /home/gpadmin/data/mirror/gpseg1   7001   Passive   Synchronized
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /home/gpadmin/data/mirror/gpseg2   7000   Passive   Synchronized
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /home/gpadmin/data/mirror/gpseg3   7001   Passive   Synchronized
20220806:12:43:02:061084 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
  • Check the system configuration table again (newly added mirrors may show status d until their initial synchronization completes):
[gpadmin@mdw ~]$ psql -d postgres
psql (9.4.26)
Type "help" for help.

postgres=# SELECT * from gp_segment_configuration ;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |              datadir
------+---------+------+----------------+------+--------+------+----------+---------+-----------------------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | mdw      | mdw     | /home/gpadmin/data/master/gpseg-1
    2 |       0 | p    | p              | n    | u      | 6000 | sdw1     | sdw1    | /home/gpadmin/data/primary/gpseg0
    5 |       3 | p    | p              | n    | u      | 6001 | sdw2     | sdw2    | /home/gpadmin/data/primary/gpseg3
    6 |      -1 | m    | m              | s    | u      | 5432 | smdw     | smdw    | /home/gpadmin/data/master/gpseg-1
    7 |       0 | m    | m              | n    | d      | 7000 | sdw2     | sdw2    | /home/gpadmin/data/mirror/gpseg0
   10 |       3 | m    | m              | n    | d      | 7001 | sdw1     | sdw1    | /home/gpadmin/data/mirror/gpseg3
    4 |       2 | p    | p              | s    | u      | 6000 | sdw2     | sdw2    | /home/gpadmin/data/primary/gpseg2
    9 |       2 | m    | m              | s    | u      | 7000 | sdw1     | sdw1    | /home/gpadmin/data/mirror/gpseg2
    3 |       1 | p    | p              | s    | u      | 6001 | sdw1     | sdw1    | /home/gpadmin/data/primary/gpseg1
    8 |       1 | m    | m              | s    | u      | 7001 | sdw2     | sdw2    | /home/gpadmin/data/mirror/gpseg1
(10 rows)
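In the status column, u means up and d means down, which makes it the quickest health signal in this table. A sketch of a filter that flags down segments (here it runs over a captured sample; on the Master node the input would come from psql -At against gp_segment_configuration, as in the comment):

```shell
# Flag down segments (status = 'd'). On the Master node, the input would be:
#   psql -At -d postgres -c "SELECT dbid, role, status, hostname, port FROM gp_segment_configuration"
# A captured sample stands in here so the filter can run anywhere.
sample='1|p|u|mdw|5432
7|m|d|sdw2|7000
10|m|d|sdw1|7001'

down=$(printf '%s\n' "$sample" | awk -F'|' '$3 == "d" { print "dbid " $1 " (" $2 ") on " $4 ":" $5 }')
echo "${down:-all segments up}"
```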

Common Problems and Solutions

  • gpstate -s (whole-cluster status) reports the following error:
[gpadmin@mdw ~]$ gpstate -s
20220731:20:44:49:012783 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -s
20220731:20:44:49:012783 gpstate:mdw:gpadmin-[CRITICAL]:-gpstate failed. (Reason='[Errno 2] No such file or directory: '/data/master/gpseg-1/postgresql.conf'') exiting...

Solution: check the environment variable configuration.

[gpadmin@mdw ~]$ cat /home/gpadmin/.bash_profile

export PATH
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/master/gpseg-1
export PGPORT=5432
export PGUSER=gpadmin
export PGDATABASE=gp_sydb
export LD_PRELOAD=/lib64/libz.so.1 ps

The MASTER_DATA_DIRECTORY path is wrong; change it as follows (on every node):

export MASTER_DATA_DIRECTORY=/home/gpadmin/data/master/gpseg-1

References

Installing CentOS 7 in VMware

Assigning a static IP to the virtual network in VMnet8 mode

https://blog.csdn.net/msh6453/article/details/124892605

https://blog.51cto.com/feko/2738318

http://events.jianshu.io/p/e0d70fd545dc

https://www.bilibili.com/video/BV1XJ411L7Gv

Original article; please cite the source when republishing: http://www.opcoder.cn/article/59/