
Need Kubernetes storage safer and faster? Let’s talk about Gluster! (Part II)

In the first part of this article, we concluded that GlusterFS is the perfect solution for our storage needs. But… wait. All our microservices run in Kubernetes! We have to find a way to use GlusterFS in Kubernetes as persistent storage for our microservices.

Thankfully, GlusterFS is Kubernetes compatible!

GlusterFS in Kubernetes

As previously said, GlusterFS is fully compatible with Kubernetes and can run as a native storage service on any Kubernetes cluster.
So, the only thing we have to do is deploy Gluster in Kubernetes. But before diving into the GlusterFS installation, let’s talk about persistent storage in Kubernetes.

Persistent Storage in Kubernetes

In Kubernetes, Docker containers are grouped into pods, which are managed using Deployments or ReplicationControllers. Kubernetes makes sure all your pods are always running, but when a pod is restarted or its Deployment is deleted, all the data created inside the pod is lost.

When the data is static (for example, static websites), there’s no need to persist it. But… what if we have dynamic data (databases, dynamic websites, etc.)? What can we do to persist it? Let’s call our friends, PersistentVolumes!

  • A PersistentVolume (PV) is a piece of storage in the cluster provisioned by an administrator. PVs have a lifecycle independent of pods, so they can be configured not to disappear when the pod they’re attached to is removed.
  • A PersistentVolumeClaim (PVC) is a request for storage made by a user. Users can request a specific size and access modes (read-only, read-write, etc.).

There are two ways in which PVs can be provisioned: static or dynamic.

  • Static: the admin creates PersistentVolumes (with their sizes and properties) beforehand; then either the PVC names its corresponding PV or it is bound to any available PV that satisfies its needs.
  • Dynamic: if there isn’t any PV that meets the PVC’s needs, Kubernetes tries to create a PV with the requirements specified by the PVC. If that’s possible (there are enough resources in the cluster), this new PV is bound to the PVC.

To define storage types, Kubernetes uses a StorageClass (SC). A PVC references an SC so Kubernetes knows how (and where) to provision the PV.
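
Before moving on, here is a minimal sketch of the static case described above (all names, sizes, and the hostPath backend are purely illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv                 # illustrative name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:                        # simple local backend, just for the example
    path: /data/example-pv
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc                # illustrative name
spec:
  storageClassName: ""             # empty string disables dynamic provisioning, so the claim binds to an existing PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi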

Deploy GlusterFS in Kubernetes

To deploy GlusterFS in Kubernetes, we will use the files provided in its official repository. In our case, we’re going to install the Gluster cluster on the Kubernetes nodes themselves (bare-metal CentOS 7 machines), so all the details below correspond to that kind of installation.

Before proceeding with the installation, we have to check that our Kubernetes cluster meets the GlusterFS requirements:

  • Have at least three Kubernetes nodes.
  • Have the same devices on all nodes.
  • The following ports must be opened: 2222 (GlusterFS pod’s sshd), 24007 (GlusterFS daemon), 24008 (GlusterFS management), 49152-49251 (for every brick in volume).
  • The following kernel modules must be loaded: dm_snapshot, dm_mirror, dm_thin_pool. We can load a kernel module with the following command (a combined snippet for these node prerequisites follows this list):
    > sudo modprobe [kernel module]
  • Packages glusterfs and glusterfs-client must be installed on all nodes.
  • The SELinux boolean virt_sandbox_use_fusefs must be enabled on all nodes:
    > setsebool -P virt_sandbox_use_fusefs on
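
As a sketch, the kernel module and SELinux prerequisites can be applied on every node like this (the modules-load.d file name is arbitrary):

> for module in dm_snapshot dm_mirror dm_thin_pool; do sudo modprobe $module; done

# Optionally, make the modules persistent across reboots (systemd loads /etc/modules-load.d/*.conf at boot)
> printf "dm_snapshot\ndm_mirror\ndm_thin_pool\n" | sudo tee /etc/modules-load.d/gluster.conf

> sudo setsebool -P virt_sandbox_use_fusefs on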

First, we have to define the GlusterFS cluster topology (the nodes available in the GlusterFS cluster and the block devices attached to them). Heketi will partition and format all the devices specified in the topology.
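
For example, a topology.json for three nodes, each with one spare block device, could look like the following sketch (hostnames, IPs, and device paths are placeholders for your own cluster):

{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": { "manage": ["node0"], "storage": ["192.168.10.100"] },
            "zone": 1
          },
          "devices": ["/dev/sdb"]
        },
        {
          "node": {
            "hostnames": { "manage": ["node1"], "storage": ["192.168.10.101"] },
            "zone": 2
          },
          "devices": ["/dev/sdb"]
        },
        {
          "node": {
            "hostnames": { "manage": ["node2"], "storage": ["192.168.10.102"] },
            "zone": 3
          },
          "devices": ["/dev/sdb"]
        }
      ]
    }
  ]
}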

The previous topology defines one cluster with 3 nodes. We can define multiple clusters in the topology file. All the keys in the JSON are self-explanatory except “zone”. Zones are failure domains (sets of nodes that share the same switch, power supply, or anything else that would make them fail at the same time). If we define failure domains, Heketi makes sure that replicas are created across them, preventing data loss.

Before running the installation, every node must be able to mount GlusterFS volumes (the mount helper comes with the GlusterFS client packages); in other words, this mount type must be available:

> sudo mount -t glusterfs

It’s also important to initialize all the disks or partitions listed in the GlusterFS topology for use by LVM:

> sudo pvcreate {partition}

We can configure some Heketi properties (authorization, rebalance, etc.) in heketi.json (an example can be found in deploy/heketi.json.template). In our case we’re not going to change anything, so we simply copy the content of heketi.json.template to a file called heketi.json in the same folder.
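
In other words, from the root of the repository:

> cp deploy/heketi.json.template deploy/heketi.json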

After all these steps, we can run the gk-deploy script (located in the deploy folder) on the Kubernetes master node, passing the topology file as a parameter.

> ./gk-deploy -gvy topology.json

The script accepts multiple parameters; in our case:

  • g: deploy GlusterFS cluster in the nodes specified by topology.json.
  • v: verbose – to print all information about the process.
  • y: skip the pre-requisites prompt.

And… boom! GlusterFS cluster with Heketi deployed.

What has the script done?

  1. Label the GlusterFS nodes with storagenode=glusterfs.
  2. Create a Service Account for Heketi to securely communicate with the GlusterFS nodes.
  3. Deploy the GlusterFS DaemonSet (that is, a glusterfs pod on every Kubernetes node labelled with storagenode=glusterfs).
  4. Deploy a pod called ‘deploy-heketi’ used to provision the Heketi database. The database itself uses GlusterFS as persistent storage.
  5. Create a Service and Endpoints to communicate with the GlusterFS cluster.
  6. Initialize the Heketi database.
  7. Create the heketi Deployment and Service used for GlusterFS volume management.
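
To check that everything is in place, we can list the resources the script has created (exact resource names may vary slightly between versions of the deploy scripts):

# GlusterFS DaemonSet pods (one per labelled node) and the Heketi pods
> kubectl get pods -o wide | grep -E 'glusterfs|heketi'

# Heketi Service and GlusterFS Endpoints
> kubectl get svc,endpoints | grep -E 'heketi|gluster'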

Once everything is deployed, we have to create a StorageClass that allows Kubernetes to provision volumes in GlusterFS.
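
A StorageClass for Heketi/GlusterFS might look like the following sketch (the name and resturl are examples; point resturl at your own Heketi Service):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage                                    # illustrative name
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.default.svc.cluster.local:8080"    # example Heketi URL
  volumetype: "replicate:3"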

  • resturl: the URL used to reach Heketi (in our case, the Heketi Kubernetes Service).
  • volumetype: (optional) the type of GlusterFS volume we want to provision:
    • replicate:3 – replicated, with a replication factor of 3.
    • none – distributed.
    • disperse:4:2 – dispersed, where ‘4’ is the data count and ‘2’ the redundancy count.

The default value is replicate:3.

In case we have defined authorization in heketi.json, we have to add the following properties:

  • restuser: the Heketi user allowed to create volumes.
  • secretnamespace: the namespace where the Heketi secret lives.
  • secretname: the name of the Heketi secret.

In case we have defined multiple GlusterFS clusters in topology.json, we can specify which cluster we want to talk to with the clusterid parameter.

But… what is really happening in the machine? Let’s walk through how Kubernetes creates a PV backed by GlusterFS.

Let’s assume a user requests a PV through its corresponding PVC. By specifying a StorageClass in the PVC, we tell Kubernetes which provisioner has to create the PV (in our case, GlusterFS). The StorageClass points to the Heketi REST interface (exposed as a Kubernetes Service). Once we create the PVC, Kubernetes requests a PV through this Service, and therefore through the Heketi pod.
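
For example, a PVC that uses the StorageClass sketched earlier might look like this (names and sizes are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-pvc                      # illustrative name
spec:
  storageClassName: glusterfs-storage    # the StorageClass sketched earlier
  accessModes:
    - ReadWriteMany                      # GlusterFS volumes can be shared by several pods
  resources:
    requests:
      storage: 5Gi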

When Heketi handles the request, it creates a logical volume on the machine, and that volume is loaded into GlusterFS as a brick. Heketi performs this action on all 3 GlusterFS nodes. The bricks are then combined into a GlusterFS volume, and that volume is exposed to Kubernetes to be used as a PV.
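
We can verify the result from inside any of the GlusterFS pods (replace the placeholders with a real pod and volume name):

> kubectl exec -it {GLUSTERFS_POD} -- gluster volume list
> kubectl exec -it {GLUSTERFS_POD} -- gluster volume info {VOLUME_NAME}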

Heketi cluster management

Now you have GlusterFS installed in your cluster and your applications keep all their data safe thanks to PVs. You begin to deploy more and more applications, and more and more PVs… And BOOM! You have no space left in your GlusterFS cluster.

No worries! Thanks to Heketi you can manage your cluster. Heketi provides a tool called heketi-cli to perform lots of actions on your cluster. To use heketi-cli, you have to execute the commands inside the Heketi pod.

# To get the name of the Heketi pod

> export HEKETI=$(kubectl get pods -l glusterfs=heketi-pod -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')

# To execute Heketi commands

> kubectl exec $HEKETI -- heketi-cli {{ COMMAND }}

What commands can I execute with heketi-cli? The most important are:

  • Load a new GlusterFS topology: Heketi checks which resources you have already allocated and adds the new ones to your cluster. If the command fails, the cluster remains in its previous state. This is the best way to add more devices to the nodes in case you have run out of space in your cluster.

    > heketi-cli topology load --json=topology.json
  • Check cluster status: node status, device usage, bricks, etc.

    > heketi-cli topology info
  • Add/Delete/Disable/Enable a node in the Heketi cluster.
    > heketi-cli node add \
               --zone={ZONE} \
               --cluster={CLUSTER_ID} \
               --management-host-name={MANAGE} \
               --storage-host-name={STORAGE}
    > heketi-cli node delete/enable/disable {NODE_ID}*

    * you can’t delete an enabled node; first, you have to disable it.

  • List GlusterFS devices.
    > heketi-cli devices info
  • Add/Delete/Disable/Enable devices in GlusterFS nodes.
    > heketi-cli device add --name={DEVICE_NAME} --node={NODE_TO_ADD}
    > heketi-cli device delete/enable/disable {DEVICE_ID}*

    * you can’t delete an enabled device; first, you have to disable it.

Expanding GlusterFS volumes in Kubernetes

Imagine you deploy a database, and because you want all the data to be persisted after pod restarts, you attach a 10GB PV to your deployment.

Your application begins to write data to your database, and you find yourself in a situation where your application stops writing data. Why? Probably because the PersistentVolume attached to your database is full.

One option would be:

  1. Perform a data backup.
  2. Detach the PVC.
  3. Deploy a new PVC with more storage.
  4. Perform a data restore.
  5. Attach the new PVC to the database deployment.
  6. Restart the pod with the new PV.

That’s a nice option, but it requires downtime for your application. Not so nice now. Thankfully, since v1.9, Kubernetes supports expanding GlusterFS PersistentVolumes. To use this feature, you have to enable a couple of things in Kubernetes:

  • Set the ExpandPersistentVolumes feature gate to true.
    Deploy your Kubernetes API server and controller manager with the parameter:
    --feature-gates=ExpandPersistentVolumes=true
  • Enable the PersistentVolumeClaimResize admission plugin.
    Deploy your Kubernetes API server with the parameter:
    --admission-control=PersistentVolumeClaimResize

Once these features are enabled, you have to create a StorageClass with the allowVolumeExpansion field set to true:
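
For example, reusing the StorageClass sketched earlier (resturl is still an example value):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.default.svc.cluster.local:8080"    # example Heketi URL
allowVolumeExpansion: true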

Once all the above steps are done, a user can request a larger volume for their PVC simply by editing the PVC and requesting a larger size. This triggers the expansion of the PV bound to the PVC (Heketi will be called with a resize request).
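
For example, to grow the PVC sketched earlier from 5Gi to 10Gi (the PVC name is illustrative; kubectl edit works just as well):

> kubectl patch pvc gluster-pvc -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'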

And… Voilà! No downtime, and your application can keep writing data without noticing anything.

Conclusion

In conclusion, I must say that after some months administering a GlusterFS cluster, I haven’t run into any problems. Both replication and expansion work well.

I work with data-intensive applications (such as ECM) where data growth is unpredictable, so I needed a way to scale up my storage whenever I wanted, and Gluster meets my needs without compromising performance.
