Nowadays, applications handle a lot of data: files, databases, and so on. Since Big Data is now a concern for every enterprise, it’s important to persist all that data and be sure it’s safe.
Distributed file systems bring us the opportunity to keep our data safe and always accessible, distributed among different servers.
So, distributed storage seems like a good idea, but… is this technology already implemented? Can I afford it?
The answer is yes! Its name is GlusterFS. Let’s talk about it.
What is GlusterFS?
On its official website (www.gluster.org) we can find a definition: “GlusterFS is a scalable network filesystem suitable for data-intensive tasks such as cloud storage and media streaming”. In other words, it is scalable, software-defined distributed storage.
GlusterFS aggregates storage resources (devices) from multiple servers into a single global namespace. It runs as a userspace filesystem. To interact with the kernel VFS (Virtual File System), Gluster uses FUSE (Filesystem in Userspace), a kernel module that supports interaction between the kernel VFS and non-privileged user applications.
Many distributed storage solutions rely on a central metadata index to locate files, which can become a performance bottleneck. To avoid this problem, GlusterFS has no central metadata server. Instead, it uses the Elastic Hash Algorithm to compute locations from the path and file name. Every storage server can find any piece of data without consulting an index.
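To get a feel for how placement without a metadata server can work, here is a minimal Python sketch. It is not Gluster's actual Elastic Hash Algorithm: the hash function (SHA-256), the equal-sized hash ranges, the function name `pick_brick`, and the brick labels are all assumptions made for illustration.

```python
import hashlib
from typing import Sequence


def pick_brick(path: str, bricks: Sequence[str]) -> str:
    """Map a file path to one brick using only a hash of the path.

    Illustrative only: Gluster's real algorithm differs in its hash
    function and in how hash ranges are assigned to bricks.
    """
    if not bricks:
        raise ValueError("at least one brick is required")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Read the first 4 bytes as a 32-bit number.
    value = int.from_bytes(digest[:4], "big")
    # Split the 32-bit space into equal ranges, one per brick.
    range_size = 2**32 // len(bricks)
    index = min(value // range_size, len(bricks) - 1)
    return bricks[index]


# Hypothetical bricks, written as host:/path.
bricks = ["server1:/data/brick1", "server2:/data/brick1", "server3:/data/brick1"]
for path in ("/photos/cat.jpg", "/logs/app.log"):
    print(path, "->", pick_brick(path, bricks))
```

Any client or server that knows the list of bricks can run the same computation and reach the same answer, which is why no lookup in a shared index is needed.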
And don’t forget the best part! GlusterFS is fully open source, and it can be deployed in multiple environments (from the cloud to on-premises servers).
But… why is it better to use GlusterFS instead of traditional systems such as NFS?
Below we’ll list the main advantages of using GlusterFS:
- Horizontal scalability (petabytes).
- Multiple clients.
- POSIX compatible.
- Replication, geo-replication, quotas…
- Open Source
- Flexibility (GlusterFS can be deployed in almost all environments).
- Linearly scalable performance.
- No need for a central metadata server ⇒ better performance.
- Failover recovery.
GlusterFS is not a filesystem by itself. It aggregates existing filesystems into one, so data can be read and written through Gluster and distributed across multiple hosts simultaneously.
GlusterFS uses some terminology we need to know before diving deeper:
- Trusted storage pool: hosts in a Gluster cluster.
- Node/Server: server belonging to a trusted storage pool.
- Brick: a device (file system) used for storage in Gluster.
- Gluster volume: a collection of one or more bricks.
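The relationship between these terms can be sketched as a small data model. This is only an illustration: the class names, the helper `bricks_outside_pool`, and the host names are hypothetical and not part of any Gluster API. The `host:/path` notation for a brick is the one used by the Gluster command line.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Brick:
    """A storage directory exported by one node."""
    host: str
    path: str

    def __str__(self) -> str:
        # Bricks are conventionally written as host:/path.
        return f"{self.host}:{self.path}"


@dataclass
class Volume:
    """A named collection of one or more bricks."""
    name: str
    bricks: List[Brick] = field(default_factory=list)


def bricks_outside_pool(volume: Volume, pool: Set[str]) -> List[Brick]:
    """Return the bricks whose host is not in the trusted storage pool."""
    return [b for b in volume.bricks if b.host not in pool]


# Hypothetical trusted storage pool with two nodes.
pool = {"server1", "server2"}
volume = Volume("gv0", [Brick("server1", "/data/brick1"),
                        Brick("server2", "/data/brick1")])
print([str(b) for b in volume.bricks])
print(bricks_outside_pool(volume, pool))
```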
Volume types in GlusterFS
Most Gluster operations happen at the volume level. Depending on your resources and needs, you should choose one volume type or another.
Distributed
- Default type.
- Files are distributed across the bricks in the volume.
- Efficient for writes.
- Cheaper to increase capacity.
- No data redundancy ⇒ brick failure = data loss.
Replicated
- Data is replicated across all bricks.
- The number of replicas is configurable (1 replica per brick).
- Efficient for reads.
- Redundancy (if one brick fails ⇒ data is accessed from the replicated brick).
- Needs more storage for redundancy.
Distributed replicated
- Volume files are distributed across replica sets of bricks.
- High availability.
- Easy to scale storage.
- More storage.
Striped
- Use case: big files with many concurrent accesses.
- Data is divided into stripes before being stored in the bricks.
- Load balancing.
- No redundancy ⇒ brick failure = data loss.
Distributed striped
- Striped, but the stripes are distributed across a larger number of bricks.
- More storage.
Dispersed
- Based on erasure coding (data is broken into fragments, expanded and encoded with redundant data pieces, and stored in different locations).
- We need to define a redundancy count: the number of bricks that can fail without losing data.
- Each brick stores a portion of the data plus parity or redundancy information.
- Uses less storage than a replicated volume.
- Data can be recovered in case of failure.
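The storage trade-offs between these volume types can be made concrete with a little arithmetic. The sketch below is a rough estimate under simplifying assumptions: all bricks have the same size, filesystem overhead is ignored, and the function name `usable_capacity` and its parameters are hypothetical. The check on the redundancy count reflects Gluster's rule that a dispersed volume needs more than twice as many bricks as its redundancy count.

```python
def usable_capacity(volume_type: str, brick_count: int, brick_size_gb: int,
                    replica: int = 1, redundancy: int = 0) -> int:
    """Estimate usable capacity in GB for equally sized bricks."""
    if brick_count < 1:
        raise ValueError("a volume needs at least one brick")
    if volume_type in ("distributed", "striped", "distributed-striped"):
        # Every brick holds unique data, so all raw space is usable.
        return brick_count * brick_size_gb
    if volume_type in ("replicated", "distributed-replicated"):
        if replica < 1 or brick_count % replica != 0:
            raise ValueError("brick count must be a multiple of replica")
        # Each replica set stores a single copy's worth of data.
        return (brick_count // replica) * brick_size_gb
    if volume_type == "dispersed":
        if redundancy < 1 or brick_count <= 2 * redundancy:
            raise ValueError("need redundancy >= 1 and bricks > 2 * redundancy")
        # The equivalent of `redundancy` bricks is spent on parity data.
        return (brick_count - redundancy) * brick_size_gb
    raise ValueError(f"unknown volume type: {volume_type}")


# Six bricks of 100 GB each.
print(usable_capacity("distributed", 6, 100))                       # 600
print(usable_capacity("distributed-replicated", 6, 100, replica=3)) # 200
print(usable_capacity("dispersed", 6, 100, redundancy=2))           # 400
```

With the same six bricks, both the replica-3 layout and the dispersed layout with a redundancy count of 2 survive the loss of two bricks, but the dispersed volume leaves twice as much usable space.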
So… you’ve decided to use GlusterFS but you’re a little bit worried about day-to-day management. Questions come up, like: how much space is left? I need more space, so how can I add an extra disk?
Don’t worry, Heketi is here to help. Heketi provides a RESTful management interface that can be used to manage the lifecycle of GlusterFS volumes. You can also manage your cluster, performing actions such as adding and removing devices, changing the topology of your cluster, etc. More details about Heketi operations will be covered later in this article.
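As a taste of what a RESTful interface looks like in practice, the sketch below builds (but does not send) a request asking Heketi to create a replicated volume. Treat it as an assumption-laden illustration: the server URL is hypothetical, the payload fields follow my reading of the Heketi API documentation for `POST /volumes`, and authentication is left out, so check it against your Heketi version before relying on it.

```python
import json
import urllib.request


def build_create_volume_request(base_url: str, size_gib: int,
                                replica: int = 3) -> urllib.request.Request:
    """Build a volume-creation request for a Heketi server.

    Assumed payload shape: a size in GiB plus a durability section.
    No authentication header is added in this sketch.
    """
    payload = {
        "size": size_gib,
        "durability": {
            "type": "replicate",
            "replicate": {"replica": replica},
        },
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/volumes",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical Heketi endpoint; nothing is sent over the network here.
request = build_create_volume_request("http://heketi.example.com:8080", 10)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` once the server address and credentials are in place.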
GlusterFS offers many advantages over traditional storage solutions such as NFS. It can also be deployed almost anywhere, so it’s a strong option if we have a huge amount of data to deal with.
But… is it compatible with microservices? Can we use GlusterFS if our applications are deployed in Kubernetes? We will find the answers to all those questions in the second part of our article.
- GlusterFS official documentation: https://docs.gluster.org/en/latest/
- Gluster Docs: https://github.com/gluster/glusterdocs
- Managing volumes using Heketi: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/administration_guide/ch06s02
Álvaro Sánchez – DevOps Engineer at Intelygenz