Watch out for Maverick Glusters – Coming to a cloud near you

GlusterFS is a nice, flexible clustered filesystem for Linux servers and (through several methods) a few different client OSes. It is included in Debian, so we have it available in the Ubuntu universe archive.

With Maverick (soon to be Ubuntu 10.10), we wanted to see how easy it would be to set up a gluster server and client on EC2, using nothing but our nifty cloud-init tool in Maverick. Once we got the mounts working right, this proved pretty easy actually, as gluster has a really simple initial setup.

So, first I set up the servers. Normally I think you’d want to use EBS for the servers, so that if you accidentally tear down all of your servers, your data is still reachable. But for this test, I simply used the instance store.

I fired up two instances of Maverick first. I used uec-run-instances from the ‘cloud-utils’ package, as I like the --wait-for=ssh mode:


uec-run-instances -p ec2 -n 2 -t m1.small --wait-for=ssh -l clint-fewbar ami-46f9132

The -l clint-fewbar above actually tells cloud-init to load my SSH keys that are attached to my launchpad account. Pretty neat trick, though it is, unfortunately, incompatible with feeding in more config data (though you can still run the command that will import these anyway).

This started two m1.small maverick instances (note that my environment is already set up to run ec2-*, which is beyond the scope of this document, but basically involves setting the environment variables EC2_CERT and EC2_PRIVATE_KEY). It doesn’t return until the SSH service is reachable. You can also add --verify-ssh, but that can take a while as it must wait for the console data to be updated, which can be a few minutes on EC2 (on Eucalyptus, it is instant).

From here, I added the public IPs to a local file called gluster_servers.txt, and used parallel-ssh to set them up:


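PSSH="parallel-ssh -h gluster_servers.txt -l ubuntu -P"
# -y keeps apt-get from stopping at a confirmation prompt in a non-interactive session
$PSSH -- sudo apt-get install -y glusterfs-server
$PSSH -- sudo mkdir /mnt/test
# -r 1 asks for a replicated ("RAID 1") volume across the two exports
$PSSH -- sudo glusterfs-volgen -r 1 -n test internal-ip1:/mnt/test internal-ip2:/mnt/test
$PSSH -- sudo mv -f test-tcp.vol /etc/glusterfs/glusterfs.vol
# Single quotes so that hostname -s runs on each server, not on the local machine
$PSSH -- 'sudo mv "$(hostname -s)-test-export.vol" /etc/glusterfs/glusterfsd.vol'
$PSSH -- sudo service glusterfs-server restart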

Note that all of this could have actually been done by cloud-init too by feeding it a script.

Now to the clients; this was particularly cool. First, I took the example glusterfs file from /usr/share/doc/cloud-init/examples and modified it to substitute the internal hostname of one of my servers for ‘volfile-server-hostname’. Then I started up 3 instances, feeding them all this file:


#cloud-config
# vim: syntax=yaml
# Mounts volfile exported by glusterfsd running on
# "volfile-server-hostname" onto the local mount point '/mnt/data'
#
# In reality, replace 'volfile-server-hostname' with one of your nodes
# running glusterfsd.
#
packages:
- glusterfs-client

mounts:
- [ 'internal-ip1:6996', /mnt/data, glusterfs, "defaults,nobootwait", "0", "2" ]

runcmd:
- [ modprobe, fuse ]
- [ mkdir, '-p', /mnt/data ]
- [ mount, '-a' ]
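
The substitution step described above can be scripted rather than done by hand. Here is a minimal sketch, assuming the example file still contains the literal placeholder ‘volfile-server-hostname’. The function name render_client_config is made up for this post, and you have to supply the path of the example file yourself:

```shell
#!/bin/sh
# Hypothetical helper (not part of cloud-init): print a copy of a cloud-config
# template with the placeholder swapped for a real server address.
# $1 = path to the template, $2 = internal hostname or IP of a glusterfsd node
render_client_config() {
    awk -v host="$2" '
        /^#/ { print; next }    # leave comment lines (and #cloud-config) untouched
        { gsub(/volfile-server-hostname/, host); print }
    ' "$1"
}
```

Something like `render_client_config path/to/example.txt internal-ip1 > cloud_config.txt` would then produce the file that gets fed to the instances. Comment lines are skipped on purpose, so the explanatory header keeps mentioning the placeholder, as in the config shown here.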

And to start three nodes with the above config data:


uec-run-instances -p ec2 -n 3 -t m1.small --wait-for=ssh --run-args='-k:myEc2KeyPairName:-f:cloud_config.txt' --run-args-delim=: ami-46f9132

As these were booting up, I started to see little messages in the logs…


[2010-08-19 20:41:10] N [server-protocol.c:6788:notify] server-tcp: 10.241.89.129:1023 disconnected
[2010-08-19 20:41:10] N [server-protocol.c:6788:notify] server-tcp: 10.241.89.129:1022 disconnected
[2010-08-19 20:41:10] N [server-protocol.c:5852:mop_setvolume] server-tcp: accepted client from 10.241.89.129:1023
[2010-08-19 20:41:10] N [server-protocol.c:5852:mop_setvolume] server-tcp: accepted client from 10.241.89.129:1021
[2010-08-19 20:41:25] N [server-protocol.c:6788:notify] server-tcp: 10.240.54.143:1023 disconnected
[2010-08-19 20:41:25] N [server-protocol.c:6788:notify] server-tcp: 10.240.54.143:1022 disconnected

Sweeeeet, these are the clients connecting to download the volume definitions.

Once this happened, I ran parallel-ssh with the client boxes to find out if they were mounted properly:


PSSH="parallel-ssh -h client_hosts.txt -l ubuntu -O IdentityFile=myAmazonKeyFile.pem -P"
$PSSH df

Lo and behold, they all had /mnt/data hosted and shared!
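
Eyeballing df works fine for three clients, but a scripted check is handy for more. A minimal sketch, assuming the mount shows up in /proc/mounts with a filesystem type containing ‘glusterfs’ (I would expect fuse.glusterfs for a FUSE mount, but treat that as an assumption). The function name gluster_mounted is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a glusterfs filesystem is mounted at $1.
# $2 is the mount table to read; it defaults to /proc/mounts.
gluster_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 ~ /glusterfs/ { found = 1 }   # field 2: mount point, field 3: fs type
        END { exit !found }                          # exit status 0 only when a match was seen
    ' "${2:-/proc/mounts}"
}
```

On a client, `gluster_mounted /mnt/data && echo mounted` gives a yes/no answer that is easier to act on in a script than df output.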

Coolness, now time to fire up some bonnie++…


$PSSH -t -1 -e stderr -o stdout -- bonnie++ -d /mnt/data/bonnie -f

You may notice that I had to pass ‘-f’. This skips the “per-char” write test. That test writes one char at a time, and on glusterfs it was so slow that I aborted it after 2 hours.

While this was running, I was looking at the underlying exported data directory on the servers, and the files appeared to be staying in perfect sync, which makes sense given that we told gluster to use “raid 1” redundancy.
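
If you would rather not trust your eyes on the sync check, the export directories can be compared by checksum. This is only a sketch, assuming md5sum is available on the servers and that the export directory holds nothing but the replicated files. The function name dir_fingerprint is my own invention:

```shell
#!/bin/sh
# Hypothetical helper: print one checksum that summarises the names and
# contents of every regular file under directory $1.
dir_fingerprint() {
    (
        cd "$1" || exit 1
        # Hash each file, sort by path for a stable order, then hash the list.
        find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2 | md5sum | cut -d ' ' -f 1
    )
}
```

Running `dir_fingerprint /mnt/test` on each server and comparing the two strings should tell you whether the replicas match. While a benchmark is still writing, the values can differ briefly, so it is best run once things have gone quiet.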

And, upon finishing, I got this result:


Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
domU-12-31-39 3328M            6578   2  3881   1           37751   3  81.4   1
Latency                          2013ms     667ms               254ms     436ms
Version  1.96       ------Sequential Create------ --------Random Create--------
domU-12-31-39-04-31 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    37   0   275   0    48   0    35   0    71   0    51   0
Latency                 371ms     647ms     842ms     536ms     286ms     264ms
1.96,1.96,domU-12-31-39-04-31-61,1,1282263738,3328M,,,,6578,2,3881,1,,,37751,3,81.4,1,16,,,,,37,0,275,0,48,0,35,0,71,0,51,0,,2013ms,667ms,,254ms,436ms,371ms,647ms,842ms,536ms,286ms,264ms
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
domU-12-31-39 3328M            6568   2  3333   1           28002   2 116.0   2
Latency                          2013ms    1709ms               178ms     437ms
Version  1.96       ------Sequential Create------ --------Random Create--------
domU-12-31-39-05-5A -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    33   0   170   0    43   0    34   0   133   0    59   0
Latency                 481ms     761ms     679ms     306ms   87231us     129ms
1.96,1.96,domU-12-31-39-05-5A-77,1,1282263755,3328M,,,,6568,2,3333,1,,,28002,2,116.0,2,16,,,,,33,0,170,0,43,0,34,0,133,0,59,0,,2013ms,1709ms,,178ms,437ms,481ms,761ms,679ms,306ms,87231us,129ms
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
domU-12-31-39 3328M            2442   0  2324   1           31734   3  90.4   1
Latency                           640ms     287ms             95367us     457ms
Version  1.96       ------Sequential Create------ --------Random Create--------
domU-12-31-39-04-71 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    45   0  5689   5    93   0    57   0   365   0    82   0
Latency                 224ms   64123us     220ms     109ms   61270us     101ms
1.96,1.96,domU-12-31-39-04-71-67,1,1282262298,3328M,,,,2442,0,2324,1,,,31734,3,90.4,1,16,,,,,45,0,5689,5,93,0,57,0,365,0,82,0,,640ms,287ms,,95367us,457ms,224ms,64123us,220ms,109ms,61270us,101ms

Which basically says that over gluster, each client was able to write at up to about 6.5MB/s and read at about 28 to 37MB/s (one of the three clients only managed about 2.4MB/s on writes). To compare this to single m1.small I/O:


Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ip-10-196-243 3328M           36281  10 40132  11          112935  20 328.8   7
Latency                           421ms     241ms               273ms     917ms
Version  1.96       ------Sequential Create------ --------Random Create--------
ip-10-196-243-47    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 18069  62 +++++ +++ +++++ +++ 22141  74 +++++ +++ +++++ +++
Latency               50194us   28054us   50611us   40347us   20038us   24108us
1.96,1.96,ip-10-196-243-47,1,1282346486,3328M,,,,36281,10,40132,11,,,112935,20,328.8,7,16,,,,,18069,62,+++++,+++,+++++,+++,22141,74,+++++,+++,+++++,+++,,421ms,241ms,,273ms,917ms,50194us,28054us,50611us,40347us,20038us,24108us

I re-ran this test a few times and it seemed to come out more or less the same, though I would guess it might come out differently on some instances than on others. Anyway, this shows 36MB/s writes and 113MB/s reads. So writing is about 6x slower over gluster, which makes sense, as it must write over a network of unknown speed (gigabit? 100Mbit? congested?) and it must actually write to two servers, because of the “-r 1” given to the glusterfs-volgen command, which means “RAID1” or “replicate”. Latency on the local disk was also about 1/4 of what it was over glusterfs, which, again, is mostly owing to the network latency.
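
For anyone who wants to redo the arithmetic, the ratios can be pulled straight out of the bonnie++ CSV lines. This is a rough sketch, assuming field 10 of the CSV is the block write rate and field 16 the block read rate (which is how the numbers above line up). It compares the fastest gluster client against the local disk run, and the function name slowdown is just for illustration:

```shell
#!/bin/sh
# First 19 fields of two bonnie++ CSV lines from the results above.
gluster='1.96,1.96,domU-12-31-39-04-31-61,1,1282263738,3328M,,,,6578,2,3881,1,,,37751,3,81.4,1'
local_disk='1.96,1.96,ip-10-196-243-47,1,1282346486,3328M,,,,36281,10,40132,11,,,112935,20,328.8,7'

# slowdown GLUSTER_CSV LOCAL_CSV FIELD: local value divided by gluster value
slowdown() {
    printf '%s\n%s\n' "$1" "$2" |
        awk -F, -v f="$3" 'NR == 1 { g = $f } NR == 2 { printf "%.1f\n", $f / g }'
}

write_ratio=$(slowdown "$gluster" "$local_disk" 10)   # field 10: block write K/sec
read_ratio=$(slowdown "$gluster" "$local_disk" 16)    # field 16: block read K/sec
echo "block writes: ${write_ratio}x slower over gluster"
echo "block reads: ${read_ratio}x slower over gluster"
```

With these two lines it reports 5.5x for block writes and 3.0x for block reads, which is in line with the rough figure quoted here for writes. Swapping in the CSV line of one of the slower clients gives a correspondingly bigger gap.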

In conclusion, glusterfs is pretty easy to configure on Maverick cloud instances. I’m sure that with some more knowledge and goal-driven optimization, people will be able to tune it like crazy.
