Thursday, June 05, 2014

@Swiftstack and @racktop #Openstack Swift Design Workshop

This is the second blog post in the series from the Swift Design Workshop I am attending at Gannett in McLean. It is a one-day workshop and involves hands-on setup of, and work with, a Swift cluster.

After getting the cluster up and running, the next step is to start working with it. You can use curl or the python-swiftclient tool.

The commands are essentially HTTP calls against the cluster's URL.

swift -A http://{ip_address}/auth/v1.0 -U {user} -K {password} stat

Account: AUTH_user1
Containers: 2
Objects: 4
Bytes: 1333436
Meta Temp-Url-Key: 7df14d41-2ca0-4c86-a1ed-c13d44dbe3ac
X-Timestamp: 1401983347.97887
X-Trans-Id: tx16b6fd2c407749b98d69c-0053909dcf
Content-Type: text/plain; charset=utf-8
Accept-Ranges: bytes

Add a --debug flag to get more detail on how the client uses curl behind the scenes.

This gives the following result:

DEBUG:swiftclient:REQ: curl -i -X GET

DEBUG:swiftclient:RESP STATUS: 200

DEBUG:swiftclient:REQ: curl -i -I -H "X-Auth-Token: AUTH_tk565ba6684a7c4023bbd2245fbd14ffc6"

DEBUG:swiftclient:RESP STATUS: 204

Account: AUTH_user1
Containers: 2
Objects: 4
Bytes: 1333436
Meta Temp-Url-Key: 7df14d41-2ca0-4c86-a1ed-c13d44dbe3ac
X-Timestamp: 1401983347.97887
X-Trans-Id: txcd96ba67fd74460eb07dd-0053909e71
Content-Type: text/plain; charset=utf-8
Accept-Ranges: bytes

Running "swift post {containerName}" will create a new container.

"swift list" returns a list of containers, or the objects within a given container.

Every object in Swift has a URL, but objects are not world-readable by default, so you need to apply permissions.
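One common way to grant access without making an object public is Swift's tempurl middleware, which signs a time-limited URL with the account's Temp-URL key (the Meta Temp-Url-Key shown in the stat output above). A minimal sketch of the signing step; the key, path, and expiry values here are made up for illustration:

```python
import hmac
import time
from hashlib import sha1

# Assumed values for illustration only
key = b"7df14d41-2ca0-4c86-a1ed-c13d44dbe3ac"   # the account's Temp-URL key
path = "/v1/AUTH_user1/container1/photo.jpg"     # hypothetical object path
expires = int(time.time()) + 3600                # link valid for one hour

# tempurl signature: HMAC-SHA1 over "METHOD\nexpires\npath"
body = f"GET\n{expires}\n{path}".encode()
sig = hmac.new(key, body, sha1).hexdigest()

url = f"http://{{ip_address}}{path}?temp_url_sig={sig}&temp_url_expires={expires}"
print(url)
```

Anyone holding this URL can GET the object until the expiry time, without any other credentials.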

Swift can handle metadata, which is saved with the object.
Use -m "Tag: {tagname}" on the command line.

If you want to change metadata tags, it looks like you need to read them in, modify them, and then write them back. I need to validate this functionality.


36 drives per chassis used to be the sweet spot, and this is increasing. You need to match chassis size to total cluster size: you don't want to lose a high percentage of your cluster's drives in one go.

Local Attached beats Network Attached

48- or 60-drive nodes can become CPU-bound before they become IO-bound.

Get the disks as close to the OS as possible.

swift-get-nodes will display the details of where an object is stored.

Swift considers an unmounted drive as something that can't be written to. This avoids writes to the mount point defaulting to the root drive and filling up its available space.

Swift creates handoff locations for objects. These are used if a copy of an object is lost (e.g. to a failed drive); Swift then automatically copies the object to one of the handoff locations.

Some admins run a cron job to grab logs on a daily basis and post them to Swift.
Logs tend to be chatty, which makes debugging easier.
Logs can also be channeled to Splunk for monitoring.

Disk Weights

Swift assigns weights based on Drive size.

2TB disk = weight 2000
3TB disk = weight 3000

Ring Building

Use the Ring Builder to create rings.
Ring Distribution involves distributing the ring file. Confirm EVERY node has a copy and then activate.


Single-region cluster replica recommendation = 3

Number of partitions based on size of cluster
Rule of thumb: 100 x the biggest number of drives you think you will ever have, rounded up to the nearest power of 2.
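The rule of thumb above can be sketched as a quick calculation (a hypothetical helper for sizing, not part of Swift itself):

```python
import math

def part_power(max_drives):
    """Partition power: 100 partitions per drive, rounded up to a power of 2."""
    return math.ceil(math.log2(100 * max_drives))

# A cluster expected to grow to 6,000 drives:
# 100 * 6000 = 600,000 -> next power of 2 is 2**20
print(part_power(6000))       # -> 20
print(2 ** part_power(6000))  # -> 1048576 partitions
```

The partition power is fixed at ring-creation time, which is why the rule sizes for the biggest cluster you expect, not the one you start with.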

Min-Part-Hours: 24 hours.

You don’t want too many replicas in flight at the same time.
Changes are designed to take a little time to complete.

Adding Devices to the Ring. Define where and apply a weight based on disk size.

Data redistribution is done at the Partition level.
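Under the hood, Swift maps each object to a partition by hashing its path and keeping the top partition-power bits; only those partition-to-device assignments move when the ring changes. A simplified sketch of the mapping (the PART_POWER and HASH_SUFFIX values are assumptions; real clusters set the suffix in swift.conf):

```python
import hashlib
import struct

PART_POWER = 16          # assumed ring partition power
HASH_SUFFIX = b"secret"  # per-cluster value from swift.conf (made up here)

def partition_for(account, container, obj):
    """Map an object path to a partition: MD5 of the path (plus the
    cluster hash suffix), keeping the top PART_POWER bits."""
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path + HASH_SUFFIX).digest()
    # First 4 bytes as a big-endian integer, shifted down to PART_POWER bits
    return struct.unpack(">I", digest[:4])[0] >> (32 - PART_POWER)

print(partition_for("AUTH_user1", "photos", "cat.jpg"))
```

Because the mapping is deterministic, any proxy can compute where an object lives without a central lookup.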

Metadata

Metadata is replicated across the cluster for high availability.
The Controller is a management console; the cluster operates without it.

Metadata is limited; if you want to use metadata for indexing and searching, there are limitations.
The common limits for Swift metadata are:

swift.common.constraints.MAX_HEADER_SIZE = 8192 (max size of any header)
swift.common.constraints.MAX_META_COUNT = 90 (max number of metadata items)
swift.common.constraints.MAX_META_NAME_LENGTH = 128 (max length of a metadata key name)
swift.common.constraints.MAX_META_OVERALL_SIZE = 4096 (max overall size of metadata)
swift.common.constraints.MAX_META_VALUE_LENGTH = 256 (max length of a metadata value)
Source: Racktop
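These limits could be checked client-side before posting metadata; a small sketch using the constraint values above (hard-coded here for illustration, not imported from Swift):

```python
# Documented Swift metadata constraints, hard-coded for this sketch
MAX_META_COUNT = 90
MAX_META_NAME_LENGTH = 128
MAX_META_VALUE_LENGTH = 256
MAX_META_OVERALL_SIZE = 4096

def check_metadata(meta):
    """Return a list of constraint violations for a dict of metadata."""
    errors = []
    if len(meta) > MAX_META_COUNT:
        errors.append(f"too many items ({len(meta)} > {MAX_META_COUNT})")
    total = 0
    for name, value in meta.items():
        total += len(name) + len(value)
        if len(name) > MAX_META_NAME_LENGTH:
            errors.append(f"name too long: {name[:20]}")
        if len(value) > MAX_META_VALUE_LENGTH:
            errors.append(f"value too long for key {name}")
    if total > MAX_META_OVERALL_SIZE:
        errors.append(f"overall size {total} > {MAX_META_OVERALL_SIZE}")
    return errors

print(check_metadata({"color": "blue"}))   # no violations
print(check_metadata({"k": "x" * 300}))    # value exceeds 256 characters
```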

An overview of how Racktop Systems have built a cloud storage solution using Swift and Swiftstack.

The combination of Swiftstack and Racktop gives flexibility to run on-site and in the cloud and manage across all locations seamlessly.

Starter configurations include:

Up to 120TB of storage
2 x 1Gb Ethernet for outbound networking
2 x 10Gb Ethernet for internal networking and load balancing
2 dedicated proxy servers
Storage pods
- 2U 12-bay JBOD
- 4U 60-bay JBOD

Enterprise Audit capabilities provide for data tracking through lifecycle from ingestion to ultimate destruction.


Working on enterprise key management. This will be a partnering activity.

My Racktop

Secure Cloud platform. Powered by VMware vCloud. Enabling Virtual Private Datacenters.
Multiple physical centers connected by high capacity dark fiber.

Failure Handling

You need spare capacity to handle failures:
Small clusters should keep 20% spare capacity.
Large clusters should keep 10% spare capacity.
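Combined with the replica count, this gives a back-of-the-envelope usable-capacity estimate (a hypothetical helper; the numbers are the workshop's rules of thumb, not Swift code):

```python
def usable_capacity_tb(raw_tb, replicas=3, spare_fraction=0.20):
    """Usable capacity after reserving spare space for failure handling
    and dividing by the replica count."""
    return raw_tb * (1 - spare_fraction) / replicas

# Small cluster: 120 TB raw, 3 replicas, 20% spare -> roughly 32 TB usable
print(usable_capacity_tb(120))
# Large cluster: 1 PB raw, 3 replicas, 10% spare -> roughly 300 TB usable
print(usable_capacity_tb(1000, spare_fraction=0.10))
```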

Drive Failure is different from Node Failure.

Node Failures are often Transient. Usually detected by timeouts.

Drive failures can be accommodated in a number of ways. Swift replicates as standard and will move data around a failed drive.
You can also simply turn a failed drive off.

Nodes can also be removed from a Cluster.


Networks can be physical or VLANs.

You will have an outward-facing network and a cluster-facing network:

  • Outward Network – 10G
  • Cluster Network – 10G
  • Controller Network – 1G
  • Out-of-band Management – 1G

There is no caching inside Swift.
You can use Varnish or a CDN network in front of the Proxy Servers.

In smaller clusters, put a proxy in each rack to create redundancy.
Try to keep zones of roughly equivalent size.

Initially a rack may equate to a zone, but with growth there is a decision about whether each rack remains a zone or whether to group racks into a zone.

Multi-site Global Clusters

Challenges: reads and writes can become slow when going across a slower WAN link.
WAN links are expensive, usually under-provisioned, and over-utilized.

Going to a 4x replica count allows East Coast/West Coast regions (e.g. two replicas per region).

Proxy Read Affinity

  • Prioritize to use nearby zones.
  • Need capacity to handle the level of read requests.
  • Use DNS to route to closest regional center.

Write Affinity (off by default)

  • WAN links can slow writes.
  • Write locally to the same region, using zone allocation for resilience.
  • Back-end replication then takes place slowly behind the scenes.
  • Users need to understand the replication lag.
  • Replication can take a long time.
  • A separate replication network is an option; it can have its own QoS.

Region Replication

  • Replication over the WAN can be slow.
  • It is quicker to install a rack in an existing region, load the data, shut it down, ship it to the new center, bring it up, connect it, and let the differential data replicate to catch up.

Performance Optimization

Use SSDs to store Container and Object Metadata.

Swift sets a 5GB limit per object (compatible with S3). Larger files are split into chunks and uploaded, and a follow-up command then ties all the chunks together into one large object. Chunking improves performance because multi-threading allows the chunks to be written in parallel streams.
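The client-side splitting step can be sketched as follows (a toy example with tiny sizes; a real upload would use a segment size up to 5GB and upload each segment concurrently):

```python
def segment(data, segment_size):
    """Split an object's bytes into fixed-size segments, as a client would
    before uploading a large object in parallel."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]

blob = b"x" * 25
parts = segment(blob, 10)
print(len(parts))               # -> 3 segments (10 + 10 + 5 bytes)
print(b"".join(parts) == blob)  # -> True: concatenation recovers the original
```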


You can configure on the Proxy servers or directly to the Nodes themselves.


The SwiftStack authentication is a hash file that is retained in memory on the nodes.
It is suited for up to 1,000 or so users.
LDAP is pluggable and also supports LDAP groups.
Active Directory is also popular.

Utilization Data

GB used.
API Requests
GB in/Out
+ Other metrics

This can be extracted at an account level.

File Managers for Swift:

  • Cloudberry Explorer
  • ExpanDrive
  • Cyberduck

File System Emulation:

Camaito - WebDAV for Swift

What is next for Swift

  • Back-end Plug-in Architectures
  • Seagate Kinetic Ethernet drives: the drive speaks object instead of block.
  • Storage Policies (across regions, media etc.)

This allows tuning based on usage.

  • Erasure Codes
    Allow storing less raw data than full replication requires for the same durability.
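The saving is easy to quantify: replication stores N full copies, while an erasure-coded scheme with k data and m parity fragments stores only (k+m)/k bytes per byte of user data. A quick comparison (the 10+4 scheme is an illustrative example, not a Swift default):

```python
def storage_overhead(data_fragments=None, parity_fragments=None, replicas=None):
    """Raw bytes stored per byte of user data, for replication vs erasure coding."""
    if replicas is not None:
        return float(replicas)
    return (data_fragments + parity_fragments) / data_fragments

print(storage_overhead(replicas=3))                             # 3x for triple replication
print(storage_overhead(data_fragments=10, parity_fragments=4))  # 1.4x for a 10+4 scheme
```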
