Tuesday, August 14, 2012

Projects in the Cloud in a cloudless auditorium of a communications company.

I am in the Annapolis Junction area of Maryland, near the NSA, in a classic situation that reminds me of many health care conferences I have attended. I am in the auditorium of a communications company, yet there is no Wi-Fi, and cell and 4G coverage inside the building is poor.

This is a lunch meeting of the Baltimore Chapter of the Project Management Institute: PMIBaltimore.org.

The topic is project management on cloud-based projects. It will be interesting to see how these projects are thought to differ from others. My guess is that the main changes will be a reduction in the amount of physical equipment that needs to be provisioned and an increase in vendor management, since services are more likely to be virtual.

There is a lot happening in the cloud arena. Everything is getting virtualized.

"Cloud Computing - What a Project Manager Needs to Know" - Dr. Patrick Allen - Johns Hopkins University

Cloud - A different mental paradigm.

Two types of cloud (and we don't mean public vs. private):

1. Computing as a service (Utility Cloud and Virtual Machines - VMs)

Infrastructure as a Service (IaaS)
Platform as a Service (PaaS)
Software as a Service (SaaS)

2. Data-focused cloud (runs on Virtual Machines)

Data Storage Cloud - e.g. a Hadoop cloud (Amazon S3)

Be clear about what type of cloud you are talking about. This is essential to avoid confusion.

Amazon is the best example of a compute service (AWS EC2). Rackspace, using OpenStack, is another.
Pay for use, but you can pre-pay for a better price.
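To see when pre-paying beats paying for use, a quick break-even calculation helps. This is a minimal Python sketch; the three rates are made-up placeholders, not any provider's real prices.

```python
# Hypothetical rates for illustration only; not real provider prices.
ON_DEMAND_PER_HOUR = 0.10  # pay-for-use rate, $/hour
UPFRONT_FEE = 300.00       # one-time pre-payment, $
PREPAID_PER_HOUR = 0.04    # discounted hourly rate after pre-paying, $/hour


def on_demand_cost(hours: float) -> float:
    """Total cost when paying only for the hours used."""
    return hours * ON_DEMAND_PER_HOUR


def prepaid_cost(hours: float) -> float:
    """Total cost when pre-paying, then paying the discounted hourly rate."""
    return UPFRONT_FEE + hours * PREPAID_PER_HOUR


def break_even_hours() -> float:
    """Usage (in hours) above which pre-paying is cheaper than paying for use."""
    return UPFRONT_FEE / (ON_DEMAND_PER_HOUR - PREPAID_PER_HOUR)


if __name__ == "__main__":
    print(f"Pre-paying pays off after about {break_even_hours():.0f} hours")
```

With these placeholder numbers the break-even point is 5,000 hours, a bit over half a year of continuous use (8,760 hours), so pre-paying only makes sense for machines you expect to keep busy.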

Compute clouds are great for surge-type activities: spin up machines to compute, then power them down.
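The surge argument is easy to check with arithmetic. The sketch below compares running enough machines for the peak all year against keeping a small baseline and renting extra machines only during surge hours. The hourly rate, machine counts and surge hours are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def always_on_cost(machines: int, rate_per_hour: float) -> float:
    """Cost of keeping enough machines for the peak running all year."""
    return machines * rate_per_hour * HOURS_PER_YEAR


def surge_cost(baseline: int, peak: int, surge_hours: float,
               rate_per_hour: float) -> float:
    """Cost of a baseline running all year plus extra machines only during surges."""
    baseline_cost = baseline * rate_per_hour * HOURS_PER_YEAR
    surge_extra = (peak - baseline) * rate_per_hour * surge_hours
    return baseline_cost + surge_extra


if __name__ == "__main__":
    rate = 0.10  # hypothetical $ per machine-hour
    print(f"Always on:  ${always_on_cost(20, rate):,.2f}")
    print(f"Surge only: ${surge_cost(2, 20, 200, rate):,.2f}")
```

With the example numbers (2 baseline machines, a 20-machine peak, 200 surge hours a year at $0.10 per machine-hour), spinning up for the surge costs about $2,112 against about $17,520 for keeping 20 machines running.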

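Side-channel attacks: the hardware in the cloud is compromised, and the attacker can then break into the VMs running on the infected machine. To get there, the perpetrator first has to break out of a VM and infect the underlying hardware. (Strictly speaking, breaking out of a VM is usually called a VM escape; a side-channel attack in the narrower sense leaks information from a co-resident VM through shared hardware such as CPU caches.)
Amazon claims to have solved the side-channel attack problem.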

Questions to ask:
1. Cost of data stored
2. Cost for VMs used
3. How secure or private is the data?
4. Security and privacy guarantees
5. Will PII (personally identifiable information) be protected?
6. Can you use the cloud for Continuity of Operations Plans?
7. Can you store classified data in the cloud? (Maybe, if it is properly secured, government accredited and private.)
8. Are you planning to use a third-party service? (You can also build your own cloud.)
9. Where is your computing done? (Is it US-only? Export controls may apply.)
10. Get a cyber-security risk assessment. 

Also (not mentioned): the cost of bandwidth used and data transfers. Ask what counts as a data transfer. For example, AWS doesn't charge for some data moved around within a region but does charge for transfers into and out of a region.
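A rough monthly estimate that covers stored data, VM hours and data transfers might look like this. It is a sketch under loud assumptions: flat per-unit rates that are made up, hypothetical region names, and the rule from the note above that in-region moves are free while anything crossing a region boundary is billed.

```python
from dataclasses import dataclass

# Hypothetical flat rates; real price lists are tiered and change often.
STORAGE_PER_GB_MONTH = 0.10  # $ per GB stored per month
VM_PER_HOUR = 0.08           # $ per VM-hour
CROSS_REGION_PER_GB = 0.02   # $ per GB that crosses a region boundary


@dataclass
class Transfer:
    source: str       # region the data leaves (hypothetical names)
    dest: str         # region the data arrives in
    gigabytes: float


def transfer_cost(transfers: list[Transfer]) -> float:
    # Assumption: moves within one region are free, cross-region moves are billed.
    return sum(t.gigabytes * CROSS_REGION_PER_GB
               for t in transfers if t.source != t.dest)


def monthly_estimate(storage_gb: float, vm_hours: float,
                     transfers: list[Transfer]) -> dict[str, float]:
    """Break a month's bill into the three cost lines worth asking about."""
    estimate = {
        "storage": storage_gb * STORAGE_PER_GB_MONTH,
        "compute": vm_hours * VM_PER_HOUR,
        "transfer": transfer_cost(transfers),
    }
    estimate["total"] = sum(estimate.values())
    return estimate


if __name__ == "__main__":
    moves = [Transfer("region-a", "region-a", 500),  # in-region: free here
             Transfer("region-a", "region-b", 100)]  # cross-region: billed
    print(monthly_estimate(storage_gb=1000, vm_hours=720, transfers=moves))
```

Keeping the transfer line separate makes it obvious how much of the bill depends on the provider's definition of a billable transfer.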

Data-focused clouds:
Huge data - petabytes or larger.
HDFS - Hadoop Distributed File System. Non-relational.
Relational databases: think rows and columns.
Accumulo and HBase present structured tables on top of HDFS. They are column-family stores modeled on Google's BigTable rather than relational databases.

Hadoop is more popular than the Sector distributed file system.

Enables automatic parallelization of queries. 
Handles storing, location, replication and processing of your data.
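To make "storing, location and replication" concrete, here is a toy model of what a distributed file system like HDFS does: cut a file into fixed-size blocks and keep several copies of each block on different nodes. HDFS defaults to three replicas; the round-robin placement below is a simplification (real HDFS placement is rack-aware), and the node names are hypothetical.

```python
def split_into_blocks(file_size: int, block_size: int) -> list[int]:
    """Sizes of the fixed-size blocks a file is cut into (last block may be short)."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = 3) -> dict[int, list[str]]:
    """Put each block on `replication` distinct nodes using simple round-robin.

    Real HDFS placement is rack-aware; this only shows the idea of
    spreading copies so no single node holds every replica of a block.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
            for block in range(num_blocks)}


def readable_after_failure(placement: dict[int, list[str]], failed: str) -> bool:
    """True if every block still has a copy on some node other than `failed`."""
    return all(any(node != failed for node in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    sizes = split_into_blocks(file_size=200, block_size=64)  # toy units
    layout = place_replicas(len(sizes), ["node1", "node2", "node3", "node4"])
    print(sizes, layout, readable_after_failure(layout, "node2"))
```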

Map-Reduce makes it easier to identify patterns across the data.
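The Map-Reduce model itself is small: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. Below is a single-process Python sketch of the classic word-count example. It illustrates the programming model only and does not use Hadoop's actual Java API; Hadoop's contribution is running these same steps in parallel across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> dict[str, int]:
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(map_reduce(["the cloud", "The data cloud"]))
```

Because each map call looks at one record and each reduce call looks at one key, both steps can run on many machines at once, which is where the automatic parallelization comes from.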

Map-Reduce - Good for:

- Huge data sets
- Disparate data sources
- Uncertainty about which queries will be run
- Summarizing independent processes

Map-Reduce not good for:

- Tasks where relational databases are adequate
- Well-defined queries

What drives you to use cloud capabilities:

- Unstructured data in large quantities
- Surge events are common

Learn what is possible with Map-Reduce and the cloud capabilities.

Why build your own cloud?
- Security, privacy or proprietary needs not met by existing cloud services.
- Ongoing maintenance costs.

An interesting suggestion (not verbalized): use a message queue to handle, arbitrate and allocate processing requests.
Message queuing can be used to scale resources up and down.
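One way a message queue can drive scaling is to size the worker pool from the queue depth. This is a hypothetical sketch using Python's standard `queue.Queue` as a stand-in for a real message broker; the jobs-per-worker figure and the min/max bounds are assumptions.

```python
import math
import queue


def desired_workers(queue_depth: int, jobs_per_worker: int,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Workers needed to drain the backlog, clamped to [min_workers, max_workers]."""
    needed = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


def scale_step(requests: queue.Queue, current_workers: int,
               jobs_per_worker: int = 10) -> tuple[int, int]:
    """Return (target worker count, change); a positive change means spin up
    machines, a negative change means power some down."""
    target = desired_workers(requests.qsize(), jobs_per_worker)
    return target, target - current_workers


if __name__ == "__main__":
    backlog: queue.Queue = queue.Queue()
    for job_id in range(45):
        backlog.put(job_id)
    print(scale_step(backlog, current_workers=2))
```

The clamp matters in practice: the minimum keeps at least one worker listening, and the maximum caps spending during an unexpected surge.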

General cloud questions:
- Will you know where your cloud is physically located?
- Amazon AWS uses regions but is not more specific than that.

- Who gets access (US or foreign nationals)?
- How is data ingested?
- How is data stored?
- How is data secured at rest?
- How is data deleted?
- Can you test that your data has been deleted or hidden?
- Can you keep your data separated from others' data?
- How frequently is hardware upgraded?
- Proven VMs?
- Who has done a risk assessment?


- Cloud is here to stay
- More projects will involve the cloud
- Need to understand strengths and weaknesses of the cloud
- Learn about cloud capabilities

FedRAMP is a new standardized approach to security assessment, authorization and security monitoring for cloud products and services.

It is good for low-risk government projects, but not for high-security applications.

Map-Reduce programs are more complicated if the data is not clean. 
Look at cleaning data on load to simplify queries.
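As an illustration of cleaning on load, the sketch below normalizes messy rows once as they are read, so the later aggregation step stays trivial. The two-column CSV layout and the `STATE_ALIASES` table are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Optional

# Hypothetical alias table mapping messy spellings to one canonical code.
STATE_ALIASES = {"md": "MD", "maryland": "MD", "va": "VA", "virginia": "VA"}


def clean_row(row: list[str]) -> Optional[tuple[str, int]]:
    """Normalize one raw (state, count) row; return None if it is unusable."""
    if len(row) != 2:
        return None
    state_raw, count_raw = (field.strip() for field in row)
    state = STATE_ALIASES.get(state_raw.lower())
    if state is None or not count_raw.isdigit():
        return None
    return state, int(count_raw)


def load_clean(raw_text: str) -> list[tuple[str, int]]:
    """Clean every row once, at load time, and drop the ones that can't be fixed."""
    cleaned = (clean_row(row) for row in csv.reader(io.StringIO(raw_text)))
    return [row for row in cleaned if row is not None]


def total_by_state(rows: list[tuple[str, int]]) -> dict[str, int]:
    """The query stays simple because the data is already clean."""
    totals: Counter = Counter()
    for state, count in rows:
        totals[state] += count
    return dict(totals)


if __name__ == "__main__":
    raw = " MD , 12\nmd,7\nMaryland,3\n,5\nVirginia,x\nva, 4\n"
    print(total_by_state(load_clean(raw)))
```

Without the load step, every query would have to repeat the trimming, alias lookup and validation itself.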

Licensing in the cloud: software manufacturers have not really set up licenses for cloud services.
The best advice is to consider open source and pay for support.

Security issues: VMs may be more secure than shared hosting. With shared hosting, the Feds can confiscate a server in a hosting facility, and if that server is a shared host, other customers are affected too.

Service level guarantees: what is reasonable? It depends. It is worth looking closely at the Service Level Agreements.
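When comparing SLAs, it helps to turn an availability percentage into allowed downtime. A minimal sketch, assuming a 365-day year:

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime an SLA still permits over a period at a given availability."""
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.95, 99.99):
        print(f"{sla}% -> {allowed_downtime_hours(sla):.2f} hours/year")
```

For example, 99.95% availability still allows about 4.38 hours of downtime a year, or about 22 minutes in a 30-day month.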