Understanding Data Security: 5 Best Practices

Laboratories generate mounds of data, and they all face the challenge of securing that data.  The recent blog post Introduction to Cloud Data Security covered several key considerations for labs relying on (or thinking about relying on) cloud-based data sharing.  This discussion examines five important principles within the broader topic of information security.

Computer security is a process: a continuous evaluation of risks that results in concrete steps for safeguarding computer systems and the data they generate, wherever those systems may live.  Whether those systems are housed locally behind a company firewall or remotely within Amazon’s data centers, a few principles and best practices can help guide that process and yield steps that minimize the risks we face.

1. Apply the principle of least privilege

Cloud infrastructure is rented, which means the ultimate key to the security of any cloud-hosted computer system is the administration account that is set up with a provider. All account authentication information, including passwords, should be stored securely and shared among a very small set of administrators, if it is shared at all. This is the principle of least privilege: only the access required to perform a certain task should be granted, and no more.

  • Establish a process for requesting and provisioning additional resources to help keep the size of this group to one or two people.  Fewer authorized administrators means fewer potential gaps.  Cloud providers may offer tools to facilitate this kind of request pipeline.  For example, Amazon Web Services allows each account to have its own set of users, and to enforce role-based access to different services.
  • Role-based access is a tool through which permissions can be assigned, allowing certain actions to be performed by different groups of users.  Each member of the team, for example, could be granted permission to design their own computational set-up (a “machine image” in Amazon’s parlance), but only one person would be permitted to actually purchase new computational resources based upon those set-ups, as sketched in the example below.
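
To make this concrete, here is a minimal sketch using boto3, the AWS SDK for Python.  The policy and user names are hypothetical placeholders, and the exact actions a lab grants will depend on its own workflow:

    import json
    import boto3

    iam = boto3.client("iam")

    # Allow team members to design and register machine images,
    # but explicitly deny launching (i.e., purchasing) new instances.
    image_builder_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": ["ec2:CreateImage", "ec2:DescribeImages"],
             "Resource": "*"},
            {"Effect": "Deny",
             "Action": ["ec2:RunInstances"],
             "Resource": "*"},
        ],
    }

    # Create the policy and attach it to a specific team member's user.
    response = iam.create_policy(
        PolicyName="ImageBuilderOnly",               # hypothetical policy name
        PolicyDocument=json.dumps(image_builder_policy),
    )
    iam.attach_user_policy(
        UserName="lab-team-member",                  # hypothetical user
        PolicyArn=response["Policy"]["Arn"],
    )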

2. Use strong, unique passwords

Every user in the scenarios above will need their own username and password, and this is unlikely to be the only password they’ll need.  Databases and applications hosted in the cloud will likely also need to be set up with one or more username/password combinations.

  • Requiring and using strong, unique passwords for different services reduces the risk that an attacker who manages to steal one such password will be able to access other services.  The catch, however, is that strong, unique passwords are hard to remember.  A good password management system, such as LastPass, makes it much easier to generate and securely store such passwords for different services (the snippet after this list sketches that kind of generation).
  • Some password managers (LastPass included) also provide mechanisms for secure password sharing, in which the recipient can access the service using the shared credentials but can’t access the credentials themselves.
  • Whenever possible, use multi-factor authentication to further safeguard account access.  These schemes typically combine a password with a one-time access code delivered to the user’s phone.  Even if the user’s password is stolen, an attacker still cannot access the service, because they cannot retrieve the one-time access code.
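
As a small illustration of the first point, here is a minimal sketch of generating a strong, random password in Python, similar in spirit to what a password manager does on your behalf; the length and character set are simply reasonable defaults:

    import secrets
    import string

    def generate_password(length: int = 24) -> str:
        """Return a random password drawn from letters, digits, and punctuation."""
        alphabet = string.ascii_letters + string.digits + string.punctuation
        return "".join(secrets.choice(alphabet) for _ in range(length))

    # Each call produces a new, unrelated password suitable for a single service.
    print(generate_password())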

3. Minimize system exposure

Role-based access and strong passwords are excellent, well-understood tools, but we can go a step further:  not all services need to be exposed to the entire Internet.  In fact, some may not even need to be exposed to the Internet at all.

Let’s assume we have a three-tier software application:

  1. a client tier (the web browser)
  2. a business tier (the web server)
  3. a data tier (the database)

Only one network port needs to be open for clients to reach the web server: port 443, the secure HTTP (HTTPS) port.  This port can be opened in the cloud provider’s firewall while all of the other ports remain closed.  If other services, such as the Remote Desktop service on Windows, need to be used, those services don’t have to be exposed to the entire Internet – a whitelist can restrict network access.  A whitelist in a firewall is a list of network addresses that are permitted to access a certain network port.  For the Remote Desktop service (port 3389, for the curious), only the company’s network address needs to be added to the whitelist.
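
As a rough sketch of what this looks like in practice on Amazon Web Services (again using boto3, the AWS SDK for Python), the rules below open HTTPS to everyone but whitelist Remote Desktop to a single company network range.  The security group ID and address range are placeholders:

    import boto3

    ec2 = boto3.client("ec2")

    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",   # hypothetical security group
        IpPermissions=[
            # HTTPS (port 443) open to the entire Internet
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
            # Remote Desktop (port 3389) restricted to the company's network only
            {"IpProtocol": "tcp", "FromPort": 3389, "ToPort": 3389,
             "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},   # placeholder company range
        ],
    )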

This is another example of the principle of least privilege: instead of open, unfettered access to a system, only those services required are exposed, and only to those users (the company’s employees) who need them.

4. Don’t stop at the firewall

Even behind a cloud provider’s firewall, an application can be further secured. Firewall software is often already installed on operating systems, providing yet another layer of network security. Once again, by exposing only those network ports that are required, cloud systems are further isolated within a cloud service provider’s own network.
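For example, on a Linux host with the ufw firewall installed, the operating-system firewall can be tightened with a few commands.  This is only a sketch, run with administrative privileges; the equivalent steps on Windows would use the built-in Windows Firewall instead:

    import subprocess

    # Deny all inbound traffic by default, then allow only HTTPS.
    subprocess.run(["ufw", "default", "deny", "incoming"], check=True)
    subprocess.run(["ufw", "allow", "443/tcp"], check=True)
    subprocess.run(["ufw", "--force", "enable"], check=True)
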

The web server will need to talk to a database in order to provide any useful information, but most databases have their own built-in “firewall.”  SQL Server, PostgreSQL, and MySQL, for example, can all be configured to listen only for connections from a specific network address.  If it’s possible for a web server and database to co-exist on the same computer (that is, if neither requires so many computational resources that it jeopardizes the other’s performance), the database can even be configured to listen *only* to the machine on which it lives.
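
As a brief sketch of that arrangement with PostgreSQL: once the server’s postgresql.conf has listen_addresses = 'localhost', the web application connects over the loopback interface and remote hosts are refused before they ever reach a login prompt.  The database name and credentials below are hypothetical:

    import psycopg2

    # Works only from the machine the database lives on, because the server
    # is configured to listen solely on the loopback interface.
    conn = psycopg2.connect(
        host="localhost",
        dbname="lab_data",        # hypothetical database
        user="webapp",            # hypothetical application account
        password="generated-by-a-password-manager",
    )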

5. Automate auditing, but be sure to follow up

Once we have taken all of the steps we can to minimize system exposure, we must still keep an eye on those systems for unexpected behavior or suspicious activity.

  • Monitoring applications like Nagios or Zabbix only tell us whether systems and services are meeting specific conditions, such as using no more than a suspiciously large amount of memory.  They can’t, however, help us look for the patterns of activity that lead to that abnormal memory use.
  • Software systems such as the commercial Splunk or the open-source Fluentd allow us to gather up log file information and implement specific audits.  If a suspicious-looking access attempt, or a series of them, is made, these systems can be configured to generate notifications or warnings (the snippet after this list sketches the kind of audit they automate).
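
The snippet below is a minimal sketch of the kind of audit these systems automate: counting failed login attempts per source address and flagging anything suspicious.  The log location, line format, and threshold are hypothetical, and a real deployment would rely on the auditing tool’s own parsing and alerting:

    from collections import Counter

    FAILED_LOGIN_MARKER = "authentication failure"
    THRESHOLD = 5   # flag any source with more than five failures

    failures = Counter()
    with open("/var/log/auth.log") as log:                # hypothetical log file
        for line in log:
            if FAILED_LOGIN_MARKER in line:
                source = line.rsplit(" ", 1)[-1].strip()  # assumes the source is the last field
                failures[source] += 1

    for source, count in failures.items():
        if count > THRESHOLD:
            print(f"ALERT: {count} failed logins from {source} - follow up required")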

In the context of a robust security process, it’s important that auditing system notifications trigger a well-known sequence of follow-up steps.  In a major data breach at a well-known retailer, occurring within the retailer’s own data center, a notification of suspicious activity was sent, and then … nothing happened.  The greatest safeguards in the world won’t do any good without a solid process for following up.


About the Expert

Daniel Goldman

Co-founder and Principal at StackWave and Co-founder of LabGauge, Daniel Goldman has over a decade of experience building software, with a primary focus on database and server design and development.  Mr. Goldman is particularly interested in software systems that can be rapidly adjusted to accommodate the changes in workflow and data models that are essential to keeping systems current with R&D processes.  In his words, “software should make it easier to do great science.”