Transforming Your Database

What is a database? Once upon a time, it was simple. The database was a modern Bob Cratchit putting data in tables made up of very straight columns filled with one row per entry. Long, endless rectangles of information stretching on into the future.

The relational database has been the bedrock of modern computing. The vast majority of websites are just a bunch of CSS (Cascading Style Sheets, which control how elements are displayed on screen) or lipstick painted on top of SQL (Structured Query Language, which is used to communicate with a database).

Everything that makes us special is just another row in the big table of life.

The love affair with the big matrix of bits is slowly fading as developers are realizing that not everything fits into a simple table. And because developers are smart and obsessive about finding solutions for every need, they’ve started creating new and better places to store the information. The last few years have brought an explosion in other mechanisms for squirreling away our data.

Are these wonderful new options still databases? Does the data have to fit into some big matrix to be a database? Some like to use the term “data store” to differentiate the modern mechanisms because the word “database” is too tightly linked in our minds to the old tabular structure. We’ll leave that up to the philosophers. Data goes in and answers come out.

Here are eight ways that the database is being reinvented in new shapes and forms.

GPU Computing

Once upon a time, video cards were built to draw elaborate scenes for kids’ games, but now the so-called graphics processing units are doing plenty of non-graphical processing. Searching through data is one of the best non-graphical jobs for them to tackle. And why not? Plowing through endless piles of data looking for a match is an inherently parallel operation made up of lots of rudimentary jobs (testing equality) repeated millions of times. So it is pretty simple to turn the job over to the thousands of processors in the GPU.

The biggest wins aren’t in answering each query (which is obviously many times faster) but in the preparation work, because there is little need for preprocessing. Many databases save time by maintaining an index, which is effectively a precomputed answer to the searches it covers.

If this index is corrupted or destroyed, rebuilding it can take hours, days, or maybe even months. If the data can fit inside a GPU’s memory, though, you can usually get by without the index. If the data is changing quickly and most of the index is never used, then skipping the preprocessing can be quite effective.
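A minimal sketch of that tradeoff, written in plain Python rather than real GPU code. The sample rows and the `build_index` and `scan` helpers are hypothetical names made up for illustration. The point is only that the scan needs no preparation, and that each equality test is independent of the others, which is the property that lets the work be spread across many processors.

```python
from collections import defaultdict

# Hypothetical sample data: (id, city) rows.
records = [
    (1, "Oslo"),
    (2, "Lima"),
    (3, "Oslo"),
    (4, "Cairo"),
]

def build_index(rows):
    """Precompute city -> list of ids. This has to be kept up to date as rows change."""
    index = defaultdict(list)
    for row_id, city in rows:
        index[city].append(row_id)
    return index

def scan(rows, city):
    """Brute-force search: one independent equality test per row, no preprocessing."""
    return [row_id for row_id, row_city in rows if row_city == city]

index = build_index(records)
print(index["Oslo"])          # lookup through the precomputed index
print(scan(records, "Oslo"))  # same answer with no index at all
```

On a CPU the scan is the slow path. The argument above is that with enough parallel hardware and data that fits in memory, it can become fast enough to skip the index.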

Non-volatile memory (NVRAM)

Programmers who cut their teeth 50 years ago had it easy. They didn’t have to juggle data between the RAM and the disk with elaborate protocols for ensuring consistency. That’s because the memory back then was magnetic core and wasn’t erased when the power was turned off. Those good times may be back again soon because chip manufacturers are talking about replacing RAM with NVRAM, or non-volatile memory.

This is a big game changer for database programmers because one of their biggest challenges (and even their greatest reason for living) is disappearing. Some suggest that the databases can get much faster because the transaction semantics can be simpler. Others float the idea of building the recovery log after the data is written to the media, not before.
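To make the "before" versus "after" distinction concrete, here is a toy sketch of the conventional ordering, usually called write-ahead logging, where the log entry is recorded before the data is changed. The `ToyStore` class is hypothetical, and the in-memory list is only a stand-in for a durable log. No real database is this simple.

```python
class ToyStore:
    """Toy key-value store with a write-ahead log (both held in memory here)."""

    def __init__(self):
        self.log = []    # stand-in for a durable, append-only log
        self.data = {}   # stand-in for the main data pages

    def put(self, key, value):
        # Write-ahead rule: record the intent first, then apply the change.
        self.log.append((key, value))
        self.data[key] = value

    def recover(self):
        """Rebuild the data by replaying the log from the start."""
        rebuilt = {}
        for key, value in self.log:
            rebuilt[key] = value
        return rebuilt

store = ToyStore()
store.put("balance", 100)
store.put("balance", 75)
store.data.clear()            # simulate losing the volatile copy in a crash
store.data = store.recover()  # replay the log to get the state back
print(store.data)
```

If the main copy of the data survives a power loss on its own, the ordering inside `put` is one of the things that could, in principle, be rethought.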

No one knows how the dust will settle. Will people still use a database at all if they don’t need a permanent record? Or will the searching and indexing keep them coming back? All of the algorithms and all of the architectures are up for rethinking. We’ll know the best way to use NVRAM in a decade or so.

Scale-out SQL

When the NoSQL movement began, one of the big features was the ability to spread your data storage across multiple nodes. NoSQL databases like Cassandra and MongoDB made it seem like getting all of the nice features of large-scale storage meant abandoning the comfortable world of SQL.
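As a rough illustration of what spreading data across nodes means, the sketch below assigns each key to a node using a stable hash. The node names and the `node_for`, `put`, and `get` helpers are hypothetical, and the "nodes" are just dictionaries in one process. Real systems generally use more careful schemes, because a simple modulo reshuffles most keys whenever the number of nodes changes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def node_for(key, nodes=NODES):
    """Pick a node from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# Each "node" is just a dictionary in this sketch.
shards = {name: {} for name in NODES}

def put(key, value):
    shards[node_for(key)][key] = value

def get(key):
    return shards[node_for(key)].get(key)

put("user:1", "Ada")
put("user:2", "Grace")
print(get("user:1"), get("user:2"))
```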

In reality, there doesn’t need to be a tradeoff. While the earliest experiments in large-scale databases were easier to create because they left behind all of the SQL baggage, there’s no reason why SQL can’t work well across multiple machines running at huge scale. Indeed, companies like Oracle have been doing it for years.

The newest large-scale databases let you use all of your SQL knowledge and convenience with a set of data spread out across a big cluster. CockroachDB, for instance, offers a standard SQL query engine that accesses data replicated in multiple nodes, all with ACID guarantees. Yes, you’ll pay for some of this belt-and-suspender support for data consistency, but perhaps less than you expected.

If guaranteed consistency is important to your work, start by checking out stacks like CockroachDB, Google Cloud Spanner, Clustrix, Azure SQL, and NuoDB.

Geospatial Databases

Traditional databases are built for one-dimensional data sets, not the two-dimensional coordinates from geography. You can fake it and use a standard database to accomplish basic tasks with geographic coordinates. If you stick latitude and longitude in separate columns, it’s not hard to search for rows that fall within a box defined by a range of latitudes and longitudes. But once you want to go beyond this basic box, standard SQL queries just don’t cut it.
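The basic box search can be sketched with nothing more than SQLite from Python's standard library. The `places` table, its column names, the `in_box` helper, and the approximate city coordinates are all assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [
        ("Oslo", 59.91, 10.75),       # approximate coordinates
        ("Stockholm", 59.33, 18.07),
        ("Lima", -12.05, -77.04),
    ],
)

def in_box(min_lat, max_lat, min_lon, max_lon):
    """Return names of places inside a latitude/longitude rectangle."""
    rows = conn.execute(
        "SELECT name FROM places "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY name",
        (min_lat, max_lat, min_lon, max_lon),
    )
    return [name for (name,) in rows]

print(in_box(55.0, 65.0, 5.0, 20.0))
```

Anything that is not a rectangle aligned to the axes, such as "within 50 km of this point" or "inside this district boundary," is where this approach runs out of road.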

Geospatial databases add a few extra functions that make searching, sorting, and intersecting much easier in two-dimensional space. Spatial indices, for instance, usually work by adding a grid on top of the coordinate space to make it much faster to search for rows that are adjacent in two-dimensional and three-dimensional worlds.
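A grid index of this kind can be sketched in a few lines. The `GridIndex` class, the `cell_of` helper, and the one-degree cell size are hypothetical choices for illustration. This toy version ignores wraparound at the 180th meridian, and production systems also commonly use tree-based structures such as R-trees.

```python
from collections import defaultdict

CELL = 1.0  # hypothetical cell size in degrees

def cell_of(lat, lon):
    """Map a coordinate to the integer grid cell that contains it."""
    return (int(lat // CELL), int(lon // CELL))

class GridIndex:
    def __init__(self):
        self.cells = defaultdict(list)

    def add(self, name, lat, lon):
        self.cells[cell_of(lat, lon)].append((name, lat, lon))

    def nearby(self, lat, lon):
        """Return names stored in the query's cell and its eight neighbors."""
        row, col = cell_of(lat, lon)
        found = []
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                for name, _, _ in self.cells.get((row + d_row, col + d_col), []):
                    found.append(name)
        return sorted(found)

grid = GridIndex()
grid.add("Oslo", 59.91, 10.75)      # approximate coordinates
grid.add("Drammen", 59.74, 10.20)
grid.add("Lima", -12.05, -77.04)
print(grid.nearby(59.9, 10.7))
```

The search only touches nine cells no matter how many rows the table holds, which is where the speedup comes from.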

These indices make it possible to write queries with operations like “contain,” “overlap,” and even “touch” with sets that are defined by polygons. All of this makes reasoning about the real world that much more efficient.
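For a sense of what a "contain" operation has to compute, here is a sketch of the classic ray-casting test for whether a point lies inside a polygon. The `contains` function name is hypothetical. It assumes a simple polygon given as an ordered list of vertices and does not treat points exactly on the boundary specially.

```python
def contains(polygon, point):
    """Ray-casting test: count how many edges a ray going right from the point crosses.

    polygon is a list of (x, y) vertices in order; an odd number of
    crossings means the point is inside.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(contains(square, (2, 2)))
print(contains(square, (5, 2)))
```

A geospatial database pairs a test like this with a spatial index so that only polygons near the point need to be checked.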

Check out Neo4j Spatial, GeoMesa, MapD, and PostGIS for some good places to begin.

Graph Databases

Tables are a good repository for many data structures, but they don't do a great job of modeling one big, emerging data structure that has powered the last 10 years of Internet evolution: the network. As the so-called “social graph” explodes, we’re filling our computers with more and more nodes with links between them.

And the connections between the nodes are often more important than the data in them. Sure, storing and retrieving one link between one pair of nodes is easy to do in a classic relational database, but more complicated queries start to get impossible. Is Bob two or three hops away from Chris in the friendship network? Is Mary dating the ex of one of her friends?
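The hop-count question is a shortest-path search over the links. A sketch of it done by hand with a breadth-first search, using a made-up `friends` adjacency list and a hypothetical `hops` function:

```python
from collections import deque

# Hypothetical friendship network as an adjacency list.
friends = {
    "Bob": ["Alice", "Dana"],
    "Alice": ["Bob", "Chris"],
    "Dana": ["Bob"],
    "Chris": ["Alice"],
}

def hops(graph, start, goal):
    """Breadth-first search: links on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, []):
            if friend == goal:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None

print(hops(friends, "Bob", "Chris"))
```

In a relational table of friendships, each step outward in this loop becomes another self-join, which is why the SQL version gets unwieldy as the hop count grows.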

Graph databases make queries like this easier to run. There is no endless fetching from tables because the query knows how to look in the neighborhood specified by the links. Tools like Neo4j, OrientDB, and DataStax are just a few of the options that now can barely be counted on two hands and two feet. They have their own query languages too.

Cloud Databases

One of the biggest changes lies in how we buy database software. In the past, we bought our own machines and signed licensing deals to run the software on our machines. Now the cloud companies are offering services that store blobs of data off somewhere that we can’t see or touch. They just say the data will be there when we want it.

The advantages are obvious. There is no need to maintain the server or the room holding it. There is no need to worry about licensing or configuration or installing patches. Someone else deals with all of those headaches. The solution is often cheaper too — especially if you don’t have a ton of data to store. The services usually charge by the byte.

But the dangers, if there are any, are lying in the shadows. Does someone else have access to the data? Is the server protected from power surges, lightning storms, or floods? Is the data backed up to a trustworthy offsite location? You’ve got to trust the cloud provider on everything.

Major cloud service providers Google, Microsoft, and Amazon offer a long list of database services. These days Oracle, MongoDB, and DataStax also make their databases available in the cloud.

Artificial Intelligence (AI)

Some say that artificial intelligence is just a term for the latest generation of research that is just rolling out of the labs and into production. If so, there are a number of new products and solutions adorned with buzzwords like “machine learning” or “neural networks” or “deep learning.” They may not seem like a database, but you fill them with data and ask them questions. Why not?

The good news from artificial intelligence solutions is that you don’t need to know what you’re looking for. You can just wave your hand and ask for something nebulous like “most interesting” or “closest.” There is no need for the right key, the infernal reference number that the customer service folks are always asking you to write down.
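One common way to turn a request for "closest" into something computable is to represent each item as a vector of numbers and rank items by similarity to the query vector. The sketch below uses cosine similarity. The item names, the three-element vectors, and the `closest` helper are invented for illustration, and the code assumes no vector is all zeros.

```python
import math

# Hypothetical feature vectors, one per stored item.
items = {
    "post-a": [0.9, 0.1, 0.0],
    "post-b": [0.1, 0.8, 0.1],
    "post-c": [0.7, 0.3, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def closest(query, k=2):
    """Return the k item names most similar to the query vector."""
    ranked = sorted(items, key=lambda name: cosine(items[name], query), reverse=True)
    return ranked[:k]

print(closest([1.0, 0.0, 0.0]))
```

Notice that there is no key and no exact match anywhere in the lookup. The answer is whatever happens to rank highest.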

The bad news is that you won’t know if you’ve gotten the right answer because you didn’t specify the question with any precision. Is that blog post really the most interesting? The biggest secret for Google’s success is that there is no absolute right answer. If you’re in the ballpark, no one can complain.
 
The list of machine learning toolkits is almost too long to contemplate. You can always ask your favorite search engine for the “most interesting” AI.

Blockchain

The word blockchain may be tangled up with the complicated economics and politics of Bitcoin, but underneath all of the talk about currency is an extremely stable and practical distributed data store. Everyone has a chance to update the data in the long table and everyone gets to share in the answer. The big excitement is the fact that everyone shares in the same answers. It’s perfect for businesses that are frenemies.

Some developers take this a bit further and talk about “smart contracts,” which is another way of saying that the bits in the database are trustworthy enough for people to base legal issues like ownership upon them. You can’t do that with a regular database, which can be tweaked by anyone with administrative privileges.
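Part of what makes the stored bits hard to tweak quietly is that each block carries a hash of the one before it, so changing an old entry breaks every link after it. Here is a sketch of just that hash-chaining idea. It leaves out consensus, digital signatures, and distribution across machines, and the `block_hash`, `append`, and `is_valid` names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(chain, data):
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})

def is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True

ledger = []
append(ledger, "Alice pays Bob 5")
append(ledger, "Bob pays Carol 2")
print(is_valid(ledger))                    # chain is intact
ledger[0]["data"] = "Alice pays Bob 500"   # quietly edit an old entry
print(is_valid(ledger))                    # the edit is detected
```

In a real blockchain many independent parties hold copies and run a check like this, which is why one administrator cannot rewrite history alone.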

There are weak points, though. Each user must maintain a private key because all transactions must be digitally signed. If that key gets lost or forgotten, the data in those rows is frozen forever. If that key gets stolen, well, all bets are off. The blockchain isn’t perfect, in other words, but it’s much more reliable than the standard model.

R3, Ripple, and IBM are just three of the many competitors exploring the space. Many of the leading banks have their own internal projects. And then there are the Bitcoin and altcoin companies themselves, which are also big parts of the ecosystem.

Infoworld
