What do PLM vendors need to know about NoSQL databases?

December 14, 2012

Relational databases are a very mature set of technologies. We use RDBMS (relational database management systems) practically everywhere these days, and it is hard to imagine enterprise software and PDM/PLM systems without them. At the same time, a new class of database management solutions is emerging. It is called NoSQL (Not Only SQL). I have posted about NoSQL a few times before. The term was first used back in 1998, and its originator later suggested that "NoREL" would have been a more accurate name. In 2009, the term NoSQL was reintroduced "to label the emergence of a growing number of non-relational, distributed data stores that often did not attempt to provide atomicity, consistency, isolation and durability guarantees that are key attributes of classic relational database systems". NoSQL database solutions are widely used today in web and mobile applications, and I can see growing usage of NoSQL databases in business intelligence and master data management applications.

NoSQL is not a single database. It is a name for a broad set of data management and database technologies that sit outside the RDBMS world. The technologies and terminology behind this term are new, and PDM/PLM vendors ignored NoSQL database management solutions until very recently. This made me think it would be useful to provide a quick summary of what stands behind this broad term and which PDM/PLM use cases it can support.

Key-value (KV) databases

KV stores are the simplest database model in the NoSQL world. They store "keys" and associated "values"; basically, your database is a collection of key-value pairs. Some databases support more complex structures behind values, such as lists or hashes, but this is not required. One interesting PDM/PLM use case is to store a list of files in a key-value database. In such a case, the file name (including the full path) is the key and the value is the content of the file. Examples of KV stores are Riak and Redis.
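To make the file-storage use case concrete, here is a minimal in-memory sketch, assuming the full path is the key and the raw file bytes are the value. The class `FileStore`, its methods, and the sample paths are hypothetical and only illustrate the model; they are not the API of Riak or Redis.

```python
from typing import Dict, List, Optional


class FileStore:
    """Hypothetical in-memory key-value store: full file path -> file content."""

    def __init__(self) -> None:
        self._pairs: Dict[str, bytes] = {}

    def put(self, path: str, content: bytes) -> None:
        # The key is the full path; the value is an opaque blob the store never inspects.
        self._pairs[path] = content

    def get(self, path: str) -> Optional[bytes]:
        # Lookup is by exact key only; there is no query over the value.
        return self._pairs.get(path)

    def delete(self, path: str) -> None:
        self._pairs.pop(path, None)

    def keys_with_prefix(self, prefix: str) -> List[str]:
        # Listing a "folder" is just a scan over keys that share a path prefix.
        return sorted(k for k in self._pairs if k.startswith(prefix))


store = FileStore()
store.put("/projects/pump/housing.step", b"ISO-10303-21;")
store.put("/projects/pump/bolt.step", b"ISO-10303-21;")
print(store.keys_with_prefix("/projects/pump/"))
# ['/projects/pump/bolt.step', '/projects/pump/housing.step']
```

In Redis, the basic put and get map to the SET and GET commands. Note that a prefix listing has to scan keys, which is one reason this model fits lookups by exact path better than folder browsing.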

Column-oriented databases

This type of database is very close to an RDBMS. The main difference is that the columnar data model is designed to keep the data from each column of a table together, whereas an RDBMS keeps the data for a specific row together. This makes adding a column to a table very "inexpensive", and each row may have a different set of columns. This type of database is good for reporting and business intelligence solutions. The columnar data model has influenced the development of a few PDM/PLM core modelers available on the market today by providing a higher level of flexibility in data modeling. An example of a column-oriented database is HBase.
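The difference between keeping rows together and keeping columns together can be sketched with plain Python dictionaries. This is a toy illustration of the layout idea only, not how HBase stores data (HBase, for example, groups columns into column families); the part numbers and the `supplier` column are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict

# Row-oriented layout: all fields of one record are stored together.
rows = [
    {"part": "P-100", "material": "steel", "weight_kg": 2.5},
    {"part": "P-200", "material": "aluminum", "weight_kg": 1.2},
]

# Column-oriented layout: all values of one column are stored together,
# keyed by row id. A row has entries only in the columns it actually uses.
columns: Dict[str, Dict[str, Any]] = defaultdict(dict)
for row in rows:
    for name, value in row.items():
        if name != "part":
            columns[name][row["part"]] = value

# Adding a column touches only that column; existing data is not rewritten,
# and rows without a value (here P-100) simply have no entry.
columns["supplier"]["P-200"] = "Acme"

# A report over one column reads only that column's values.
total_weight = sum(columns["weight_kg"].values())
print(round(total_weight, 2))  # 3.7
```

The sparse `supplier` column shows why "each row may have a different set of columns" comes cheaply in this model, and the aggregate shows why it suits reporting.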

Document-oriented databases

Document databases manage data in the form of documents. Documents can differ from one another and have different structures, which makes document-oriented databases very flexible. Some implementations, such as MongoDB, let you run queries against the document structure as well as run MapReduce computations. Depending on your needs, you can consider different document databases; examples are MongoDB and CouchDB. You can consider a document database in PDM/PLM in two cases: when you need a high-performance, scalable document store, and when you need free-form data modeling.
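Here is a minimal in-memory sketch of both ideas, assuming items are stored as nested dictionaries with different shapes. The sample documents and the helpers `get_path`, `find`, and `count_by_type` are hypothetical; they mimic the spirit of a document query and a map/reduce step, not any database's actual engine.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Hypothetical item documents: each one has its own structure.
items: List[Dict[str, Any]] = [
    {"_id": 1, "type": "part", "number": "P-100",
     "attrs": {"material": "steel", "finish": "painted"}},
    {"_id": 2, "type": "part", "number": "P-200",
     "attrs": {"material": "aluminum"}},
    {"_id": 3, "type": "document", "number": "D-10",
     "file": {"name": "spec.pdf", "pages": 12}},
]


def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'attrs.material'; None if any step is missing."""
    current: Any = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(docs: Iterable[Dict[str, Any]], dotted: str, value: Any) -> List[Dict[str, Any]]:
    # Query against the document structure; documents lacking the field are skipped.
    return [d for d in docs if get_path(d, dotted) == value]


def count_by_type(docs: Iterable[Dict[str, Any]]) -> Counter:
    # Map step emits (type, 1) per document; reduce step sums the ones per key.
    counts: Counter = Counter()
    for key, one in ((d["type"], 1) for d in docs):
        counts[key] += one
    return counts


print([d["number"] for d in find(items, "attrs.material", "steel")])  # ['P-100']
print(dict(count_by_type(items)))
```

In MongoDB, a comparable query over an embedded field uses dot notation, for example a filter of `{"attrs.material": "steel"}` passed to `find`.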

Graph databases and triple stores

The graph data model deals with highly interconnected data. It consists of nodes and relationships between nodes, and both nodes and relationships can have properties (key-value pairs). This data model becomes really important when you traverse nodes along specific relationships. There are many situations in PDM/PLM applications where we need to traverse data efficiently, and graph databases (and their predecessors, object databases) have great potential to bring value here. An example of a graph database is Neo4j. A specific case of graph databases is the so-called triple store, which manages information as triples (subject-predicate-object). Examples of triple stores are OWLIM and AllegroGraph. Triple stores are also supported by Oracle and IBM DB2.
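A bill of materials is a natural PDM/PLM example of traversal. The sketch below assumes a tiny in-memory graph: nodes and typed relationships both carry properties, and a depth-first walk along `USES` relationships multiplies quantities to explode the BOM. The node ids, the `USES` type, and the helpers `relate` and `explode` are hypothetical; this is not Neo4j's API or query language. The last lines re-express the same relationships as subject-predicate-object triples to answer a where-used question.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Nodes carry properties (key-value pairs).
nodes: Dict[str, Dict[str, str]] = {
    "A-1": {"name": "Pump assembly"},
    "S-1": {"name": "Motor sub-assembly"},
    "P-1": {"name": "Bolt M6"},
    "P-2": {"name": "Housing"},
}

# Relationships are typed and carry their own properties (here: quantity).
edges: Dict[str, List[Tuple[str, str, Dict[str, int]]]] = defaultdict(list)


def relate(src: str, rel_type: str, dst: str, **props: int) -> None:
    edges[src].append((rel_type, dst, props))


relate("A-1", "USES", "S-1", qty=1)
relate("A-1", "USES", "P-2", qty=1)
relate("A-1", "USES", "P-1", qty=4)
relate("S-1", "USES", "P-1", qty=6)


def explode(root: str, rel_type: str = "USES") -> Dict[str, int]:
    """Depth-first walk along one relationship type, multiplying quantities.

    Assumes the structure is acyclic, as a BOM should be.
    """
    totals: Dict[str, int] = defaultdict(int)

    def visit(node: str, multiplier: int) -> None:
        for rtype, child, props in edges[node]:
            if rtype != rel_type:
                continue
            qty = multiplier * props.get("qty", 1)
            totals[child] += qty
            visit(child, qty)

    visit(root, 1)
    return dict(totals)


bom = explode("A-1")
print({nodes[pid]["name"]: qty for pid, qty in bom.items()})

# The same relationships as (subject, predicate, object) triples.
# Plain triples drop the qty property; keeping it would need extra triples.
triples = [(s, rtype, o) for s, out in edges.items() for rtype, o, _ in out]
where_used = [s for s, p, o in triples if p == "USES" and o == "P-1"]
print(where_used)  # ['A-1', 'S-1']
```

The bolt appears twice in the structure (4 directly, 6 through the motor), and the traversal totals it to 10; this kind of multi-level walk is what the paragraph means by traversing data efficiently.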

CAP theorem: why do PLM systems need to use more than one database?

In computer science, the CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: Consistency (all nodes see the same data at the same time), Availability (a guarantee that every request receives a response about whether it was successful or failed), and Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system). It is a question of priorities, and a tradeoff between the requirements you need to satisfy in your system. PLM systems face significant challenges in the variety of data types, data retrieval patterns, and data scaling, and using different database management strategies can improve existing solutions.
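The consistency/availability tradeoff during a partition can be shown with a toy model of two replicas. This is a deliberately simplified sketch, not how any real database implements replication; the class `TwoReplicaStore`, the "CP"/"AP" mode strings, and the revision key are hypothetical labels for the two choices a partitioned system can make.

```python
from typing import Dict


class TwoReplicaStore:
    """Toy model: two copies of the data that can lose contact (a partition)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a: Dict[str, str] = {}
        self.b: Dict[str, str] = {}
        self.partitioned = False

    def write(self, replica: Dict[str, str], key: str, value: str) -> bool:
        if not self.partitioned:
            # Healthy network: update both copies, so every read agrees.
            self.a[key] = value
            self.b[key] = value
            return True
        if self.mode == "CP":
            # Consistency over availability: reject the write so copies cannot diverge.
            return False
        # Availability over consistency: accept locally; the other copy is now stale.
        replica[key] = value
        return True


results = {}
for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write(store.a, "P-100/revision", "A")
    store.partitioned = True
    accepted = store.write(store.a, "P-100/revision", "B")
    results[mode] = (accepted, store.a["P-100/revision"], store.b["P-100/revision"])
    print(mode, *results[mode])
# CP False A A
# AP True B A
```

Neither outcome is wrong; which one is acceptable depends on the data. A released part revision may call for the CP behavior, while something like a collaboration feed may tolerate the AP behavior, which is one way to read the argument for using more than one database.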

What is my conclusion? PLM is a multidisciplinary approach. It handles a variety of data and is connected to many places in the organization: design, engineering, manufacturing, supply chain, support, and services. The specialty of a PLM environment is to connect to all data suppliers and interplay with different sources of data. From that standpoint, data behaves like oil: it is located in multiple places, but it needs to be extracted, and you need different tools to get it out. Think about different databases as a toolset to process and access data in the most efficient way. Just my thoughts…

Best, Oleg
