Experiences of Using Cassandra for Molecular Dynamics Simulations

In response to the requirements of applications that work with large amounts of data, various NoSQL databases have emerged to address these challenges specifically. These systems have become popular in environments such as data analytics and OLTP; however, these are not the only data-intensive applications that can benefit from them. In the life sciences domain, many applications still use flat files to store data. As a result, they are severely limited in scalability and performance and burdened by added code complexity.

We present an analysis of the viability of using these databases for applications whose data demands differ in some respects from those these systems were originally designed for. Working with these databases also shows that the design of the data model, the queries and other configuration parameters can have a considerable impact on performance. We therefore present examples of different data and system configurations and analyse their effects on performance. In the executions presented in this paper, we observe performance differences of up to almost a factor of 5 between different models, queries and configuration parameters.