
DbSchema | Cassandra - How to Create a Keyspace?


In this article, we will explore how to create a keyspace in Apache Cassandra using both cqlsh (the Cassandra Query Language Shell) and DbSchema. We will also define keyspaces and touch on the concept of replication.

Table of Contents

  1. Introduction to Apache Cassandra
  2. Prerequisites
  3. What is a Keyspace?
  4. Replication Strategies
  5. Creating a Keyspace
  6. Conclusion
  7. References

Introduction to Apache Cassandra

Apache Cassandra is a highly scalable, distributed, and fault-tolerant NoSQL database designed to handle large amounts of data across many commodity servers. It was developed at Facebook and later released as an open-source project. It is especially suitable for applications that require high write and read throughput.

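Cassandra uses a flexible wide-column data model. Data is stored in tables (historically called column families), and each row is identified by a primary key whose partition key determines which nodes store the row. Because unset columns take no space, this structure accommodates both structured and semi-structured data.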

Prerequisites

Before proceeding, make sure you have the following prerequisites:

  1. Apache Cassandra installed and running. You can find the installation guide in the Apache Cassandra documentation.
  2. Basic understanding of Cassandra Query Language (CQL), keyspaces, and column families.
  3. Familiarity with command-line tools and SQL-like query languages.

Figure: Architecture of a keyspace

What is a Keyspace?

In Cassandra, a keyspace is a top-level namespace that groups related tables together. It is similar to a database in the SQL world, with one important difference: a keyspace also defines the replication strategy that controls how its data is copied and distributed across the cluster. Keyspaces provide logical separation of data within a Cassandra cluster.


Advantages of Using a Keyspace

Using a keyspace in Cassandra offers several advantages:

  • Logical Organization: Keyspaces provide a way to logically organize related tables, which makes data easier to structure and manage.
  • Replication Strategy: A keyspace allows specifying the replication strategy to ensure data availability and fault tolerance.
  • Flexibility: Each keyspace can have its own replication settings, so different data sets in the same cluster can be replicated differently.
  • Scalability: Data in a keyspace is partitioned across the nodes of the cluster, so storage and throughput can grow roughly linearly as nodes are added.

Limitations of Using a Keyspace

While keyspaces offer various benefits, they also have some limitations:

  • No Joins: CQL does not support joins at all, whether across keyspaces or between tables in the same keyspace. Data that must be read together is typically denormalized into a single table designed around the query.
  • Keyspace Level Operations: Some operations, such as dropping a keyspace or altering its replication settings, require careful planning because they can significantly affect data availability and performance. Dropping a keyspace deletes all of its tables and data, and after changing replication settings you typically need to run a repair so that existing data reaches its new replicas.

Replication Strategies

In Cassandra, replication strategies define how data is replicated across the cluster. Two commonly used replication strategies are:

  • SimpleStrategy
  • NetworkTopologyStrategy

SimpleStrategy

SimpleStrategy is the basic replication strategy in Cassandra, suitable for a single data center deployment. It places the first replica on the node that owns the partition's token and each additional replica on the next node clockwise around the token ring, without considering racks or data centers. With SimpleStrategy, you only need to specify the replication factor, which determines the number of replicas (copies) of each piece of data.

WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};
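To make SimpleStrategy's placement concrete, here is a minimal Python sketch of how replicas are chosen: start at the node that owns the partition's token and walk the ring clockwise until the replication factor is reached. It is a simplified model under stated assumptions: the ring is a hypothetical list of (token, node) pairs with small integer tokens instead of real Murmur3 tokens, and simple_strategy_replicas is an illustrative helper, not a Cassandra API.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, token, replication_factor):
    """Sketch of SimpleStrategy-style replica placement (hypothetical helper).

    ring: list of (token, node) pairs; a node may own several tokens.
    token: the partition's token (small integers here, for readability).
    Returns the replica nodes in placement order.
    """
    ordered = sorted(ring)
    tokens = [t for t, _ in ordered]
    # The first replica is the node owning the first token >= the
    # partition token; past the last token, wrap around to the start.
    start = bisect_left(tokens, token) % len(ordered)
    replicas = []
    # Walk clockwise, skipping nodes that already hold a replica,
    # until replication_factor distinct nodes have been chosen.
    for i in range(len(ordered)):
        if len(replicas) == replication_factor:
            break
        node = ordered[(start + i) % len(ordered)][1]
        if node not in replicas:
            replicas.append(node)
    return replicas


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(simple_strategy_replicas(ring, 30, 3))  # ['C', 'D', 'A']
print(simple_strategy_replicas(ring, 80, 3))  # ['A', 'B', 'C']
```

Note that the walk never looks at racks or data centers, which is exactly why SimpleStrategy is a poor fit once the cluster spans more than one data center.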

Advantages of SimpleStrategy

The advantages of SimpleStrategy are:

  • Simple to configure and use
  • Evenly distributes data across the cluster

Limitation of SimpleStrategy

The limitation of SimpleStrategy is:

  • Not suitable for multi-data center deployments, because it places replicas without regard to data centers or racks

NetworkTopologyStrategy

NetworkTopologyStrategy is a more advanced replication strategy suitable for multi-data center deployments. It allows you to define a replication factor for each data center and, within each data center, tries to place replicas on different racks. This distributes replicas across multiple data centers, providing fault tolerance and better data availability.

WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};

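In this example, 3 replicas are created in data center ‘DC1’ and 2 replicas in ‘DC2’, for 5 replicas in total. The data center names must match the names reported by the cluster's snitch.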
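NetworkTopologyStrategy walks the token ring clockwise separately for each data center, choosing that data center's nodes until its replication factor is met. The minimal Python sketch below models this for the ‘DC1’: 3, ‘DC2’: 2 example, with an important simplification: the real strategy also tries to spread replicas across racks within a data center, which this rack-unaware version ignores. The ring contents and network_topology_replicas are hypothetical, for illustration only.

```python
from bisect import bisect_left


def network_topology_replicas(ring, token, dc_factors):
    """Rack-unaware sketch of NetworkTopologyStrategy placement (hypothetical helper).

    ring: list of (token, node, datacenter) tuples.
    dc_factors: replication factor per data center, e.g. {"DC1": 3, "DC2": 2}.
    Returns a dict mapping each listed data center to its replica nodes.
    """
    ordered = sorted(ring)
    tokens = [t for t, _, _ in ordered]
    start = bisect_left(tokens, token) % len(ordered)
    placement = {dc: [] for dc in dc_factors}
    # Walk the ring clockwise once; each data center keeps collecting
    # its own nodes until it reaches its replication factor.
    # Data centers missing from dc_factors receive no replicas.
    for i in range(len(ordered)):
        _, node, dc = ordered[(start + i) % len(ordered)]
        chosen = placement.get(dc)
        if chosen is not None and len(chosen) < dc_factors[dc] and node not in chosen:
            chosen.append(node)
    return placement


ring = [
    (0, "a1", "DC1"), (10, "b1", "DC2"), (20, "a2", "DC1"), (30, "b2", "DC2"),
    (40, "a3", "DC1"), (50, "b3", "DC2"), (60, "a4", "DC1"),
]
placement = network_topology_replicas(ring, 15, {"DC1": 3, "DC2": 2})
print(placement)  # {'DC1': ['a2', 'a3', 'a4'], 'DC2': ['b2', 'b3']}
print(sum(len(nodes) for nodes in placement.values()))  # 5
```

Because each data center is filled independently, losing an entire data center still leaves complete copies of the data in the other one.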

Advantages of NetworkTopologyStrategy

The advantages of NetworkTopologyStrategy are:

  • Supports multi-data center deployments
  • Allows fine-grained control over replica placement by specifying a replication factor per data center
  • Provides fault tolerance and better data availability in multi-data center deployments

Limitations of NetworkTopologyStrategy

The limitations of NetworkTopologyStrategy are:

  • Requires careful planning and configuration of data centers
  • Complexity increases with the number of data centers

Creating a Keyspace

A keyspace in Cassandra is similar to a database in a traditional RDBMS. It is a container for tables (formerly called column families) and defines the replication strategy and options for its data.

Using cqlsh

To create a keyspace using cqlsh, follow these steps:

  1. Open a command prompt or terminal and start cqlsh by running the following command:
cqlsh
  2. Create a new keyspace with a suitable name, replication strategy, and replication factor. Here’s an example that creates a keyspace called my_keyspace using SimpleStrategy with a replication factor of 3:
CREATE KEYSPACE my_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};

Replace my_keyspace with the desired name, and adjust the replication factor as needed. To confirm that the keyspace was created, run DESCRIBE KEYSPACE my_keyspace; in cqlsh.
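If you create keyspaces from scripts or application code rather than typing them into cqlsh, it can help to assemble the statement from its parts. The Python sketch below builds the same CREATE KEYSPACE statement for either replication strategy. create_keyspace_cql is a hypothetical helper written for illustration, not part of any Cassandra driver, and its name check is a conservative assumption based on CQL's rule for unquoted identifiers.

```python
import re


def create_keyspace_cql(name, strategy, *, replication_factor=None,
                        dc_factors=None, if_not_exists=False):
    """Build a CREATE KEYSPACE statement (hypothetical helper, not a driver API)."""
    # Conservative check: plain unquoted identifiers only (letters, digits,
    # underscores, at most 48 characters), which also avoids CQL injection.
    if not re.fullmatch(r"[A-Za-z0-9_]{1,48}", name):
        raise ValueError(f"invalid keyspace name: {name!r}")

    if strategy == "SimpleStrategy":
        if replication_factor is None:
            raise ValueError("SimpleStrategy needs replication_factor")
        options = {"class": strategy, "replication_factor": replication_factor}
    elif strategy == "NetworkTopologyStrategy":
        if not dc_factors:
            raise ValueError("NetworkTopologyStrategy needs per-data-center factors")
        options = {"class": strategy, **dc_factors}
    else:
        raise ValueError(f"unsupported strategy: {strategy}")

    # Render a CQL map literal: quoted string keys, bare integer counts.
    def literal(value):
        return str(value) if isinstance(value, int) else f"'{value}'"

    body = ", ".join(f"'{key}': {literal(value)}" for key, value in options.items())
    guard = "IF NOT EXISTS " if if_not_exists else ""
    return f"CREATE KEYSPACE {guard}{name} WITH REPLICATION = {{{body}}};"


print(create_keyspace_cql("my_keyspace", "SimpleStrategy", replication_factor=3))
print(create_keyspace_cql("my_keyspace", "NetworkTopologyStrategy",
                          dc_factors={"DC1": 3, "DC2": 2}, if_not_exists=True))
```

The resulting string can be pasted into cqlsh or sent through a client driver, for example session.execute(...) in the DataStax Python driver. The IF NOT EXISTS clause makes the statement a no-op when the keyspace already exists, which is convenient for scripts that may run more than once.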

Using DbSchema

DbSchema is a visual database designer that supports multiple databases, including Apache Cassandra. To create a keyspace using DbSchema, follow these steps:

  1. Download and install DbSchema from the official website.
  2. Launch DbSchema and click “Connect” to open the “Connect to a Database” dialog.
  3. Select “Cassandra” as the database type and enter the required connection details, such as hostname, port, and credentials.
  4. Once connected, right-click on the “Keyspaces” node in the “Schema Tree” panel and choose “Create Keyspace.”
  5. Provide the keyspace name, replication strategy, and replication factor, and click “Create.”

Create Keyspace and Visually Manage Cassandra using DbSchema

DbSchema is a Cassandra client and visual designer. DbSchema has a free Community Edition, which can be downloaded from the DbSchema website.

Create Keyspace

Start the application and connect to the Cassandra database. Navigate to the Schema Tree panel and create a new keyspace.

Conclusion

Understanding how to create a keyspace in Apache Cassandra is crucial for managing and organizing data. The keyspace not only shapes the data model but also determines how data is distributed and replicated across nodes and data centers in the cluster. Both methods, cqlsh and DbSchema, have their own advantages, so choose whichever suits your workflow and requirements. Choosing an appropriate replication strategy and replication factor further improves the robustness and reliability of the database system.

References

  1. Apache Cassandra Documentation
  2. DbSchema Cassandra Designer
  3. Apache Cassandra - Creating a Keyspace
  4. Cassandra Replication Strategies

Visually Manage Databases using DbSchema

DbSchema is a database client and visual designer. DbSchema has a free Community Edition, which can be downloaded from the DbSchema website.
DbSchema's main features include:

Interactive Diagrams

Design tables, columns, and foreign keys directly in diagrams by double-clicking them. Changes are saved to the design model and, if DbSchema is connected to the database, also applied to the database.


Simple Connection Dialog

Choose the database location, user, and password, and simply get connected. Choose 'Edit Manually' in the JDBC URL combo box to enter a custom URL.


Relational Data Explorer

Explore data from multiple tables simultaneously, using foreign keys or virtual foreign keys. Double-click cells to edit the data.


Query Builder

Create SQL queries featuring JOINs, GROUP BY, and ORDER BY using just the mouse.


SQL Query Editor

Edit and execute SQL queries in an editor with autocompletion.


Design Schema in Team & Schema Deployment

DbSchema uses a design model, a copy of the schema structure that is independent of the database.
The design model can be saved to a file and shared within a team.
When connecting to another database, you can compare the design model with the database, commit the differences, or merge them into the design model.


Dark Theme

Configurable styles and a dark theme.


Many features are available in the free Community Edition.
The Pro Edition adds the ability to save the design to a model file, design the schema in a team, and deploy the schema on multiple databases.


DbSchema can be downloaded for free. No registration is required.