AWS Neptune Demystified: Your Guide to Graph Databases and Gremlin Queries

Jun 25, 2024

Knowledge of graph databases is increasingly valuable in a data-driven world, because they can change how businesses store and analyze connected data. This article covers the basics of graph databases, the scenarios where they perform best, Amazon Neptune, and Gremlin, a query language for traversing graphs.

What is a Graph Database?

A graph database is a type of NoSQL database designed for data with complex relationships and connections. It consists primarily of nodes, edges, and properties, which together represent the stored data.

Common Areas Where Graph Databases Can Be Used

Graph databases are particularly useful in scenarios such as:

Social Networks: Modelling relationships between users and their connections.

Fraud Detection: Identifying suspicious patterns and connections between entities.

Network and IT Operations: Visualizing dependencies and optimizing network paths.

Knowledge Graphs: Organizing and querying interconnected information.

Figure: Knowledge graph of a person and the cities they visited

What is AWS Neptune?

Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. The core of Neptune is a purpose-built, high-performance graph database engine.

Understanding Cluster and Instance in AWS Neptune DB

What is a Cluster in AWS Neptune?

A cluster in AWS Neptune is a collection of one or more database instances that operate together to manage a graph database. The primary components of a Neptune cluster are a primary instance and, optionally, read replicas.

Key Features of Neptune Clusters:

High Availability: Neptune clusters are designed to be highly available, with automatic failover to a read replica if the primary instance fails.
Replication: Data is stored in a shared cluster volume that is automatically replicated across multiple Availability Zones to ensure durability and fault tolerance.
Scalability: You can add or remove read replicas based on your workload requirements, making it easy to scale your database read capacity.

What is an Instance in AWS Neptune?

An instance in AWS Neptune is a single, standalone database environment that provides the computational resources (CPU, memory, and network bandwidth) necessary to run your graph database. Instances are the building blocks of a Neptune cluster.

Types of Instances:

Primary Instance: Handles all write operations and data modifications. There is only one primary instance per Neptune cluster.
Read Replicas: Handle read operations and are used to distribute the read workload across multiple instances. You can have multiple read replicas in a Neptune cluster.

Relationship Between Clusters and Instances

Cluster: The overall structure that groups instances together to manage and operate a Neptune database. A cluster includes one primary instance and zero or more read replicas (up to 15).
Instance: Individual components within a cluster that provide the necessary resources for database operations. Each instance can either be a primary instance or a read replica.
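The division of work between the primary instance and the read replicas can be pictured with a toy model. The sketch below is plain Python, not an AWS API: the `ToyCluster` class, the instance names, and the choice to promote the first replica on failover are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model of a cluster: one primary plus zero or more read replicas."""
    primary: str
    replicas: list = field(default_factory=list)

    def route(self, operation: str) -> str:
        """Send writes to the primary; spread reads over replicas if any exist."""
        if operation == "write" or not self.replicas:
            return self.primary
        # Rotate the replica list so successive reads hit different replicas.
        self.replicas.append(self.replicas.pop(0))
        return self.replicas[-1]

    def failover(self) -> None:
        """Promote the first replica to primary (hypothetical promotion rule)."""
        if not self.replicas:
            raise RuntimeError("no read replica available to promote")
        self.primary = self.replicas.pop(0)


# Instance names are made up for this example.
cluster = ToyCluster(primary="instance-1", replicas=["instance-2", "instance-3"])
print(cluster.route("write"))  # instance-1
print(cluster.route("read"))   # instance-2
cluster.failover()
print(cluster.primary)         # instance-3
```

In a real cluster this routing is handled by pointing writes at the cluster endpoint and reads at the reader endpoint, so application code normally does not choose instances itself.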

Key Features of AWS Neptune

High Availability and Durability: Replication across multiple Availability Zones ensures data availability and reliability.
Scalability: Storage grows automatically with your data, and read capacity scales by adding read replicas.
Security: Offers VPC-based network isolation, encryption both in transit and at rest, and access control via AWS IAM integration.
Fully Managed: AWS handles maintenance tasks such as backups, software patching, and hardware provisioning.

Points to Consider While Creating an AWS Neptune DB

VPC Configuration: For improved security, make sure your Neptune cluster instances are housed inside a Virtual Private Cloud (VPC). To provide high availability and durability, configure a minimum of two subnets in separate Availability Zones (AZs).
DB Instance Types: Select instance types according to workload demands. Memory-optimized instances handle more intensive tasks, while general-purpose instances are appropriate for the majority of applications.
Security Groups: Set up security groups to manage incoming and outgoing traffic so that your Neptune instances are reachable only from permitted sources.

DNS and Security: Enable DNS resolution and hostnames in your VPC, and ensure that your VPC has a DB subnet group containing the necessary subnets.

Querying a Neptune Graph

Amazon Neptune provides robust support for multiple graph query languages, each tailored to different graph data modeling and querying needs. Here’s an overview of the query languages supported by Amazon Neptune:

Gremlin
openCypher
SPARQL

This article focuses on Gremlin and its query patterns for a few use cases.

Understanding Gremlin Traversal Terminologies

Gremlin, the graph traversal language defined by Apache TinkerPop, offers a powerful and flexible way to query and manipulate graph data. Here are some key terminologies used in Gremlin traversal that are essential to understand for effective graph querying:

1. Vertex

Vertices represent the entities or nodes in a graph. Each vertex can have properties associated with it.

2. Property

Properties are key-value pairs associated with vertices or edges. They store additional information about the graph elements. For example, in a social network graph, a vertex might represent a person with properties such as name and age.

3. Edge

Edges represent the relationships or connections between vertices. Each edge can also have properties. For example, an edge can represent a knows relationship between two people in a social network graph.

4. Label

Labels categorize vertices and edges. For vertices, a label represents the type of entity, such as person. For edges, a label describes the type of relationship, such as knows.
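These four terms map naturally onto a small data structure. The following is a minimal plain-Python sketch, not the TinkerPop or Neptune object model: the `Vertex` and `Edge` classes and the property values (ages, the `since` year) are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str                                       # category of entity, e.g. "person"
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Edge:
    label: str                                       # type of relationship, e.g. "knows"
    out_v: Vertex                                    # vertex the edge starts from
    in_v: Vertex                                     # vertex the edge points to
    properties: dict = field(default_factory=dict)   # edges can carry properties too


# Illustrative data only.
alice = Vertex(1, "person", {"name": "Alice", "age": 30})
bob = Vertex(2, "person", {"name": "Bob", "age": 32})
knows = Edge("knows", alice, bob, {"since": 2020})

print(f"{knows.out_v.properties['name']} -{knows.label}-> {knows.in_v.properties['name']}")
```

Note that an edge has a direction: it leaves one vertex and enters another, which is why traversals distinguish outgoing from incoming edges.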

Breaking Down The Query

g: A reference to the traversal source. Every Gremlin query begins with it, and it is used to invoke the traversal steps that walk the graph’s vertices, edges, and properties.
V(): Vertex step. Starts the traversal with all vertices.
.has(label, key, value): Filter step. Restricts the traversal to elements that carry the specified label and whose property key holds the specified value, for example person vertices whose name is Alice.
.as('source'): Assigns a label to the current step so that later steps in the traversal can refer back to it, which makes complex queries easier to construct.
.has('name', within('Bob', 'Eve', 'Dana')): Filter step. Keeps only the vertices or edges whose name property matches one of the given values.
.addE(): Adds an edge between vertices. Here, a knows edge is created from Alice to each of Bob, Eve, and Dana.
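Judging from the steps listed, the traversal being described is probably close to `g.V().has('person', 'name', 'Alice').as('source').V().has('name', within('Bob', 'Eve', 'Dana')).addE('knows').from('source')`; that reconstruction is an assumption. The plain-Python sketch below mimics what such a traversal does on a tiny in-memory graph. It is not Gremlin and does not talk to Neptune, and the vertex ids and the extra person Carl are made up.

```python
# Tiny in-memory graph: vertex id -> (label, properties). All data is illustrative.
vertices = {
    1: ("person", {"name": "Alice"}),
    2: ("person", {"name": "Bob"}),
    3: ("person", {"name": "Eve"}),
    4: ("person", {"name": "Dana"}),
    5: ("person", {"name": "Carl"}),
}
edges = []  # each edge is (label, from_id, to_id)

# V().has('person', 'name', 'Alice').as('source'):
# filter by label and property, then remember the match as "source".
source = next(
    vid for vid, (label, props) in vertices.items()
    if label == "person" and props["name"] == "Alice"
)

# V().has('name', within(...)): keep vertices whose name is in the given set.
targets = [
    vid for vid, (_, props) in vertices.items()
    if props["name"] in {"Bob", "Eve", "Dana"}
]

# addE('knows') from the remembered source to every matching target.
for target in targets:
    edges.append(("knows", source, target))

print(edges)  # [('knows', 1, 2), ('knows', 1, 3), ('knows', 1, 4)]
```

Carl is left without an edge because his name is not in the set passed to the filter.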

Use Case: Finding Friends of Alice

In this scenario, we want to find all friends of a user named Alice in a social network. Assume we have vertices labeled person and edges labeled knows representing friendships.
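In Gremlin, a traversal along the lines of `g.V().has('person', 'name', 'Alice').out('knows').values('name')` would express this; treat that as an assumed form rather than the exact query. To show what such a traversal does, here is a plain-Python sketch over a tiny in-memory graph. The helper `friends_of` and the vertex ids are hypothetical.

```python
# Illustrative graph: vertex id -> (label, properties), plus directed edges.
vertices = {
    1: ("person", {"name": "Alice"}),
    2: ("person", {"name": "Bob"}),
    3: ("person", {"name": "Eve"}),
    4: ("person", {"name": "Dana"}),
}
edges = [("knows", 1, 2), ("knows", 1, 3), ("knows", 1, 4)]  # (label, from, to)


def friends_of(name):
    """Names reached by following outgoing knows edges from the named person."""
    # Equivalent of has('person', 'name', name): find the starting vertices.
    start_ids = {
        vid for vid, (label, props) in vertices.items()
        if label == "person" and props["name"] == name
    }
    # Equivalent of out('knows').values('name'): follow edges, read the name.
    return [
        vertices[to_id][1]["name"]
        for label, from_id, to_id in edges
        if label == "knows" and from_id in start_ids
    ]


for friend in friends_of("Alice"):
    print(friend)
```

The order of results is not guaranteed in a graph traversal unless an explicit ordering step is added, so the names may come back in a different order than they were inserted.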

// output
Dana
Eve
Bob

Use Case: Counting Friends of Alice

In this scenario, we want to count the number of friends a user named Alice has.
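Counting is typically the same traversal with a final counting step, something like `g.V().has('person', 'name', 'Alice').out('knows').count()`; again, this is an assumed form. The plain-Python sketch below does the equivalent on a made-up in-memory edge list, where the ids and names are illustrative.

```python
# Illustrative data: person name by vertex id, and directed knows edges.
names = {1: "Alice", 2: "Bob", 3: "Eve", 4: "Dana"}
knows_edges = [(1, 2), (1, 3), (1, 4)]  # (from_id, to_id)


def count_friends(name):
    """Count outgoing knows edges that start at the named person."""
    start_ids = {vid for vid, vertex_name in names.items() if vertex_name == name}
    return sum(1 for from_id, _ in knows_edges if from_id in start_ids)


print(count_friends("Alice"))  # 3
```

Because only outgoing edges are counted, Bob has a count of 0 here even though Alice knows him.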

// output
3

Gremlin offers many other traversal techniques to suit different use cases.

About The Author:

Sankarra Narayanan G works at CodeStax.Ai, has experience in no-code/low-code software platforms, and is currently delving into graph databases and their implementation.

About CodeStax.Ai

At CodeStax.Ai, we stand at the nexus of innovation and enterprise solutions, offering technology partnerships that empower businesses to drive efficiency, innovation, and growth, harnessing the transformative power of no-code platforms and advanced AI integrations.