Data Modeling:
The entities and relationships that we’ve surfaced in analyzing the user story quickly translate into a simple data model, as shown in Figure 4-1.
Figure 4-1. Data model for the book reviews user story
Because this data model directly encodes the question presented by the user story, it lends itself to being queried in a way that similarly reflects the structure of the question we want to ask of the data: since Alice likes Dune, find books that others who like Dune have enjoyed:
MATCH (:Reader {name:'Alice'})-[:LIKES]->(:Book {title:'Dune'})
<-[:LIKES]-(:Reader)-[:LIKES]->(books:Book)
RETURN books.title
Nodes for Things, Relationships for Structure:
Though not applicable in every situation, these general guidelines will help us choose
when to use nodes, and when to use relationships:
- Use nodes to represent entities—that is, the things in our domain that are of interest to us, and which can be labeled and grouped.
- Use relationships both to express the connections between entities and to establish semantic context for each entity, thereby structuring the domain.
- Use relationship direction to further clarify relationship semantics. Many relationships are asymmetrical, which is why relationships in a property graph are always directed. For bidirectional relationships, we should make our queries ignore direction, rather than using two relationships.
- Use node properties to represent entity attributes, plus any necessary entity metadata, such as timestamps, version numbers, etc.
- Use relationship properties to express the strength, weight, or quality of a relationship, plus any necessary relationship metadata, such as timestamps, version numbers, etc.
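The guidelines above can be sketched in a few lines of Cypher. This is only an illustration under stated assumptions: the FRIEND_OF relationship, the second reader Bob, and the createdAt, rating, and since properties are hypothetical names that do not appear in the user story.

```cypher
// Nodes represent entities; labels group them; properties hold attributes and metadata.
CREATE (alice:Reader {name: 'Alice', createdAt: 1420070400})
CREATE (bob:Reader {name: 'Bob', createdAt: 1420156800})
CREATE (dune:Book {title: 'Dune'})
// Relationship properties express quality (rating) and metadata (since).
CREATE (alice)-[:LIKES {rating: 5, since: 2015}]->(dune)
// A symmetric relationship (hypothetical FRIEND_OF) is stored once, in one direction.
CREATE (alice)-[:FRIEND_OF]->(bob);

// Leaving off the arrowhead makes the match ignore direction,
// so the single relationship can be traversed from either end.
MATCH (a:Reader {name: 'Bob'})-[:FRIEND_OF]-(b:Reader)
RETURN b.name;
```

Against this sample data, the second statement finds Alice even though the relationship was created pointing from Alice to Bob, which is why a second, mirrored relationship is unnecessary.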
Fine-Grained versus Generic Relationships:
It’s the difference between using DELIVERY_ADDRESS and HOME_ADDRESS versus ADDRESS {type:'delivery'} and ADDRESS {type:'home'}.
Addresses are a good example. Following the closed-set principle, we might choose to create HOME_ADDRESS, WORK_ADDRESS, and DELIVERY_ADDRESS relationships. This allows us to follow specific kinds of address relationships (DELIVERY_ADDRESS, for example) while ignoring all the rest. But what do we do if we want to find all addresses for a user? There are a couple of options here. First, we can encode knowledge of all the different relationship types in our queries: e.g., MATCH (user)-[:HOME_ADDRESS|WORK_ADDRESS|DELIVERY_ADDRESS]->(address). This, however, quickly becomes unwieldy when there are lots of different kinds of relationships. Alternatively, we can add a more generic ADDRESS relationship to our model, in addition to the fine-grained relationships. Every node representing an address is then connected to a user using two relationships: a fine-grained relationship (e.g., DELIVERY_ADDRESS) and the more generic ADDRESS {type:'delivery'} relationship.
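As a minimal sketch of the second option, the following Cypher connects one address to a user with both relationships and then queries through the generic one. The sample data (a user with id 1 and a street property on the address) is assumed purely for illustration.

```cypher
// Hypothetical sample data: one user with a single delivery address.
CREATE (user:User {id: 1})
CREATE (address:Address {street: '1 Example Street'})
// Fine-grained relationship, for queries that want only delivery addresses.
CREATE (user)-[:DELIVERY_ADDRESS]->(address)
// Generic relationship with a qualifying property, for queries that want every address.
CREATE (user)-[:ADDRESS {type: 'delivery'}]->(address);

// Find all addresses for the user without enumerating relationship types.
MATCH (user:User {id: 1})-[r:ADDRESS]->(address:Address)
RETURN address.street, r.type;
```

The trade-off is that every address is written twice, so the two relationships must be kept in step whenever an address is added or removed.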
Iterative and Incremental Development:
Graph databases provide for the smooth evolution of our data model. Migrations and denormalization are rarely an issue. New facts and new compositions become new nodes and relationships, while optimizing for performance-critical access patterns typically involves introducing a direct relationship between two nodes that would otherwise be connected only by way of intermediaries.
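To make the idea of a direct relationship concrete, here is a hedged sketch. It assumes a hypothetical retail model in which a user reaches a product only by way of an order, (:User)-[:PLACED]->(:Order)-[:CONTAINS]->(:Product); none of these labels or relationship names come from the text, and BOUGHT is likewise an invented shortcut.

```cypher
// Assumed existing path: (:User)-[:PLACED]->(:Order)-[:CONTAINS]->(:Product)
MATCH (user:User)-[:PLACED]->(:Order)-[:CONTAINS]->(product:Product)
// MERGE adds the hypothetical BOUGHT shortcut only where it does not already exist.
MERGE (user)-[:BOUGHT]->(product);

// Performance-critical queries can now skip the intermediate Order nodes.
MATCH (user:User {id: 1})-[:BOUGHT]->(product:Product)
RETURN product;
```

The original path through Order remains in place, so queries that need order details are unaffected.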
We will quickly see how different relationships can sit side-by-side with one another, catering to different needs without distorting the model in favor of any one particular need. Addresses help illustrate the point here. Imagine, for example, that we are developing a retail application. While developing a fulfillment story, we add the ability to dispatch a parcel to a customer’s delivery address, which we find using the following query:
MATCH (user:User {id:{userId}})
MATCH (user)-[:DELIVERY_ADDRESS]->(address:Address)
RETURN address
Later on, when adding some billing functionality, we introduce a BILLING_ADDRESS relationship. Later still, we add the ability for customers to manage all their addresses. This last feature requires us to find all addresses—whether delivery, billing, or some other address. To facilitate this, we introduce a general ADDRESS relationship:
MATCH (user:User {id:{userId}})
MATCH (user)-[:ADDRESS]->(address:Address)
RETURN address
By this time, our data model looks something like the one shown in Figure 4-8. DELIVERY_ADDRESS specializes the data on behalf of the application’s fulfillment needs; BILLING_ADDRESS specializes the data on behalf of the application’s billing needs; and ADDRESS specializes the data on behalf of the application’s customer management needs.
Just because we can add new relationships to meet new application goals doesn’t mean we always have to do so. We’ll invariably identify opportunities for refactoring the model as we go. There’ll be plenty of times, for example, where an existing relationship will suffice for a new query, or where renaming an existing relationship will allow it to be used for two different needs. When these opportunities arise, we should take them.
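Cypher has no clause that changes a relationship's type in place, so one way to rename a relationship is to create a replacement alongside it and delete the original. The sketch below assumes we have decided to rename DELIVERY_ADDRESS to a hypothetical SHIPPING_ADDRESS; the new name is an illustration, not something the text prescribes.

```cypher
// Find every existing relationship of the old type.
MATCH (user:User)-[old:DELIVERY_ADDRESS]->(address:Address)
// Create the replacement between the same two nodes, in the same direction.
CREATE (user)-[new:SHIPPING_ADDRESS]->(address)
// Copy across any properties the old relationship carried.
SET new = properties(old)
// Remove the old relationship once its replacement exists.
DELETE old;
```

On a large graph, a refactoring like this would normally be run in batches, and any queries that name the old relationship type need updating at the same time.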


