Concepts

Concepts that will help Flow Manager customers get the most out of their Attack Flow databases.

1: Knowledge Graphs

2: ATTACK FLOW

3: Flow Builder V1

4: SPARQL
5: SPARQL CREATE
6: SPARQL UPDATE

To use and get the most out of Flow Manager, it’s helpful to understand a few key technologies.

Knowledge Graphs - Provides a brief introduction to graphs and the specific type used by Flow Builder.
Attack Flows - Provides basic definitions of attack flows and their conceptual use.
Flow Builder - Provides a detailed walkthrough for creating flows in the Flow Builder V1 web app produced by the MITRE Engenuity Center for Threat-Informed Defense (CTID).
SPARQL - SPARQL is a query language for graphs. Flow Builder uses SPARQL as the primary way to retrieve flow data from the graph database. This tutorial provides several useful query examples covering key concepts.
SPARQL Create - SPARQL can also be used to create flows. This quick example will show you how.

1 - Knowledge Graphs

Get some basic information about what knowledge graphs are and how to use them.

Attack Flow and Flow Manager are designed to work with knowledge graphs. Which begs the question, “What’s a knowledge graph?” And possibly, “What’s a graph?”

Graphs are pretty simple. They are made of two things: nodes (dots) and edges (lines). Edges always start and end in a node. That’s about it for rules for graphs.

Knowledge graphs are a specific type of graph. They’re mostly documented in the Resource Description Framework (RDF) spec at the W3C, but let’s not start there. Instead, there are three important things to know. In knowledge graphs …

all nodes and edges are IRIs (except the nodes that are Literals)
everything is a triple
namespaces are used a lot to organize and simplify triples

Let’s go into each of these in a bit of detail.

Internationalized Resource Identifier (IRI)

While there’s more to it, IRIs are basically web addresses, so https://some.namespace.org#the_node_or_edge. They can also be things like urn:absolute:namespace#the_node_or_edge. The main point is they have two parts, the namespace (https://some.namespace.org or urn:absolute:namespace in our examples) and a node/edge. (Ok, so it’s more complicated than that. But by the time you need to know the rest of the stuff, you’ll be comfortable with it and you won’t be looking for it here.)

The exception that proves the rule: Sometimes triples (see below) point to literals. These can be a string, a number, or really any XSD (see namespaces below) data type. These are like the values of properties on knowledge graph nodes (see the note below about properties).
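The two-part structure of an IRI described above can be sketched in a few lines of Python. This is only an illustration: the helper name split_iri is made up here, and it assumes the IRI uses ‘#’ as the separator like the examples above (real-world IRIs may also split on ‘/’ or ‘:’).

```python
def split_iri(iri: str) -> tuple[str, str]:
    """Split an IRI into (namespace, node_or_edge) at the last '#'.

    Assumption: '#' separates the namespace from the node/edge name,
    as in the examples in this section.
    """
    namespace, separator, local_name = iri.rpartition("#")
    if not separator:
        raise ValueError(f"no '#' separator found in {iri!r}")
    return namespace, local_name


if __name__ == "__main__":
    print(split_iri("https://some.namespace.org#the_node_or_edge"))
    print(split_iri("urn:absolute:namespace#the_node_or_edge"))
```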

Everything is a triple

A triple is 3 things (surprising! I know!):

urn:absolute:flows#Action1 rdf:type https://attackflow.space/veris#Phishing
Subject Predicate Object
Source Relationship Destination
Beginning Middle End

There are a few important implications of this:

Knowledge graphs are directed. urn:absolute:flows#Action1 rdf:type https://attackflow.space/veris#Phishing is not the same as https://attackflow.space/veris#Phishing rdf:type urn:absolute:flows#Action1.
Nothing has properties. There are other graphs where a node or even an edge might have a dictionary of properties, but not knowledge graphs. That said, it’s easy to represent properties. Instead of something like Node1 {‘a’: 1, ‘b’: 2}, you just create triples: Node1 - a - 1, Node1 - b - 2.
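Both implications can be seen with plain Python tuples. This is a minimal sketch, not how any particular graph database stores things; the helper name properties_to_triples is hypothetical.

```python
# A triple is just (subject, predicate, object).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

action = "urn:absolute:flows#Action1"
phishing = "https://attackflow.space/veris#Phishing"

# Direction matters: swapping subject and object gives a different triple.
forward = (action, RDF_TYPE, phishing)
backward = (phishing, RDF_TYPE, action)


def properties_to_triples(node, properties):
    """Turn a property dictionary into one triple per key/value pair."""
    return [(node, key, value) for key, value in properties.items()]


# Node1 {'a': 1, 'b': 2} becomes two triples.
node1_triples = properties_to_triples("Node1", {"a": 1, "b": 2})

if __name__ == "__main__":
    print(forward == backward)
    print(node1_triples)
```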

Namespaces

Namespaces are a critical part of knowledge graphs. They help group like data and explain who is doing the defining. You can find the V1 knowledge graph version of the attack flow schema at attackflow.space. For example, in the previous section, we mentioned “veris.attackflow.space”. This is a namespace to represent VERIS as a knowledge graph. An organization might store all of its flows as “urn:absolute:flows”.

Additionally, namespaces let you use other folks’ work rather than reinventing the wheel. There are a huge number of namespaces structuring all types of data about the world. You can read about some of the most popular namespaces at http://richard.cyganiak.de/blog/2011/02/top-100-most-popular-rdf-namespace-prefixes/ but one thing to keep in mind is that most of the time a namespace defines a TON of stuff when you probably only need one or two things from it. The following namespaces are useful or used in attack flow, but usually only for one or two things:

RDF (Resource Description Framework) - type (defines what type a node is)
RDFS (Resource Description Framework Schema) - label (the human readable ’name’ of a node or edge)
OWL (Web Ontology Language) - sameAs (two nodes are the same thing), ObjectProperty (a Predicate that needs to point to another node), DatatypeProperty (a Predicate that needs to point to a Literal), NamedIndividual (a node is an instance of a thing, kinda like an object in a program is an instance of a class, or you’re an instance of a person)
DC (Dublin Core) - description (helpful for creating a description of something)
TIME - timestamps
XSD (XML Schema) - data types

It’s actually kinda important that you don’t go looking too much into these different schemas until you’re ready. They can contain LOTS of information that won’t really help you and that becomes really confusing really quickly. Take OWL for example:

A note about OWL

Frankly, OWL (Web Ontology Language) may be the biggest hindrance to using knowledge graphs. When people search for RDF or knowledge graphs, OWL often comes up. OWL provides some useful things (see the list above), but it also provides a LOT of other stuff. Specifically, OWL provides the ability to reason over the knowledge graph. This means things like: “if animal:dog rdf:type animal:mammal and family:Sadie owl:NamedIndividual animal:dog, then family:Sadie rdf:type animal:mammal”. It’s kinda cool, but the problem is it gets really complicated really quickly. We don’t need that. (And when we do, we’ll just hit the ‘reason’ button in our software.) Because of the confusion, skip the OWL documentation until much later. For now, just use it for the things (object types, named individuals, etc.) that it’s useful for.

2 - ATTACK FLOW

The basics of what Attack Flow is and how it can be used.

Attack Flow is a standard for graphs in information security. An attack flow is a machine-readable representation of a sequence of actions and assets, plus knowledge properties, joined by relationships.

Attack Flow has 5 parts: Actions, Assets, Properties (Objects/Data) & Relationships, all joined through a Flow.

Actions are things that happen
Assets are things that have state changes
Properties are contextual triples (X -[Property of]-> Y) describing either other ’things’ or data like a hash
Relationships create the causal and contextual connections
Flows are the set of Actions, Assets, Properties, and Relationships

When we combine graphs and the 5 parts, we get a flow.

Attack Flow provides the 3 C’s: Causality, Context, and Complexity

Causality is the causal path from action to asset and asset to action. It’s what caused what
Context is the set of properties (both literal properties like description strings and object properties like a ‘person’ (with a name, job title, permissions, etc.)) that describe other nodes
Complexity is the ability to represent complex relationships

You’ll notice the causal path goes action->asset and asset->action. This is intentional. It forces good data modeling, helps represent complex relationships, and makes visualization easier.

This helps solve several data challenges in information security:

Lots of text -> Broken up into clear parts (nodes) and relationships (edges)
Structured data has a steep learning curve -> Only learn what you need (node-centric search and namespaces for specialized information)
No focus on relationships -> Relationships are first-class
Little causality -> Explicit structure for causality
No context -> Knowledge Graph

3 - Flow Builder V1

How to create a flow in JSON-schema format using Flow Builder version 1 that can then be uploaded to Flow Manager.

Attack Flow Processes – Basics

Introduction

Attack Flow is useful in a wide variety of use cases. Each has its own unique needs, but many share common steps. This document provides the basic Attack Flow processes that are useful across all of those use cases.

TL;DR

Ok, you opened this document wanting to get started. But then you saw that it’s really long. Start here and reference the rest as you need. You should have an idea about what you want to describe. We’ll use the attack flow designer from https://github.com/center-for-threat-informed-defense/attack-flow, but note it isn’t geared towards graphs. Additional documentation will explain how to create them with SPARQL (a graph query language that makes it very easy) and the forthcoming Flow Manager:

Right click and create an action, filling in the fields (mainly name, description, timestamp, reference, and “AND” for logic_operator if you aren’t sure what to put there). An Action is something that changes an Asset. If you’re unsure what to put in the properties, skip down to the discussion of action properties below.

Right click and create an asset. Add any properties such as a reference and name (by right clicking and creating ObjectProperties).

Create an edge between the action and asset. Label it with “compromised” since that’s what happened. (Other options are in the ‘State’ discussion below.)

Repeat until you’re done. Remember, you can add more properties. Use ObjectProperties for things like classes or instances of classes (programmer reference) and DatatypeProperties for variables (the number 5, an md5sum, etc.).

That’s it. When you’re done, export and save it as JSON.

Creating the flow

First you must create a flow node (kinda like the metadata for the flow). Each flow is defined by a flow node with a set of properties:

ID (required) A URI. This is what uniquely identifies this node. It works well to use a UUID prefixed with ‘flow_’ within an organization-specific namespace. For example, org1.com/flows#flow_39ebd2cc-22e3-4287-ab86-2bbba2bbd038. Since this isn’t particularly easy to read, the ’name’ can be used as a human readable name.
Name (required) A string. This can be a truncated form of the UUID (ex “Flow 39ebd2cc”), something more descriptive (“web server attack 1”), or anything else that makes sense for the organization.
Created (required) An ISO standard datetime. Just put the current time here.
Description (optional) A string. Since the UUID probably shouldn’t be particularly descriptive (to be short and avoid conflicts) and the name may not be particularly descriptive, the description is a place to type a one or two sentence summary of the flow. Remember, the details are in the flow itself. This field is likely to be used by folks trying to decide whether to look at the full flow or not.
Author (optional) In Attack Flow 1, this is described as a string. I would recommend instead storing it as a Friend of a Friend (FOAF) person.

Ok, you probably don’t know what FOAF is. The short answer is: someone else has already come up with how to define a person in a repeatable and usable way. You can read it here: http://xmlns.com/foaf/0.1/.

The bigger thing though is that Attack Flow is built on a standard called Knowledge Graphs. Knowledge graphs are about as deep a rabbit hole as you can dive down and we didn’t bring spelunking gear so we’re not going there right now.

Instead, just remember the benefit of knowledge graphs: a lot of people have already defined how to define a lot of things. Here’s a quick list:

FOAF defines people.
VERIS and ATT&CK define many things about, well, attacks.
TIME (http://www.w3.org/2006/time) defines, well, time.
RDF (http://www.w3.org/1999/02/22-rdf-syntax-ns) defines ’types’ (as in ‘server’ is a type of asset)
RDFS (http://www.w3.org/2000/01/rdf-schema) defines labels (basically names)
DC (http://purl.org/dc/elements/1.1/) defines descriptions
OWL (http://www.w3.org/2002/07/owl) defines DatatypeProperties like strings or numbers, ObjectProperties like people, NamedIndividual (think if FOAF creates a ‘class’ for people, a NamedIndividual is an instance of a class), as well as ‘sameAs’ for when you call the same thing two different names (trust me, it’ll happen)

Each of the above defines TONS of other things. You can ignore anything you don’t actually have to use.

A couple of other ones that define useful stuff that you are likely to run into are Dublin Core (DC), Dublin Core Terms (DCTerms), dbpedia.org Ontology (DBO), dbpedia.org Resource (DBR), and schema.org. All of these are namespaces. Even Attack Flow itself is a namespace. Your organization may even choose to create its own namespace to cover things like its assets, people, and buildings. (This would allow them to be referenced in attack flows.)

Ok, back to the actual work.
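To make the flow node properties listed earlier concrete, here is a minimal sketch that builds one as a Python dictionary. The helper name new_flow_node, the lowercase key names, and the https://org1.com/flows# namespace are all assumptions for illustration; use whatever your organization and tooling expect.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical organization-specific namespace.
FLOW_NAMESPACE = "https://org1.com/flows#"


def new_flow_node(name, description=None, author=None):
    """Build a flow node with the required ID, Name, and Created properties."""
    node = {
        # A UUID prefixed with 'flow_' inside the organization's namespace.
        "id": f"{FLOW_NAMESPACE}flow_{uuid.uuid4()}",
        "name": name,
        # The current time as an ISO 8601 datetime string.
        "created": datetime.now(timezone.utc).isoformat(),
    }
    # Optional properties are only added when provided.
    if description is not None:
        node["description"] = description
    if author is not None:
        node["author"] = author
    return node


if __name__ == "__main__":
    print(new_flow_node("web server attack 1", description="Example flow"))
```

The author is kept as a plain string here for simplicity; storing it as a FOAF person would mean pointing at another node instead.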

Creating a step in an Attack Flow

The first thing to consider is that every ‘step’ of an attack involves multiple parts. In Attack Flow, that is four things:

The required state for an action to occur
The action taken
The state it changed in the asset
The asset changed

For example, exploiting a vulnerability in a webserver involves:

Knowledge & access to the webserver
The ability to execute the exploit
The resulting state change on the web server (defacing it, leaking data, installing a webshell, etc.)
The web server

This may not come naturally. Defenders tend to think in terms of the asset while offense tends to think in terms of the action. Especially for offense, it may be hard to think of what the asset is. There are two tricks to making this easier.

Choose from a list of assets. Pulling from the VERIS framework, it might be:
Person
Server
User Device
Terminal or Kiosk
Embedded system
Network
Data

Note that VERIS defines varieties of each asset category.

If using the CARS data model, it might be host subsystems such as:

Network interfaces
Memory
Compute
Storage
Think “What was the effect of the action and what was affected?” Effects can be categorized as:
Confidentiality
Integrity
Availability

or a variety thereof. Using those two tricks, it should be much easier to identify the asset.

Filling in required properties

Within attack flow, each part of the step has properties, some optional, some required.

Let’s start with actions:

ID (required) A URI. This is what uniquely identifies this node. It works well to use a UUID prefixed with ‘action_’ within an organization-specific namespace. For example, org1.com/flows#action_e768bf76-77a0-46cb-9e75-92da0af976c7. (So, basically like Flow. And guess what, Asset and ObjectProperties will be the same.)
Name (required) A string. This can be a truncated form of the UUID (ex “Action e768bf76”), something more descriptive (“web server attack 1”), something not particularly descriptive (“action 1”), or anything else that makes sense for the organization.
Timestamp (required) An ISO standard datetime, but I would recommend using unix time as an integer here. This is actually kinda important. The reason has to do with the order things happen in.

Ok, so there are two sources of order in Attack Flow. First, Attack Flow is a graph. Actions lead to assets, which lead to more actions. Because of that, you can safely assume that an action that comes from an asset that came from an action is later in the causal path. But what if two assets come from a single action, which then leads to two more actions? Or if two actions happen with no common parent? This is where the second source, the timestamp, comes in. It is fairly important to the temporal sequence in which things occur.

Now for flows that have actually happened, it’s easy to assign a real timestamp. But what if you are documenting a plan, or a flow where you don’t know the details, or a generic example of an attack? In those cases you don’t have an exact timestamp. Because of this, an integer that can serve as either time since epoch or the temporal sequence of events may be better than an ISO timestamp.

Description (optional) A string. Since the UUID probably shouldn’t be particularly descriptive (to be short and avoid conflicts) and the name may not be particularly descriptive, the description is a place to type a short summary of the action. Again, keep it short. No-one wants to read a ton of action descriptions.
Reference (optional, but highly recommended) A string in Attack Flow 1, but I would treat this as a URI. The reason is that your action, (we’ll call it Action_1) is a NamedIndividual (an instance) of a class of action. Reference points to what that class is (For VERIS, the classes would be Hacking, Malware, Social, Misuse, Error, Physical, Environmental.)
Logic_operator (required) A string. If you’re not sure what to put here, use ‘AND’. If you want to know why, read the box below this list.
Logic_operator_language (optional) A string. If you’re not sure what to put here, use ‘boolean’. If you want to know why, same as logic_operator. Read the box below.
Succeeded (optional) A number between 0 and 1. Probably 1 (i.e. succeeded). Maybe 0 (failed). Potentially something in between. If the action hasn’t happened, maybe just skip this property.
Confidence (optional) A number between 0 and 1. Probably 1 (we know it happened). Probably not 0 (we know nothing about if it happened or not). Maybe something in between. If you’re not sure what to do, just leave it out.
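As a small illustration of why an integer timestamp is handy, the sketch below sorts actions into temporal order. The action names and the sequence numbers 1 through 4 are made up for the example; real flows could use seconds since the epoch in exactly the same way, as long as a single flow doesn’t mix the two.

```python
# Hypothetical actions with integer timestamps used as a simple sequence.
actions = [
    {"name": "exfiltrate data", "timestamp": 3},
    {"name": "exploit web server", "timestamp": 1},
    {"name": "encrypt files", "timestamp": 4},
    {"name": "install webshell", "timestamp": 2},
]


def temporal_order(actions):
    """Return the actions sorted by their integer timestamp."""
    return sorted(actions, key=lambda action: action["timestamp"])


ordered_names = [action["name"] for action in temporal_order(actions)]

if __name__ == "__main__":
    for name in ordered_names:
        print(name)
```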

Ok, I promised a box related to logic_operator and logic_operator_language. These don’t mean much if you have a single path that goes something like action->asset->action->asset. But what if you have something that goes action1->asset<-action2? Does that mean both action1 AND action2 need to happen? Does it mean action1 OR action2 needs to happen? That’s where the logic_operator comes in. It defines any logic needed to clarify if the action occurs or not. (So if you’re documenting an attack after-the-fact, you probably don’t need it as much.)

In the case above, the logic operators are Boolean (hence logic_operator_language being Boolean). But what if you have a ton of actions, maybe some various properties (attacker has a credential or software is at some vulnerable version) or need some really complex logic? Well, just make logic_operator a function in the language of your choosing (I’d go javascript or python but that’s just me. Feel free to use rust, ladder logic, or whatever. You do you). Just make sure you indicate the language in logic_operator_language so folks know how to interpret it.
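Here is one way the Boolean case could be evaluated. This is a sketch under assumptions: the function name action_can_occur is hypothetical, and it assumes the operator combines a list of true/false inputs, one per incoming requirement. Attack Flow itself doesn’t prescribe this code.

```python
def action_can_occur(logic_operator, requirements_met):
    """Combine incoming requirements using a Boolean logic_operator.

    requirements_met: list of booleans, one per incoming requirement
    (assumed representation for this sketch).
    """
    if logic_operator == "AND":
        return all(requirements_met)
    if logic_operator == "OR":
        return any(requirements_met)
    # Anything else would need its own interpreter, chosen by
    # logic_operator_language.
    raise ValueError(f"unsupported Boolean logic_operator: {logic_operator!r}")


if __name__ == "__main__":
    # Only one of two requirements is satisfied.
    print(action_can_occur("AND", [True, False]))
    print(action_can_occur("OR", [True, False]))
```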

Assets:

ID (required) A URI. This is what uniquely identifies this node. It works well to use a UUID prefixed with ‘asset_’ within an organization-specific namespace. For example, org1.com/flows#asset_3630f3ea-704d-4311-b8fb-530dc47d7c96.
State (optional) A string in Attack Flow 1. But I’d make it a URI referring to a namespace like VERIS which has the attributes: Confidentiality, Integrity, and Availability.
Attack Flow lets you keep adding properties. If I were you, I’d include Reference (maybe to your organization’s asset management tool). And if the tool doesn’t already include a name and description, I’d probably add those too. Just look at how they’re done for action. If those things are in the asset management tool though, don’t reinvent the wheel.

Yep, another box. This one’s about _state_. Something to keep in mind is that actions change the state of things. If the action doesn’t change anything, it really didn’t do anything. The edge (I know, we haven’t discussed edges) from the action to the asset will list what it changed. But once that change has occurred, it becomes the State property.

This paragraph’s going to get a bit philosophical. Skip it if you want. So when an attack occurs, it changes things. One of the reasons for flows (maybe _the_ reason) is to track those changes. We call the sum total of all those changes “state”. So where is this mythical “state” stored? Well, it’s adding up the set of all the assets involved. That’s simple enough. You can make a table listing the asset in one column then the state in another.

But what if we need the state not just at the beginning and end of the attack? (Hint: at the beginning, nothing has changed.) Amazingly, it’s the same thing: the set of the states of all the assets. The trick is that it’s only up to the time you choose. So you ignore any state_changes caused by Actions with a later timestamp. (The state_change on the edges from those later Actions to Assets should not be included as State properties for the respective Assets yet.) For state_changes that have happened, they should now be State properties on the Assets and part of that list that makes up overall state.
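The “state up to a chosen time” idea can be sketched as a small function. The tuple layout (timestamp, asset, state_change), the function name state_at, and the sample data are assumptions made for this example.

```python
def state_at(state_changes, as_of):
    """Collect each asset's state from changes with timestamp <= as_of.

    state_changes: iterable of (timestamp, asset, state_change) tuples,
    where the timestamp comes from the Action that caused the change.
    """
    state = {}
    for timestamp, asset, change in sorted(state_changes):
        if timestamp <= as_of:
            state.setdefault(asset, set()).add(change)
    return state


# Hypothetical state changes taken from Action -> Asset edges.
changes = [
    (1, "credential1", "Confidentiality"),
    (2, "web_server", "Integrity"),
    (3, "database", "Confidentiality"),
]

if __name__ == "__main__":
    print(state_at(changes, 0))  # the beginning: nothing has changed
    print(state_at(changes, 2))  # later changes are ignored
```

Each key in the result is one row of the asset/state table described above.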

ObjectProperties:

ID (required) A URI. These are easy. They’re just a URI of an object. It might be something like Nashville, which is a location defined in DBR (see above). It might be a person. It might be a reference to a VERIS or ATT&CK object.
Because it’s an object, it can have its own properties, but because an ObjectProperty can be anything, there’s no list for what those properties are. If you chose the Object from a namespace, that namespace probably defines its properties.

DatatypeProperties:

A value. Yep, that’s it. A value. Datatype Properties just store strings, numbers etc. Sometimes you need those. (Like when you want to note a specific MD5 checksum.)

Relationships (also called edges or triples):

Source (Subject) A URI. Also called the beginning, etc.
Type (predicate) A URI.
Target (object) A URI or a DatatypeProperty. Also called the Destination, End, etc.

Time for a quick confession. Literally everything above is just a relationship. Actions and Flows? They’re just several relationships between the Action’s or Flow’s URI and its properties. Assets? Triples of <Asset URI, attack_flow:State, State URI>. Object properties? <subjectObject (an Action, Asset, Flow, or other ObjectProperty), property name, target object>. Datatype Properties? <subjectObject, property name, value>.

In fact, that’s all knowledge graphs are. Just triples. You can store a graph as just three columns in a spreadsheet. It’s one of the reasons knowledge graphs are both so simple and so powerful.
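To show the “three columns in a spreadsheet” point, the sketch below writes a couple of triples to CSV and reads them back using only the standard library. The rdfs:label triple and its value are made up for illustration.

```python
import csv
import io

# Example triples; the label value is hypothetical.
triples = [
    ("urn:absolute:flows#Action1", "rdf:type", "https://attackflow.space/veris#Phishing"),
    ("urn:absolute:flows#Action1", "rdfs:label", "Phishing email"),
]

# Write the graph as three columns. An in-memory buffer stands in for a file.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["subject", "predicate", "object"])
writer.writerows(triples)

# Read it back: every row after the header is one triple.
buffer.seek(0)
reader = csv.reader(buffer)
header = next(reader)
loaded = [tuple(row) for row in reader]

if __name__ == "__main__":
    print(header)
    print(loaded == triples)
```

Note that a plain CSV loses the distinction between IRIs and literals, so a real export would need some convention for that.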

That said, we have a couple specific relationships Attack Flow uses.

State_change relationships:

Still just a triple, but the Source must have a type of “Action” and the Target must have a type of “Asset”. (We’ll make an exception and let the Source or Target be a Flow too if the Flow is used to represent a bunch of actions/assets, called a subgraph.) It also makes sense to be more specific about what the state_change was. It might be “Compromised”. It might be more detailed: “Confidentiality”, “Integrity”, “Availability”. It’s up to you.

That means an example relationship might be like: <Brute force, Confidentiality, credential1>. You could read this as “The action Brute force was used to compromise the Confidentiality of credential1”.

State_requirement relationships:

Again, just a triple. But the Source must have a type of “Asset” and the Target must have a type of “Action” (again with the leniency for Flows described in “state_change relationships”). This describes what has to be true to take the action.

Let’s say credential1 was compromised as above. The next action might be to use those stolen creds. The triple might look like: <Credential1, Confidentiality, Use of stolen creds> meaning “Credential1 must have its Confidentiality compromised for the Use of stolen creds Action to occur.”
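The type rules for the two relationships can be checked mechanically. In this sketch the node_types lookup and the function name relationship_kind are hypothetical, and the leniency for Flows standing in for subgraphs is left out to keep it short.

```python
# Hypothetical lookup from node name to its type.
node_types = {
    "Brute force": "Action",
    "credential1": "Asset",
    "Use of stolen creds": "Action",
}


def relationship_kind(source, target, node_types):
    """Classify an edge by the types of its source and target nodes."""
    pair = (node_types[source], node_types[target])
    if pair == ("Action", "Asset"):
        return "state_change"
    if pair == ("Asset", "Action"):
        return "state_requirement"
    # Action->Action and Asset->Asset are neither kind.
    return None


if __name__ == "__main__":
    print(relationship_kind("Brute force", "credential1", node_types))
    print(relationship_kind("credential1", "Use of stolen creds", node_types))
```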

Connecting Steps Together

Whew. That first part was long. The good news is it’s basically everything you need to know. From here on out, you’re connecting those steps you defined together to form the full flow. This should be pretty straightforward if you just have a single path. Keep connecting Action to Asset to Action to Asset until you are done.

Where you may get confused is when your flow branches or converges. Branching is probably easier. It may be that the compromise of one Asset leads to multiple actions. (Think compromising a server, exfiltrating the data on it, then encrypting it for ransomware.) Simply create the two Actions, draw a relationship from the asset to each of the actions, and make sure to set the Action times such that they happen in the temporal order they happened in. This also works for an Action that compromises multiple Assets, however that’s going to be less common since usually it’s multiple NamedIndividuals of a single action, repeated once per asset. Still, it can happen and you have a lot of discretion in how to code it.

It’s also possible for multiple compromised Assets to be required for a single Action, or multiple actions to be taken against a single Asset. For example, the Action of SQLi on the DMZ webserver database may require that the DMZ webserver be in a compromised State and that the Confidentiality of the database credentials be compromised. Alternately, it’s incredibly common for multiple Actions to be taken against a single Asset. In fact, many Actions may go from the Asset back to the same Asset. Take, for example, using a driveby exploit Action to compromise a user account on a desktop Asset, then using mimikatz (a malware Action) to compromise privileged credentials (a data Asset) and using them to elevate privileges on the same desktop Asset. The desktop Asset would have two actions coming in: the driveby exploit and the use of the stolen credentials.

Finishing the Flow

So you’ve done all that. You should now have a SPARQL query, a JSON-LD graph, or a JSON schema file. You can keep them as flat files or potentially insert them into a database (graph, JSON, etc.). Unfortunately, it’s not really in the scope of this document to discuss actually using your flows. Still, it’s wholly possible to use them for documenting Red Team tests (both the plan and the test report), Attack Simulation (though this is a little more complicated), Detection Signatures, Threat Intelligence, Incident Response, Architecture engineering and planning, and Executive Communication. See the additional documentation as it becomes available for details of these use cases. (Or watch the BSidesLV presentation on it: https://www.youtube.com/watch?v=NwSd6tAA-eI.)

4 - SPARQL

How to write the SPARQL queries that are important for using Flow Manager.

Just a foreword, SPARQL, like any query language, can be picky. http://www.sparql.org/ has several validators you can use to find errors in your queries.

So SPARQL is kinda like SQL for graphs. Frankly, it’s kinda old compared to Cypher, Gremlin, or Storm, but it has some properties that make it nice for what we’re doing.

Your first SPARQL query

Let’s start with something simple and break it down.

select * where { 
  ?s ?r ?d .
} limit 10

This query returns 10 triples from the database.

select * - Just return whatever is found. We could have returned specific variables instead.
where { ... } - Keep only the triples for which every statement between the curly brackets is true.
?s ?r ?d . - The things with a ? in front of them are variables. In the Knowledge Graphs introduction, we talked about how knowledge graphs are composed of triples. SPARQL centers around triples as well. We’re calling the variables ?s for source, ?r for relationship and ?d for destination here, but they can be anything. You’ll often see ?s ?p ?o for subject, predicate, and object. The . at the end means “this is the end of the line”. We’ll show what the alternative to everything on one line is a little later.
limit 10 - Only return 10 items.

{
  "head": {
    "vars": [
      "s",
      "r",
      "d"
    ]
  },
  "results": {
    "bindings": [
      {
        "s": {
          "type": "uri",
          "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
        },
        "r": {
          "type": "uri",
          "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
        },
        "d": {
          "type": "uri",
          "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
        },
        "r": {
          "type": "uri",
          "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
        },
        "d": {
          "type": "uri",
          "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
        }
      },
      ...
    ]
  }
}

Something to note is what the output is and isn’t. Because SPARQL can return more than just IRIs and triples, the return is lines of text. Most tools can convert it back to something more easily parsable such as CSV, JSON, etc. Flow Manager returns the data as a JSON string, though not as a graph. The data can then be reassembled into triples (s, r, d) that can be joined to recreate the graph.
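Reassembling the triples takes only a few lines of Python. This sketch assumes you already have the JSON string in hand (how you fetch it from Flow Manager isn’t covered here); the sample string is a trimmed copy of the first binding shown above, and the function name bindings_to_triples is made up.

```python
import json

# Trimmed copy of the response above, keeping only the first binding.
response_text = """
{
  "head": {"vars": ["s", "r", "d"]},
  "results": {"bindings": [
    {
      "s": {"type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
      "r": {"type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
      "d": {"type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"}
    }
  ]}
}
"""


def bindings_to_triples(text, variables=("s", "r", "d")):
    """Turn a SPARQL JSON result string into a list of (s, r, d) tuples."""
    results = json.loads(text)
    return [
        tuple(row[variable]["value"] for variable in variables)
        for row in results["results"]["bindings"]
    ]


triples = bindings_to_triples(response_text)

if __name__ == "__main__":
    for source, relationship, destination in triples:
        print(source, relationship, destination)
```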

Your first useful SPARQL query

This query is going to produce a table of flows:

PREFIX af: <https://attackflow.space/attack-flow#>
select DISTINCT ?flow ?name ?description ?author where { 
    ?s af:flow ?flow .
    ?flow af:name ?name .
    ?flow af:description ?description .
    ?flow af:author ?author .
} limit 10

Let’s break this one down.

PREFIX af: <https://attackflow.space/attack-flow#> - This is something new. We discussed namespaces on the Knowledge Graphs page. Namespaces get used over and over, so to simplify them we can put a PREFIX line at the beginning of the SPARQL query and only use the short string (in this case ‘af’) in our query. (P.S. You can have as many PREFIX lines as you want. It’s usual to have several common PREFIX lines such as RDF, RDFS, OWL, etc.)
select DISTINCT ?flow ?name ?description ?author - Return four values per row. (Note the DISTINCT keyword. It does the obvious thing: only returns unique rows.)
?s af:flow ?flow . - This requires that a node (?s) points to the node ?flow with an af:flow edge. (We did it this way so there’d only be one prefix, but if you were OK with two prefixes, this line could easily be ?flow RDF:type af:flow .)

    ?flow af:name ?name .
    ?flow af:description ?description .
    ?flow af:author ?author .

The next three lines identify multiple variables per flow (?name, ?description, and ?author). NOTE! This does not mean you’ll get one line per flow. If a flow happened to have multiple names, authors, etc., it could appear on multiple lines.

So what does it return?

{
  "head": {
    "vars": [
      "flow",
      "name",
      "description",
      "author"
    ]
  },
  "results": {
    "bindings": [
      {
        "flow": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        },
        "name": {
          "type": "literal",
          "value": "Test Flow"
        },
        "description": {
          "type": "literal",
          "value": "A test flow"
        },
        "author": {
          "type": "literal",
          "value": "Gabriel Bassett"
        }
      },
      {
        "flow": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_6f471cb3-1fc0-4fd6-8d9b-7725af56b442"
        },
        "name": {
          "type": "literal",
          "value": "DDoS Template Flow"
        },
        "description": {
          "type": "literal",
          "value": "A prototypical DDoS Attack"
        },
        "author": {
          "type": "literal",
          "value": "Gabriel. Bassett"
        }
      }
    ]
  }
}
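Since a flow with more than one name or author comes back as more than one row, it can help to fold the rows back into one record per flow. The sketch below does that for a list of bindings shaped like the output above. The second author is invented purely to show the multi-row case, and the function name group_by_flow is hypothetical.

```python
from collections import defaultdict

FLOW = "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"

# Two rows for the same flow; the second author is made up for illustration.
bindings = [
    {
        "flow": {"type": "uri", "value": FLOW},
        "name": {"type": "literal", "value": "Test Flow"},
        "author": {"type": "literal", "value": "Gabriel Bassett"},
    },
    {
        "flow": {"type": "uri", "value": FLOW},
        "name": {"type": "literal", "value": "Test Flow"},
        "author": {"type": "literal", "value": "A. N. Other"},
    },
]


def group_by_flow(bindings):
    """Collect every distinct value of each variable under its flow IRI."""
    flows = defaultdict(lambda: defaultdict(set))
    for row in bindings:
        flow_iri = row["flow"]["value"]
        for variable, cell in row.items():
            if variable != "flow":
                flows[flow_iri][variable].add(cell["value"])
    return flows


grouped = group_by_flow(bindings)

if __name__ == "__main__":
    for flow_iri, values in grouped.items():
        print(flow_iri, {key: sorted(vals) for key, vals in values.items()})
```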

Finding the Actions and Assets in a Flow

Because all actions and assets in a flow point to the flow node, it serves as an easy way to query for them:

PREFIX af: <https://attackflow.space/attack-flow#>
select DISTINCT ?s ?o ?p where { 
    ?s ?o ?p .
    FILTER (?o = af:flow)
    FILTER (?p = <urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97>)
} 
ORDER BY ?s 
LIMIT 100

resulting in

{
  "head": {
    "vars": [
      "s",
      "o",
      "p"
    ]
  },
  "results": {
    "bindings": [
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#action_653f8cfb-dce0-40b9-89af-1af9c4172340"
        },
        "o": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#flow"
        },
        "p": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#Asset_fda2b7a6-3904-4f78-ab1d-0ad7b1b6a1ed"
        },
        "o": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#flow"
        },
        "p": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        }
      }
    ]
  }
}

You have probably got most of this query at this point, so let’s skip to the interesting parts. (Note that in these queries ?o sits in the edge position of the triple and ?p in the destination position.)

    ?s ?o ?p .

As you now know, this selects all triples.

    FILTER (?o = af:flow)

Here we say we only want triples where ?o is af:flow, so ?s af:flow ?p.

    FILTER (?p = <urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97>)

This adds another filter, resulting in something like ?s af:flow <urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97>.

    ORDER BY ?s

This just arranges the results by ?s.

In fact, you can write that query directly:

PREFIX af: <https://attackflow.space/attack-flow#>
select DISTINCT ?s where { 
    ?s af:flow <urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97> .
} limit 100

Except that, because ?s is now the only variable, it’s the only thing returned:

{
  "head": {
    "vars": [
      "s"
    ]
  },
  "results": {
    "bindings": [
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#action_653f8cfb-dce0-40b9-89af-1af9c4172340"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#Asset_fda2b7a6-3904-4f78-ab1d-0ad7b1b6a1ed"
        }
      }
    ]
  }
}

Adding the relationships between actions/assets to the actions, assets, and flows

We now have two triples, one that points from an action to a flow and one from an asset to the flow. But we don’t have the relationships between the action and the asset. Let’s add those:

PREFIX af: <https://attackflow.space/attack-flow#>
select DISTINCT ?s ?o ?p where { 
    {
        ?s ?o ?p .
        FILTER (?o = af:flow)
        FILTER (?p = <urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97>)
    } UNION {
        ?s ?o ?p .
        ?s af:flow <urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97> .
        ?p af:flow <urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97> .
    }
} limit 100

The only real addition here is the UNION line. We’ve created two sets of filters, one that gets us (action/asset af:flow the_flow) and one that gets us (action/asset relationship action/asset). (The second filter does this by starting with all triples, then keeping only the ones where both the source and the destination of the triple point to our specific flow of interest with an af:flow edge.)

Now in addition to the triples from before, we have a triple from our action to our asset.

{
  "head": {
    "vars": [
      "s",
      "o",
      "p"
    ]
  },
  "results": {
    "bindings": [
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#action_653f8cfb-dce0-40b9-89af-1af9c4172340"
        },
        "o": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#flow"
        },
        "p": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#Asset_fda2b7a6-3904-4f78-ab1d-0ad7b1b6a1ed"
        },
        "o": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#flow"
        },
        "p": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#action_653f8cfb-dce0-40b9-89af-1af9c4172340"
        },
        "o": {
          "type": "uri",
          "value": "http://attackflow.space/veris#Availability"
        },
        "p": {
          "type": "uri",
          "value": "urn:absolute:flows#Asset_fda2b7a6-3904-4f78-ab1d-0ad7b1b6a1ed"
        }
      }
    ]
  }
}
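If it helps, the two branches of the UNION can be mimicked with plain Python sets. This is only a sketch of the logic, not how a graph database evaluates the query. The node names are shortened stand-ins for the IRIs above, and the two af:name triples are invented to show what gets left out.

```python
# Shortened, hypothetical stand-ins for the IRIs used in the examples.
FLOW = "flows:Flow_605e"
ACTION = "flows:action_653f"
ASSET = "flows:Asset_fda2"

triples = {
    (ACTION, "af:flow", FLOW),
    (ASSET, "af:flow", FLOW),
    (ACTION, "veris:Availability", ASSET),
    (ACTION, "af:name", "action1"),  # a literal property: matched by neither branch
    (FLOW, "af:name", "Test Flow"),  # a flow property: matched by neither branch
}

# First branch: triples whose edge is af:flow and whose destination is our flow.
membership = {t for t in triples if t[1] == "af:flow" and t[2] == FLOW}

# The nodes that belong to the flow (the sources of those triples).
members = {source for source, _, _ in membership}

# Second branch: triples whose source AND destination both belong to the flow.
relationships = {t for t in triples if t[0] in members and t[2] in members}

# UNION: everything matched by either branch.
result = membership | relationships
for triple in sorted(result):
    print(triple)
```

The literal properties drop out because a literal such as "action1" can never be the source of an af:flow edge.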

Let’s get a bit meta. There’s a sort of grammar for manipulating data: things like “select”, “filter”, “modify”, “arrange”, “summarize”, and “join”, and it applies to graphs too. They form a sort of command progression: start with the dataset -> filter stuff out -> select just the variables we need from what’s left -> arrange it. (In fact this command progression is ridiculously common across all types of data.)

In SPARQL we start with our full dataset. We use each line in the WHERE block to filter until we have just the parts we need. Our SELECT line determines which variables we return, and our ORDER BY determines the order they are returned in.

It’s also common to add in UNIONs to build groupings which don’t logically result from a single filter.
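The same filter -> select -> arrange progression looks like this on an ordinary list of rows in Python. The rows are invented for illustration, and the comments only point out which SPARQL clause plays the matching role.

```python
# Hypothetical rows standing in for "the full dataset".
rows = [
    {"s": "flows:b_action", "o": "af:flow", "p": "flows:f1"},
    {"s": "flows:a_asset", "o": "af:flow", "p": "flows:f1"},
    {"s": "flows:f1", "o": "af:name", "p": "Test Flow"},
]

# Filter: keep only the rows we care about (the job of the WHERE block).
filtered = [row for row in rows if row["o"] == "af:flow"]

# Select: keep only the variable we need (the job of SELECT ?s).
selected = [row["s"] for row in filtered]

# Arrange: put the results in order (the job of ORDER BY ?s).
arranged = sorted(selected)

print(arranged)
```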

While we didn’t cover other things like modify, SPARQL supports them. They’re just not something you’re likely to use right away, except maybe to remove some data that was stored unintentionally. More than likely, if you need to modify data, you’ll run a query, create the updates outside the database (such as by enriching the data), and add the new edges back in using a [create](docs/Concepts/sparql_create.md) statement.

We can put this all together in a query to retrieve an entire flow and all of its properties:

PREFIX af: <https://attackflow.space/attack-flow#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select DISTINCT ?s ?o ?p where { 
    {
        ?s ?o ?p  . 
        ?s af:flow <urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97> .
    } UNION {
        ?s ?o ?p .
        FILTER (?s = <urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97>)
    }
}

The first WHERE block gives us all triples that start with a node in our flow (our actions and assets), including their properties. The second WHERE block gives us all triples starting in the flow node itself (giving us the flow’s properties).

Together this returns:

{
  "head": {
    "vars": [
      "s",
      "o",
      "p"
    ]
  },
  "results": {
    "bindings": [
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#action_653f8cfb-dce0-40b9-89af-1af9c4172340"
        },
        "o": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#flow"
        },
        "p": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#action_653f8cfb-dce0-40b9-89af-1af9c4172340"
        },
        "o": {
          "type": "uri",
          "value": "http://attackflow.space/veris#Availability"
        },
        "p": {
          "type": "uri",
          "value": "urn:absolute:flows#Asset_fda2b7a6-3904-4f78-ab1d-0ad7b1b6a1ed"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#Asset_fda2b7a6-3904-4f78-ab1d-0ad7b1b6a1ed"
        },
        "o": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#flow"
        },
        "p": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        },
        "o": {
          "type": "uri",
          "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
        },
        "p": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#flow"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        },
        "o": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#created"
        },
        "p": {
          "type": "literal",
          "value": "2022-08-25T17:02:45.423Z"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        },
        "o": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#name"
        },
        "p": {
          "type": "literal",
          "value": "Test Flow"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        },
        "o": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#description"
        },
        "p": {
          "type": "literal",
          "value": "A test flow"
        }
      },
      {
        "s": {
          "type": "uri",
          "value": "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
        },
        "o": {
          "type": "uri",
          "value": "https://attackflow.space/attack-flow#author"
        },
        "p": {
          "type": "literal",
          "value": "Gabriel Bassett"
        }
      }
    ]
  }
}
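A flat list of s/o/p rows is often easier to work with once it is regrouped by node. The sketch below (standard-library Python; group_by_node is our own helper name) builds a node -> edge -> values dictionary from two rows copied from the response above. A full response has the same shape, just with more rows.

```python
from collections import defaultdict

FLOW_IRI = "urn:absolute:flows#Flow_605eeda8-2628-4897-b68c-e54b144d4e97"
AF = "https://attackflow.space/attack-flow#"

# Two rows copied from the response above.
bindings = [
    {
        "s": {"type": "uri", "value": FLOW_IRI},
        "o": {"type": "uri", "value": AF + "name"},
        "p": {"type": "literal", "value": "Test Flow"},
    },
    {
        "s": {"type": "uri", "value": FLOW_IRI},
        "o": {"type": "uri", "value": AF + "author"},
        "p": {"type": "literal", "value": "Gabriel Bassett"},
    },
]


def group_by_node(rows):
    """Build {node: {edge: [values]}} from s/o/p result rows."""
    nodes = defaultdict(lambda: defaultdict(list))
    for row in rows:
        nodes[row["s"]["value"]][row["o"]["value"]].append(row["p"]["value"])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {node: dict(edges) for node, edges in nodes.items()}


flow = group_by_node(bindings)[FLOW_IRI]
print(flow[AF + "name"], flow[AF + "author"])
```

Values are kept in lists because, as noted earlier, a node can have more than one value for the same edge.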

This is certainly not a comprehensive look at SPARQL. For that there’s the documentation. But hopefully this is enough to get you started as you look to retrieve data from Flow Manager.

5 - SPARQL CREATE

How to create flows using SPARQL queries.

A quick foreword: SPARQL, like any query language, can be picky. http://www.sparql.org/ has several validators you can use to find errors in your queries.

Inserting data into Flow Manager is actually relatively easy using SPARQL. You might find, in fact, that it’s easier to write flows in SPARQL than other formats, especially for longer flows.

The following example is the start (the flow node, first action, and first asset) of a larger flow that highlights how SPARQL can be used for writing flows:

PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX af: <https://attackflow.space/attack-flow#>
PREFIX flows: <urn:absolute:flows#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX veris: <https://attackflow.space/veris#>
INSERT DATA 
{
    flows:example_flow4 rdf:type owl:NamedIndividual, af:attack-flow ;
                      af:created "2022-04-12T13:52:00" ;
                      af:name "example_flow4" .
    flows:action_65f9c664 rdf:type owl:NamedIndividual, af:action,  veris:Use%20of%20stolen%20creds ;
               veris:action.hacking.vector veris:action.hacking.vector.Desktop%20sharing%20software ;
               af:description "actor logs in to a desktop using RDP & stolen creds" ;
               af:logic_operator "AND" ;
               af:name "action1" ;
               af:flow flows:example_flow4 ;
               af:timestamp "1970-01-01T00:00:01" ;
               af:state_change flows:asset_1024b648  .
    flows:asset_1024b648 rdf:type owl:NamedIndividual, af:asset, veris:Desktop ;
               af:flow flows:example_flow4 ;
               af:state_requirement flows:action_e084e1f3, flows:action_3055c8f7-1,
                                    flows:action_3055c8f7-2, flows:action_3055c8f7-3,
                                    flows:action_3055c8f7-4, flows:action_3055c8f7-5,
                                    flows:action_3055c8f7-6, flows:action_3055c8f7-7,
                                    flows:action_a857f6c3 .
}

Instead of the SELECT query we used to retrieve data, we use INSERT DATA. We can effectively write each node as the edges going from it to other nodes or literals.

flows:example_flow4 rdf:type owl:NamedIndividual, af:attack-flow ;
                  af:created "2022-04-12T13:52:00" ;
                  af:name "example_flow4" .

For the flow node, the first value of every triple is flows:example_flow4. But the first line is curious: it’s not three IRIs long. Instead there is a , between the last two IRIs. The , effectively means “we’re going to have another triple where the first two IRIs are the same.” The line (and the second line as well) ends with a ;. The ; means “we’re going to have another triple where the first IRI is the same.” You can almost think of it as creating a tree (like JSON) that’s always three values deep. Finally, once we’re done with the triples for this node, we use . to end the statement.

You don’t have to use , and ;. You could write each triple out in full (ending each one in .), or skip the , and repeat the second IRI for each value (ending in ;), but the shorthand speeds things up.
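To make the “tree that’s three values deep” idea concrete, here is a small Python sketch that writes the same node both ways. The helper names to_longhand and to_shorthand are our own, and the code does no escaping or validation, so treat it as an illustration rather than a serializer.

```python
def to_longhand(subject, properties):
    """Write one full triple per line, each ending in '.'."""
    lines = []
    for predicate, objects in properties.items():
        for obj in objects:
            lines.append(f"{subject} {predicate} {obj} .")
    return lines


def to_shorthand(subject, properties):
    """Write the same triples using ',' (same first two values) and ';' (same first value)."""
    parts = [f"{predicate} {', '.join(objects)}" for predicate, objects in properties.items()]
    return f"{subject} " + " ;\n    ".join(parts) + " ."


# The flow node from the example above, as edge -> list of values.
flow = {
    "rdf:type": ["owl:NamedIndividual", "af:attack-flow"],
    "af:created": ['"2022-04-12T13:52:00"'],
    "af:name": ['"example_flow4"'],
}

print("\n".join(to_longhand("flows:example_flow4", flow)))
print()
print(to_shorthand("flows:example_flow4", flow))
```

The longhand version prints four separate triples. The shorthand version prints one statement with one , and two ;.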

We then repeat the same process for the next node (flows:action_65f9c664), the following asset (flows:asset_1024b648) and on down the attack graph.

It’s fast, easy, and clean.

6 - SPARQL UPDATE

How to update flows using SPARQL queries.

A quick foreword: SPARQL, like any query language, can be picky. http://www.sparql.org/ has several validators you can use to find errors in your queries.

Sometimes you may find the need to update your flow graph. This can be done with the Flow Manager Modify API. Modifications come in the form of INSERT or DELETE queries.

For example, if we previously added an Unspecified author to a flow, we may want to replace it with a named author. We may know exactly what we want to change and so fully define the triples to change:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
INSERT DATA 
{ 
  <urn:absolute:flows#flow--41e0cd93-6fb2-4786-94ab-5adec21960cc> <https://attackflow.space/attack-flow#author> <urn:absolute:flows#person--4c49da73> .
  <urn:absolute:flows#person--4c49da73> a <http://xmlns.com/foaf/0.1/Person>, owl:NamedIndividual;
      <http://xmlns.com/foaf/0.1/firstName> "Gabriel";
      <http://xmlns.com/foaf/0.1/family_name> "Bassett";
      <http://xmlns.com/foaf/0.1/workplaceHomepage> "http://infosecanalytics.com" .
}

and

DELETE WHERE 
{ 
  <urn:absolute:flows#flow--41e0cd93-6fb2-4786-94ab-5adec21960cc> <https://attackflow.space/attack-flow#author> <urn:absolute:flows#Unspecified> .
}

But in many cases, you’ll want to replace it in a more general way, using variables. Note that we change ‘INSERT DATA’ to ‘INSERT’, because INSERT DATA only accepts fully specified triples and we’re now using variables.

DELETE {
      ?flow <https://attackflow.space/attack-flow#author> <urn:absolute:flows#Unspecified> .
}
INSERT { 
  ?flow <https://attackflow.space/attack-flow#author> <urn:absolute:flows#person--4c49da73> .
}
WHERE {
      ?flow <https://attackflow.space/attack-flow#author> <urn:absolute:flows#Unspecified> .

}
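If you run this kind of replacement often, you may want to build the update text in code. The sketch below is one way to do that in Python. The function name replace_author_query and its safety check are our own assumptions, and how the resulting string gets sent to Flow Manager is not shown here.

```python
AUTHOR_EDGE = "https://attackflow.space/attack-flow#author"

# Characters that must not appear inside <...> in a SPARQL IRI.
FORBIDDEN = set('<>" {}|\\^`')


def replace_author_query(old_author, new_author):
    """Build a DELETE/INSERT/WHERE update that swaps one author IRI for another."""
    for iri in (old_author, new_author):
        # Minimal guard so a stray character cannot break out of the <...> brackets.
        if any(ch in FORBIDDEN for ch in iri):
            raise ValueError(f"not a safe IRI: {iri!r}")
    return (
        f"DELETE {{ ?flow <{AUTHOR_EDGE}> <{old_author}> . }}\n"
        f"INSERT {{ ?flow <{AUTHOR_EDGE}> <{new_author}> . }}\n"
        f"WHERE  {{ ?flow <{AUTHOR_EDGE}> <{old_author}> . }}"
    )


query = replace_author_query(
    "urn:absolute:flows#Unspecified",
    "urn:absolute:flows#person--4c49da73",
)
print(query)
```

Because ?flow is a variable, the generated update touches every flow that currently points at the old author, not just one.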

For more about what you can do with insert and delete commands, see the W3C Documentation.