How I keep a model of 400 elements and 1200 relations correct while it constantly mutates

Started by maksim aniskov, May 19, 2024, 06:49:04 AM

Previous topic - Next topic

maksim aniskov

Hi all!

I want to share my story about how I deal with a model representing a real, complex data processing setup that constantly mutates. I have been working on building the model and the automation around it over the past two months as a part-time assignment. I now have solid proof that my setup works flawlessly and saves me and the team a ton of time and effort every day. I'm very proud of it!

My situation is that a team of approximately ten data engineers and DevOps engineers works full-time on improving or modifying the setup. The setup consists of a whole bunch of Apache NiFi dataflows running on several AWS EC2 instances. They pull data from several dozen external APIs, then fan processed data out to downstream consumers as well as data storage services. The whole setup is being migrated from one AWS region to another. Those APIs live their own lives, sometimes changing their endpoint location or getting decommissioned. Or a new API appears.

How do I make sure that my architecture diagrams always correctly represent all of that pretty complex "reality"? And how do I sleep well while spending just 10 minutes a day fixing drift when it gets detected?

The keyword here is "drift detection". Let me describe my toolchain.

Data harvesting. I use two sources to fetch all the necessary data about "the real world": AWS CLI output and NiFi flow definition JSON files.

Data preparation. Before feeding the data to the model validation scripts, I prefer to transform it into comma- or tab-separated text files. Making the AWS CLI produce such output is not a problem at all. Retrieving the necessary data from the NiFi flow definitions is the trickiest part of the challenge. The JSON files that the data harvesting script retrieves from the servers run to around two million lines in total if pretty-printed. I chose jq to do all the heavy lifting: looping through objects, looking up NiFi process group and processor descriptions, and matching parameter substitutions in expressions against values in context definitions. What jq outputs are two comma-separated files. One maps NiFi root process group names to the ids of the AWS EC2 instances the groups run on. The second lists all the data sources and data sinks that the process groups have in the configuration of their processors. What I'm interested in first of all is knowing what kind of database, data streaming service or API is a NiFi processor's peer, and what its URL, address or identification is. I want to know all the ins and outs of the data processing setup under analysis.
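The post uses jq for this extraction step; purely as an illustration of the idea, here is a Python sketch. The JSON layout (`processGroups`, `processors`, `properties`) and the endpoint heuristic are simplified assumptions for the example, not the real NiFi flow-definition schema.

```python
import csv
import io
import json

def walk_groups(group, parent_name="flow"):
    """Recursively yield (group_name, processor) pairs from a NiFi
    flow-definition-like structure (simplified, assumed layout)."""
    name = group.get("name", parent_name)
    for proc in group.get("processors", []):
        yield name, proc
    for child in group.get("processGroups", []):
        yield from walk_groups(child, name)

def extract_endpoints(flow_json):
    """Return CSV text listing group,processor,endpoint for every
    processor property value that looks like a URL or host:port peer."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["group", "processor", "endpoint"])
    for group_name, proc in walk_groups(json.loads(flow_json)):
        for value in proc.get("properties", {}).values():
            # crude heuristic: keep URLs ("scheme://...") and "host:port"
            if isinstance(value, str) and (
                    "://" in value
                    or (value.count(":") == 1
                        and value.rsplit(":", 1)[1].isdigit())):
                writer.writerow([group_name, proc["name"], value])
    return out.getvalue()

# Hypothetical miniature flow definition for demonstration
sample = json.dumps({
    "name": "root",
    "processGroups": [{
        "name": "Ingest",
        "processors": [{
            "name": "ConsumeKafka",
            "properties": {"bootstrap.servers": "broker:9092",
                           "topic": "events"},
        }],
    }],
})
print(extract_endpoints(sample))
```

In the real setup this job belongs to jq, which handles the two-million-line inputs without loading a Python runtime; the sketch only shows the shape of the transformation.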

Validating model correctness. For matching my ArchiMate model against the "real world" data, I employ archi-powertools-verifier. The tool leverages Docker Compose to orchestrate Archi, Neo4j and SQLite containers.
The basic idea is pretty straightforward. A model validation batch does the following.
  • Convert the content of the ArchiMate model to a set of comma-separated files;
  • Run an SQL database engine (SQLite) or a graph database engine (Neo4j);
  • Have it import the model files and any additional data we produced in the data preparation step;
  • Run the set of SQL or CQL (Neo4j) queries I wrote. The queries compare what is in the tables (or elements and relationships, in Neo4j) containing the model against what is in those containing the "real world" data. Read this for more details and examples.
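As a minimal sketch of the comparison step, SQL set difference is enough to surface drift in both directions. The table and column names here are made up for illustration; archi-powertools-verifier's actual schema differs.

```python
import sqlite3

def detect_drift(model_rows, real_rows):
    """Compare model-derived rows with harvested "real world" rows using
    SQL EXCEPT; return human-readable drift messages. Schema is
    illustrative only, not the verifier's real one."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE model(group_name TEXT, endpoint TEXT)")
    con.execute("CREATE TABLE reality(group_name TEXT, endpoint TEXT)")
    con.executemany("INSERT INTO model VALUES (?, ?)", model_rows)
    con.executemany("INSERT INTO reality VALUES (?, ?)", real_rows)
    msgs = []
    # present in reality but missing from the model
    for row in con.execute("SELECT * FROM reality EXCEPT SELECT * FROM model"):
        msgs.append(f"Drift detected! Not in model: {row}")
    # present in the model but gone from reality
    for row in con.execute("SELECT * FROM model EXCEPT SELECT * FROM reality"):
        msgs.append(f"Drift detected! Stale in model: {row}")
    return msgs

# Hypothetical example: a new API connection appeared in "reality"
for msg in detect_drift(
        model_rows=[("Ingest", "broker:9092")],
        real_rows=[("Ingest", "broker:9092"), ("Ingest", "https://new-api/v2")]):
    print(msg)
```

The same two-sided difference is what makes the check symmetric: it flags both new connections the model has not caught up with and model entries whose real counterpart has been decommissioned.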

So, once a day my automation runs the data harvesting, data preparation and model validation scripts. If my SQL and CQL scripts detect any discrepancy between the architecture model and the AWS infrastructure or NiFi flow definitions, I get Drift detected! lines in the output, accompanied by the necessary problem descriptions and resource names or ids. To bring the architecture model back to a state where it correctly represents the NiFi flows, I edit the model manually in Archi. Before committing my changes to git, I validate the model one more time. When there are no more drift warnings in the output, I'm ready to push the updated model to the repository to make it available to the other collaborators. Having run this process routinely for a while, I have collected some statistics: our NiFi setup gets 2 to 4 changes a week that need mirroring in the model. When drift is detected, it normally takes 10 to 15 minutes to edit, re-validate, commit and push the changes. And those few minutes are nothing, considering that they are the entire cost of guaranteeing that the system architecture model is fully correct 100% of the time.

maksim aniskov

Bonus #1: Model linting
I can use the same technique and the same instruments to validate internal properties of architecture models against a set of rules expressed as SQL and CQL queries. For example, my automation checks that if a NiFi processor group is present on an ArchiMate view, the view also contains all the model elements that are data sources or data sinks for this processor group. Otherwise, the view would mislead me and others about the ins and outs of this processor group. Check out this example of SQL queries implementing this sort of verification.
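A minimal sketch of such a lint rule in Python, assuming the model export has already been reduced to plain dicts (view name → elements on the view, processor group → its data sources and sinks); the names and data shapes are invented for the example:

```python
def lint_views(view_elements, group_peers):
    """For each view that shows a NiFi processor group, report any of the
    group's data sources/sinks missing from that view.

    view_elements: dict mapping view name -> set of element names on it
    group_peers:   dict mapping processor group -> set of its peers
    (illustrative structures, not the real model export format)
    """
    problems = []
    for view, elements in view_elements.items():
        # only groups actually shown on this view need checking
        for group in elements & group_peers.keys():
            missing = group_peers[group] - elements
            for peer in sorted(missing):
                problems.append(f"{view}: {group} shown without peer {peer}")
    return problems

# Hypothetical example: the view omits one of the group's data sinks
problems = lint_views(
    view_elements={"Overview": {"Ingest", "KafkaA"}},
    group_peers={"Ingest": {"KafkaA", "S3Bucket"}},
)
print(problems)
```

In the real toolchain this rule lives in SQL against the verifier's model tables; the point is only that a lint rule is the same query pattern as drift detection, applied to the model alone.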

Phil Beauvoir

Hi,

thanks for sharing this with the Archi and ArchiMate community! I'm sure everyone will find this very useful. :-)

Phil
If you value and use Archi, please consider making a donation!
Ask your ArchiMate related questions to the ArchiMate Community's Discussion Board.

maksim aniskov

I love Archi! Incredible tool! This is why I invested quite a portion of my free time in developing archi-powertools. The tools pay off! Today, as business as usual, archi-powertools-verifier notified me that a new connection to yet another Kafka had just been added by the developers to the setup I'm describing in this topic. It took me five minutes to think carefully about which of the views in the model best represents the new stuff, do all the beauty work, write a commit message, and click "OK". (The HTML report gets generated and published automatically.) That includes two minutes wasted because I initially made the Flow relation go in the wrong direction (oops!). The verifier spotted the problem. +1 element and +1 relation in the model, and a 100% guarantee that the model is in perfect shape.

maksim aniskov

Bonus #2: Advanced analysis of the model
The same tooling also lets me perform more advanced analysis of the application.

What if I want to know whether the data processing setup moves data between two AWS regions?

Before explaining how I can easily answer questions like this, I need to explain what is in the system's architecture model. The model's elements of the "logical type" NiFi Root Processor Group have no direct relation to an AWS region. In the model, a processor group is associated with the EC2 instance it runs on; in turn, the EC2 instance is aggregated by its AWS region.
NiFi Root Processor Groups can be in different relations to different types of external services, depending on the kind of those relations in "reality". As the following diagram shows for one of the NiFi elements viewed in isolation, this processor group emits data to an AWS SQS queue in an AWS region. (And the SQS queue's region is different from the EC2 instance's.)



So, let's try to find all similar cross-region data paths.

I use archi-powertools-verifier with its interactive mode feature. I make the tool import my model into Neo4j in exactly the same way model verification does. In interactive mode, the tool leaves Neo4j running in the background and enables Neo4j Browser, a tool that lets you execute CQL queries and visualise the results.

First, I need to enrich the original relationship graph with a new direct relation between NiFi processor groups and their AWS regions. (All the terminology used in the query below, e.g. elements of type Grouping with a given specialization, or relations of type Aggregation, comes from ArchiMate. The MERGE clause makes the query create the new relation only if it does not already exist, so no duplicates appear.)

MATCH
  (r:Grouping{specialization:"AWS Region"})-[:Aggregation]->
  (:TechnologyService{specialization:"EC2 Instance"})
  <-[:Association]-(n:ApplicationService{specialization:"NiFi"})
MERGE (n)-[:IN_REGION]->(r)


Another enrichment I want to make is a "generic" INTERACTS_WITH relation between a NiFi processor group and each of its data sources or data sinks, regardless of the type and direction of the original relation. (In my situation all data sources and sinks are represented as ArchiMate elements of type TechnologyService, TechnologyInterface or Artifact.)

MATCH
  (n:ApplicationService{specialization:"NiFi"})--
  (s:TechnologyService|TechnologyInterface|Artifact)
MERGE (n)-[:INTERACTS_WITH]->(s)


Now that I have "normalised" the model by introducing those two new "artificial" relation types, I can use the following fairly simple CQL query to find every NiFi processor group (n) that INTERACTS_WITH an external service (s) where n's and s's regions differ.

MATCH
  (r1)<-[r_n:IN_REGION]-(n)-[n_s:INTERACTS_WITH]->(s)
  <-[s_r:Aggregation]-(r2:Grouping{specialization:"AWS Region"})
WHERE r1<>r2
RETURN r1, r_n, n, n_s, s, s_r, r2


Neo4j Browser does a beautiful job (in the literal sense) of graphically presenting the sub-graph my query returns.

Alberto

This is the stuff I've been dreaming about ever since I discovered ArchiMate. Of course, in my dreams, every vendor would support ArchiMate as a native export so that you could have multiple sources and not have to reinvent the wheel for every component of the enterprise. Alas, dreams can become reality; I just hope to be in a position to do this kind of work again. Congrats!

rchevallier

Excellent work @maksim aniskov !
A graph query language like CQL directly supported in ArchiMate (so no need to export, and no need to deploy another component) would be heaven.
I've toyed with some JS APIs to facilitate graph walking/querying in jArchi, but nothing really elegant and powerful enough so far.

maksim aniskov

Quote from: rchevallier on June 11, 2024, 15:12:40 PM
Excellent work @maksim aniskov !
Thank you @rchevallier !

Quote
A graph query language like CQL directly supported in ArchiMate (so no need to export, and no need to deploy another component) would be heaven.
I've toyed with some JS APIs to facilitate graph walking/querying in jArchi, but nothing really elegant and powerful enough so far.
Having such a feature supported directly would be really nice. If the feature existed, it would definitely cover the cases I described above as Bonus #1: Model linting and Bonus #2: Advanced analysis of the model. I'm rather doubtful about the case archi-powertools-verifier is primarily designed for: comparing a model against a real system or infrastructure.

Alberto

Quote from: rchevallier on June 11, 2024, 15:12:40 PM
A graph query language like CQL directly supported in ArchiMate (so no need to export, and no need to deploy another component) would be heaven.

An alternative implementation would be to leverage RDF/SPARQL in a fashion similar to the relational tables and the AlaSQL JS browser embedded in the HTML report. But to Maksim's point, it all depends on the use cases.

Alexis_H

Thank you very much, Maksim, for sharing this work.
I will definitely give it a try; it looks very promising.

maksim aniskov

Quote from: Alexis_H on June 15, 2024, 09:50:10 AM
Thank you very much, Maksim, for sharing this work.
I will definitely give it a try; it looks very promising.
I'm curious to know about your case. Drop me a line.