What’s wrong with functions and how graphs will bury them
Functions are the most fundamental building block in programming. They are behind every software application and system that runs our day-to-day lives. Writing a function is the very first thing everyone learns in order to become a programmer. Isn’t it presumptuous to question the very foundation of programming? In the next few minutes, I will explain why functions are so fundamentally broken, and why graphs are the future for building modern software.
Why do we need functions in the first place? The primary purpose of functions, as a programming construct, is to support encapsulation and composition. Encapsulation refers to packaging a block of code with a clearly defined API so that it can be easily reused as a single unit. In addition, encapsulation allows the underlying implementation to change without affecting the clients. Composition allows developers to create bigger and more complex software by combining and re-using smaller components. Both encapsulation and composition are critical ingredients in modern software engineering. However, functions only offer a half-baked solution.
The mechanism behind function encapsulation has evolved over time. In the earliest programming languages, functions didn’t enforce strong encapsulation because of global variables and goto statements. As software grew more complex, stronger enforcement of encapsulation was needed, which led to the rise of modern programming paradigms such as structured, object-oriented, and functional programming.
The main problem with function encapsulation is that it creates black boxes: a function’s runtime state is not easily accessible. After a function returns, its call stack and all of its intermediate values are gone forever; there is no way to retroactively inspect them. Yet this runtime information is absolutely crucial for understanding the behavior of any software. It is simply impossible to reason about a function’s behavior (except for the simplest ones) by just looking at its source code, because of complex and dynamic language features such as loops, branches, and polymorphic function calls. To access a function’s runtime information, a developer has to either use a debugger to step through the source code while the function is running, or modify the code with extensive logging and then analyze the logs; both approaches are extremely slow and painful. You may think this opacity is a necessary, or even beneficial, byproduct of strong encapsulation, but that is not true. “Information is easily accessible, but can be ignored if you don’t care” is always a better paradigm than “information is not accessible unless you log or use a debugger”. The current form of function encapsulation unfortunately only offers the latter.
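To make the “accessible but ignorable” paradigm concrete, here is a minimal sketch in plain Python (the `traced` decorator and `TRACE` list are hypothetical names invented for illustration): every call’s arguments and result are recorded in a trace that survives execution, and callers who don’t care can simply never look at it.

```python
import functools

# Hypothetical sketch: keep each call's runtime information in an
# inspectable trace instead of discarding it when the function returns.
TRACE = []

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args):
        result = fn(*args)
        TRACE.append((fn.__name__, args, result))  # state is kept...
        return result                              # ...but may be ignored
    return wrapper

@traced
def square(x):
    return x * x

square(3)
square(4)
# After execution, the intermediate runtime information is still there:
# TRACE == [("square", (3,), 9), ("square", (4,), 16)]
```

This is only a toy: real solutions would need to capture far richer state than return values, but it shows the direction of travel from “opaque unless logged” to “recorded by default”.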
In comparison, the mechanism of function composition hasn’t changed at all since the dawn of programming. Function composition is always implemented by having one function call another, executing the body of the callee in the same process space as the caller. Our lives would be easy if arbitrarily complex software could be created by simply nesting function calls. In reality, however, it doesn’t work like that. Here is why:
Function composition does not work across the process boundary. By construction, the caller and callee functions have to run in the same process space. When a function is too big to be executed by a single process on a single computer, we have to manually break it into smaller ones, distribute them among multiple processes, and manage their synchronization using dedicated cross-process communication mechanisms such as remote procedure calls, message passing, REST, or microservices. Such re-engineering cannot be done automatically by a compiler, because the compiler has no access to the runtime information required to break up the workload sensibly. Therefore, as soon as a piece of software outgrows the boundary of a single process, the simple mechanism of function composition stops working. Instead, the code base has to be manually re-engineered to scale across multiple processes, which dramatically increases its cost and complexity. This is one of the main reasons why large enterprise software systems are so difficult and costly to develop and support.
Function composition does not support lineage and explainability, which are of critical importance to modern enterprise systems, especially in regulated industries. For example, when a bank makes credit approval or rejection decisions, it has to be able to explain to a regulator why certain decisions were made and whether there are systemic biases against protected attributes such as gender or race. A debugger is certainly not the answer to the lineage and explainability challenges. Instead, developers have to rely on extensive logging and log analysis, which is slow, costly, and error prone.
Function composition doesn’t facilitate automatic runtime optimizations. Consider a simple example: when an expensive function is called repeatedly with the same argument at runtime, most programming languages, including the purely functional ones, can’t automatically optimize it by caching its results, because the runtime does not know whether the function will be called again with the same argument. The deeper reason is that function composition does not track the runtime dependencies between function calls. Developers therefore have to resort to manual, bespoke code changes in order to benefit from such optimizations.
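In Python, for instance, such caching is strictly opt-in: the developer must explicitly decorate the function with the standard library’s memoization helper. A minimal sketch (the `expensive` function and `CALLS` counter are invented for illustration):

```python
import functools

CALLS = 0  # counts how many times the body actually runs

@functools.lru_cache(maxsize=None)  # manual, opt-in caching: the developer
def expensive(x):                   # has to decide to memoize this function
    global CALLS
    CALLS += 1
    return x * x  # stand-in for an expensive computation

expensive(3)
expensive(3)  # served from the cache; the body runs only once
```

The language cannot apply this optimization on its own; it is the programmer who must identify the candidate function and change its code.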
Function composition does not support queries. For example, a polymorphic function may have multiple implementations for different argument types, yet we can’t query which implementation was actually used to produce a certain result; this valuable information is simply discarded by function composition. Compared to the extensive query capabilities of modern databases, it is appalling that there is no query capability over a function’s runtime execution. Imagine the possibilities if we could jointly query the data and logic of a function’s runtime state: it could allow AI to learn from code in addition to data, and open up enormous opportunities.
I could add many more items to the list of functions’ drawbacks, such as poor scenario support and the lack of data security and privacy enforcement; but the point should be obvious by now: functions fall short of realizing the full potential of encapsulation and composition. Chances are that you already suffer from the problems above, but you may not realize that their root cause lies in functions, the most basic building block of programming. So what should we do instead?
The answer is graph computing, which represents the entire software logic as a computational DAG (directed acyclic graph). Large computational DAGs are created by simply composing smaller, modular DAGs. Instead of going deep into the technical details, I will just highlight the key differences between function composition and graph composition with a simple example.
```python
# Function Composition
def top(x):
    return a(x) + b(x)

def a(x):
    return base(x) + 1

def b(x):
    return base(x) * 2

def base(x):
    return x

top(3)
```
Both the function and graph implementations offer strong encapsulation with an easy API: the client doesn’t need to know the implementation details of the top(x) algorithm. When top(3) is called using function composition, it calls all the dependent functions in the same process and returns the final result to the caller; once the function returns, all the intermediate results are gone forever. In graph composition, an entire computational graph is automatically constructed when the user asks for the output of top(3); the graph is then executed by traversing its nodes, and all the intermediate and final results are computed and stored in the graph, where they remain available for inspection and query even after execution. The entire lineage of the calculation is also automatically tracked by the graph, making it easy to debug, explain, and audit the results.
Graph composition automatically works across process boundaries; there is no restriction that all the nodes be executed within the same process. A generic algorithm can automatically and efficiently distribute any computational DAG across multiple processes and computers, making it effortless to create large distributed systems. Graph composition also enables sophisticated runtime optimizations: for example, base(3) is computed only once, as it is automatically recognized as a common dependency in the graph, whereas it has to be executed twice under function composition.
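The behavior described above can be sketched in plain Python (a hypothetical, minimal DAG evaluator invented for illustration, not the implementation of any actual graph computing product): each node declares its dependencies, every node is evaluated at most once, and all intermediate results remain stored and queryable after execution.

```python
# Hypothetical minimal DAG sketch of the top(3) example: each node lists
# its dependencies and a callable that combines their results.
GRAPH = {
    "base": ([],         lambda: 3),
    "a":    (["base"],   lambda base: base + 1),
    "b":    (["base"],   lambda base: base * 2),
    "top":  (["a", "b"], lambda a, b: a + b),
}

def evaluate(graph):
    results = {}
    def visit(name):
        if name not in results:        # shared dependencies such as "base"
            deps, fn = graph[name]     # are computed only once
            results[name] = fn(*(visit(d) for d in deps))
        return results[name]
    for name in graph:
        visit(name)
    return results

results = evaluate(GRAPH)
# Unlike a plain function call, every intermediate value is still available:
# results == {"base": 3, "a": 4, "b": 6, "top": 10}
```

The `results` dictionary plays the role of the executed graph: it can be inspected, queried, or audited after the fact, and the `GRAPH` structure itself records the lineage of every value.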
Graph computing and composition are not new ideas. Existing software packages such as Dask and TensorFlow already let users create and run computational graphs. The sad irony, however, is that these existing solutions all choose to implement graph composition via function composition: developers have to write their business logic in regular functions, and the computational graph is then created by running these functions and recording their runtime states. Doing so completely defeats the purpose of graph composition. The great promise of graph composition is to address the severe limitations of function composition; we simply cannot achieve that goal if function composition remains a critical part of the solution.
We, at Julius Technologies, have developed an innovative graph computing solution that does not rely upon function composition at all. Our solution is based on a low-code domain-specific language (RuleDSL) that is specifically designed for creating and composing computational graphs. Computational DAGs can be created and composed directly from Julius RuleDSL, without going through any function composition. The Julius graph computing platform is therefore able to deliver the full potential of graph composition, including low code, auto-scaling, transparency, lineage, high performance, joint querying of data and code, and a lot more.
Is there still a place for functions once we adopt graph computing? Well, functions remain the most efficient construct for writing common low-level algorithms such as sorting, searching, and numerical solvers. These low-level algorithms are easy to understand and reason about, and they only run within the boundary of a single process, so there is little reason to rewrite them as graphs. Existing low-level functions can be easily reused in graph composition by wrapping them inside individual nodes of the computational graph.
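Wrapping an existing function as a node can be as simple as pairing it with its declared dependencies. A hypothetical sketch (the `make_node` helper and `"raw_data"` dependency name are invented for illustration; Python’s built-in `sorted` stands in for any low-level algorithm):

```python
# Hypothetical sketch: an existing low-level function (the built-in sorted)
# is reused unchanged by wrapping it inside a single graph node, expressed
# here as a (dependencies, callable) pair.
def make_node(deps, fn):
    return (deps, fn)

sort_node = make_node(["raw_data"], lambda raw_data: sorted(raw_data))

# The wrapped function runs only when the node is evaluated:
deps, fn = sort_node
fn([3, 1, 2])  # → [1, 2, 3]
```

The function itself stays a black box, but it is now a single, well-bounded node whose inputs and output are tracked by the surrounding graph.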
By burying functions inside individual nodes of computational graphs, we get the best of both worlds of graphs and functions, allowing us to finally enjoy the full power and benefits of encapsulation and composition. Intrigued by the promise of the new graph programming paradigm? Check out this video introduction for more details.