Our objective is to be able to extract as much information as possible in this domain, that’s why we’re doing this and much, much, more…

Definitions: Model - contains the full description containing:

  • src_code - source code of the file/module that the model corresponds to
  • formalization_status - level of formalization achieved for this model
  • dependencies - list of models (references to the respective objects) on which this model depends on
  • dependendency_context - aggregated formal representation (in IML) of the underlying dependencies that will be used in the course of formalizing the current model.

MetaModel - a container for a set of models pertaining to a specific project directory (or “repository”) implementing functions for their management (insertion/deletion/update)

Interdependence: - By interdependence, we mean that models “import” symbols (i.e. classes, modules, functions, constants) from other files and 3rd party libraries. “By default” we can formalize a model that contains references to other project files without adding any details about their actual implementation, but doing so will force us to consider those symbols as ‘opaque,` limiting the type of reasoning we can perform on the model and

Formalization: the process of formalization relies on using the CodeLogician Agent to translate source code along with generated dependency context into an IML model and reason about it (producing relevant artifacts, e.g. test cases).

Levels of formalization: Unknown - no formalization has been performed Error during validation - when admitting the model (in IML), ImandraX returns an error either representing inconsistent types used, missing symbols, non-terminating functions, etc. Admitted_with_opaque - model has been admitted but Admitted_transparent - model has been admitted and is fully transparent (nothing has been)

dependency_context - is computed by aggregating available IML models (when the models are available) of the dependencies and injecting it into the formalization context in addition to the model’s source code

Objective: our objective is to obtain the maximum possible number of models with the highest possible level of formalization while encoding the most current information (source code, human context, or dependency context).

Conditions requiring formalization update: Underlying source code for a model has changed or formalization has never been performed (unknown) Status of dependencies changed - one or more of the dependencies (including those indirect) have changed - either their status or source code

Formalization Workflow and Processing File Updates:

Initial State

Consider the following repository as an example - a simple math library with hierarchical structure where modules import symbols from other modules.

Project structure

The dependency graph inferred from imports is as follows:

Dependency graph

Arrow represents that the source node imports symbols from the target node, thus depends on the target node.

The formalization of a module can be expressed as:

Dependency graph

Starting from an initial state where no models are formalized, suppose we want to reason about functions in utils/helpers.py. To satisfy this request, CodeLogician must formalize utils/helpers.py and all its dependencies following topological order (bottom-up). The formalization sequence is:

Dependency graph

Upon completion, formalized models appear in green while unformalized modules remain in their original color:

Dependency graph

A summary of the formalization status:

Dependency graph

Minimal Re-formalization Strategy

When source code changes, CodeLogician computes precise diffs to determine which models need re-formalization, minimizing unnecessary work while maintaining consistency across the dependency graph.

Case 1: math_ops.py is modified but no dependency relation is changed.

Dependency graph

CodeLogician marks math_ops.py as SOURCE_MODIFIED (red) and its downstream dependency utils/helpers.py as DEPENDENCY_CHANGED (yellow). Note that utils/__init__.py and advanced/geometry/shapes.py, despite also being downstream of math_ops.py, remain unchanged because they were never formalized.

The re-formalization tasks execute in dependency order: first math_ops.py is re-formalized, then utils/helpers.py is updated using the existing basic.py model and the newly updated math_ops.py model.

This approach ensures only affected nodes are re-formalized while maintaining consistency between formalized models and source code.

Case 2: New dependency added

Suppose basic.py is refactored with some changes and functions extracted to a new file basic_.py. The dependency graph updates as follows:

Dependency graph

To maintain consistency with utils/helpers.py, CodeLogician executes formalization tasks in dependency order: basic_.pybasic.pymath_ops.pyutils/helpers.py.

Case 3: Dependencies merged

Suppose math_ops.py and basic.py are merged into a single file:

Dependency graph

Formalization tasks: math_ops.pyutils/helpers.py

Case 4: Upstream dependency removed

Suppose the imports of basic.py are removed from advanced/geometry/shapes.py:

Dependency graph

The previously formalized nodes (basic.py, math_ops.py, utils/helpers.py) maintain their status unchanged. CodeLogician recognizes that no re-formalization is needed since the dependency removal in an unformalized module doesn't affect the consistency of existing formal models.

Summary

When file updates occur, CodeLogician:

  1. Updates the dependency graph structure
  2. Calculates the minimal set of formalization tasks required
  3. Avoids unnecessary re-formalization by only targeting affected models

Error Resilience

If file changes introduce parsing errors (e.g., invalid imports), CodeLogician employs graceful degradation:

  • Keeps the filesystem representation current
  • Preserves the last working dependency graph and formal models
  • Tracks parsing errors for debugging
  • Enables recovery once dependency issues are resolved