Reporting Engine#

Reporting is a powerful tool for standardized report generation, enabling users to create consistent reports with minimal effort. Juice’s reporting engine is built on top of the Jinja templating engine, which allows for flexible and customizable report templates. By defining a few key functions and configuring the template engine, users can easily generate reports that include text, tables, and figures based on their aggregated data.

This tutorial shows how to build a custom report template on top of the Juice reporting templating engine.

Running the example Report#

To get started, you can generate an empty report without any configuration: call the make_report function with "latest" as the run_id and "master_template_oqs_theme.md.jinja" as the template name. This renders the master template with only the default layout and styling, without additional data or figures.

from orangeqs.juice.reporting import make

make.make_report(
    "latest", # latest run_id or specific run_id to generate the report for
    "master_template_oqs_theme.md.jinja", # template name
    output_dir="/home/user", # output directory for the generated report and assets
)

Concepts#

Juice reporting uses Jinja to render markdown reports. If you need advanced details on macros, filters, and block inheritance, refer to the Jinja documentation.

The reporting flow is configured via TemplateEngineConfig, which points to Python functions by dotted path (e.g. my.repo.my_function) and to optional template/filter extensions.
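To make the dotted-path convention concrete, here is a minimal sketch of how such a string can be turned into a callable using only the standard library. The helper name `resolve_dotted_path` is hypothetical, and this is not necessarily how Juice resolves paths internally; it only illustrates what a dotted path refers to.

```python
import importlib
from typing import Any


def resolve_dotted_path(dotted_path: str) -> Any:
    """Import the object named by a dotted path such as "package.module.func".

    Hypothetical helper for illustration; not part of the Juice API.
    """
    # Split "package.module.func" into "package.module" and "func".
    module_path, _, attribute = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected 'module.attribute', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


# A standard-library target is used here so the snippet runs anywhere.
dumps = resolve_dotted_path("json.dumps")
print(dumps({"run_id": "latest"}))
```

If a path in your configuration cannot be resolved this way from your own Python kernel, the package is probably not installed or the path contains a typo.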

Configure the template engine#

To make your custom templates discoverable by the Reporting Engine, configure template-engine.toml in your lab package (path shown below). Use dotted paths to point to your functions and absolute paths for extra template directories.

~/shared/lib/lab/src/lab/config/template-engine.toml

generate_report_data="lab.reporting.generate_report_data"
get_aggregate_input_data="lab.reporting.get_aggregate_input_data"
extra_template_dirs=["~/shared/lib/lab/src/lab/templates"] # add new templates in this directory
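Before running a report, it can help to sanity-check the configuration values. The sketch below works on a plain dictionary with the same keys as the file above (on Python 3.11+ such a dictionary could be obtained with `tomllib`). The helper `check_template_engine_config` is hypothetical, and the checks are assumptions about what a valid configuration looks like, not Juice's own validation.

```python
import os
import re

# Same keys and values as the template-engine.toml example above.
config = {
    "generate_report_data": "lab.reporting.generate_report_data",
    "get_aggregate_input_data": "lab.reporting.get_aggregate_input_data",
    "extra_template_dirs": ["~/shared/lib/lab/src/lab/templates"],
}

# At least two identifiers separated by dots, e.g. "lab.reporting.func".
DOTTED_PATH = re.compile(r"^[A-Za-z_]\w*(\.[A-Za-z_]\w*)+$")


def check_template_engine_config(config: dict) -> list[str]:
    """Return a list of human-readable problems (empty when none are found)."""
    problems = []
    for key in ("generate_report_data", "get_aggregate_input_data"):
        value = config.get(key)
        if not isinstance(value, str) or not DOTTED_PATH.match(value):
            problems.append(f"{key}: expected a dotted path, got {value!r}")
    for directory in config.get("extra_template_dirs", []):
        # Assumption: "~" is expanded here for the check only; whether Juice
        # expands it itself is not stated in this tutorial.
        expanded = os.path.expanduser(directory)
        if not os.path.isabs(expanded):
            problems.append(f"extra_template_dirs: {directory!r} is not absolute")
    return problems


print(check_template_engine_config(config))
```

An empty list means the values have the expected shape; it does not prove that the functions can be imported or that the directories exist.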

Structure data processing functions#

The reporting engine relies on two user-defined functions to process data and generate the final report. The aggregate input data function fetches and structures raw data, while the generate report data function transforms this structured data into a format suitable for rendering in the Jinja template. Both functions return Pydantic BaseModel instances, allowing for flexible and well-defined data structures.

Aggregate input data#

Gathering data from various sources and transforming it into a structured format is a common requirement for generating reports. The get_aggregate_input_data function serves this purpose by fetching raw data, performing necessary transformations, and returning it in a structured format that can be easily consumed by the report generation function.

The function must have the following generic signature: def get_aggregate_input_data(mode: Literal["mock", "live"], run_id: str, **kwargs) -> BaseModel:. Here, mode indicates whether the function runs with mock data or live data, and run_id can be used to fetch data specific to a particular run or experiment. **kwargs allows additional parameters to be passed as needed. Finally, the returned BaseModel can be any custom class that inherits from Pydantic’s BaseModel and defines the structure of the aggregated data.

from typing import Any, Literal
from pydantic import BaseModel, ConfigDict
import pandas as pd
from orangeqs.juice.telegraf.schemas import PodmanStatsEvent
from orangeqs.juice.client.influxdb2 import influxdb2_query_api

class Model(BaseModel):
    """Example model for aggregated input data. You can customize this model to include any fields relevant to your report generation needs.
    """
    text_data: str = "Hello Reporting Engine!"
    mode: str
    memory_usage_data: pd.DataFrame
    run_id: str

    model_config = ConfigDict(arbitrary_types_allowed=True)

def get_aggregate_input_data(mode: Literal["mock", "live"], run_id: str, **kwargs: Any) -> BaseModel:
    # In this example, we query the InfluxDB for PodmanStatsEvent data from the last day and return it as a DataFrame in the Model. You can replace this with any data fetching and transformation logic relevant to your use case.
    api = influxdb2_query_api()
    df = PodmanStatsEvent.query(api, 'system_logs', start="-1d", output='dataframe')
    return Model(mode=mode, run_id=run_id, memory_usage_data=df)

Generate report data function#

After the engine executes the data aggregation function, the resulting structured data is passed to the report generation function together with an output_dir argument that provides the paths where figures should be saved. The user implements the logic that transforms the aggregated data into a form the Jinja template can render, ranging from simple text fields to tables and figures. It is the user’s responsibility to save the figures to disk in the provided output directory and to return the correct relative paths in the output BaseModel. As a rule of thumb, use output_dir.figs_dir to save your figures and output_dir.figs_dir_relative to construct the relative path for the template.

The generate report data function requires the following signature: def generate_report_data(aggregate_input_data: BaseModel, output_dir: OutputDirectories, **kwargs) -> BaseModel:. The aggregate_input_data argument is the object returned by the get_aggregate_input_data function, and output_dir is an instance of OutputDirectories that contains the absolute and relative paths where figures and other assets need to be saved. The returned object can be any custom class that inherits from Pydantic’s BaseModel and defines the structure of the data to be used in the template rendering.

Make sure to include any figures you want to render in the template in the output BaseModel and use the provided macros in the template to visualize them.

from typing import Any

from orangeqs.juice.reporting.data.schemas import OutputDirectories, Figure, Figures
import os
import matplotlib.pyplot as plt
import pandas as pd
from pydantic import BaseModel

class ReportModel(BaseModel):
    """Example model for report data. This model includes a text field and a dictionary of figures that can be rendered in the template. You can customize this model to include any fields relevant to your report generation needs, such as tables, additional text fields, or different types of figures."""
    text_field: str
    figures: Figures


def generate_report_data(aggregate_input_data: BaseModel, output_dir: OutputDirectories, **kwargs: Any) -> ReportModel:
    """Example function that generates report data from the aggregated input data. In this example, we create a figure from the system logs data and add it to the figures dictionary in the ReportModel. You can customize this function to generate any type of content and figures based on your aggregated data."""

    figures = Figures(
        logo_tagline=Figure(
            url=os.path.join(
                output_dir.theme_figs_dir_relative, "OrangeQS-logo_tagline.svg"
            ),
            caption="",
        ),
        placeholder_figure=Figure(
            url=os.path.join(
                output_dir.theme_figs_dir_relative, "placeholder_image.svg"
            ),
            caption="Error generating figure",
        ),
    )

    # system_logs returns the figure name and the Figure object as a pair.
    name, figure = system_logs(
        aggregate_input_data.memory_usage_data, output_dir
    )
    figures.add(name, figure)
    return ReportModel(text_field=aggregate_input_data.text_data, figures=figures)

def system_logs(system_logs: pd.DataFrame, output_dir: OutputDirectories) -> tuple[str, Figure]:
    """Example function that generates a matplotlib figure from the system logs data and saves it to the output directory. The function returns the figure name and a Figure object with the relative path to the saved figure and a caption."""
    # Build a sorted copy so the caller's DataFrame is not modified in place.
    system_logs = system_logs.assign(time=pd.to_datetime(system_logs['time'])).sort_values('time')
    fig, ax = plt.subplots()
    for name, group in system_logs.groupby('name'):
        ax.plot(group['time'], group['mem_percent'], label=name)
    figure_name = "system-memory-usage-over-time"
    caption = "System Memory Usage Percentage broken down per container basis"
    plt.xticks(rotation=45)
    plt.xlabel('Time')
    plt.ylabel('Memory Usage %')
    plt.title('Memory Usage Over Time by Name')
    plt.legend()
    file_name = f"{figure_name}.svg"
    fig.savefig(os.path.join(output_dir.figs_dir, file_name), bbox_inches="tight")  # type: ignore
    plt.close(fig)
    figure_object = Figure(
        url=os.path.join(output_dir.figs_dir_relative, file_name), caption=caption
    )
    return figure_name, figure_object
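The save-then-reference pattern used in `system_logs` can be tried without Juice or matplotlib. In the sketch below, `StandInOutputDirectories` is a hypothetical stand-in that only mimics the `figs_dir` and `figs_dir_relative` attributes mentioned above, the `figures` folder name is an assumption, and a hand-written SVG replaces a real plot.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class StandInOutputDirectories:
    """Hypothetical stand-in exposing only the two attributes used here."""

    figs_dir: str  # absolute directory where figure files are written
    figs_dir_relative: str  # the same directory, relative to the report file


def save_placeholder_figure(figure_name: str, output_dir: StandInOutputDirectories) -> str:
    """Write a tiny SVG to figs_dir and return the path the template should use."""
    file_name = f"{figure_name}.svg"
    os.makedirs(output_dir.figs_dir, exist_ok=True)
    svg = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'
    with open(os.path.join(output_dir.figs_dir, file_name), "w", encoding="utf-8") as handle:
        handle.write(svg)
    # The report references the figure relative to its own location.
    return os.path.join(output_dir.figs_dir_relative, file_name)


# A temporary directory plays the role of the report output directory.
report_root = tempfile.mkdtemp()
dirs = StandInOutputDirectories(
    figs_dir=os.path.join(report_root, "figures"),
    figs_dir_relative="figures",
)
url = save_placeholder_figure("system-memory-usage-over-time", dirs)
print(url)
```

The file is written using the absolute path, while the returned value is the relative path; mixing the two up is a common reason for figures that exist on disk but do not show in the rendered report.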

Create custom templates#

Since we added ~/shared/lib/lab/src/lab/templates as an extra template directory in template-engine.toml, we can place our custom templates in that directory and they will be discoverable by the engine.

Start by extending the master reporting template by placing {% extends "master_template_oqs_theme.md.jinja" %} at the top of your custom template and overriding the content block using {% block content %} and {% endblock %}.

Next, add content to the template using the output of the generate_report_data function. Any field of the output BaseModel is available as a variable in the template. For example, a text field called text_field is rendered with {{ text_field }}. This also applies to nested fields (e.g. text.field.subfield). For figures, use the provided macros. First import the macros with {% import 'macros.md.jinja' as macros %}, then render a matplotlib figure with {{ macros.show_mpl_figure(figures, 'figure_name') }}, where figures is the field in the output BaseModel that contains the figures dictionary and figure_name is the key of the figure you want to render.

Here is the full example of a custom template that uses the output of the generate_report_data function and renders a text field and a figure:

~/shared/lib/lab/src/lab/templates/custom_template.md.jinja

{% extends "master_template_oqs_theme.md.jinja" %}

{% block content %}
# Welcome to Juice Reporting Engine!

To get started, please refer to our official documentation, where you will find detailed tutorials on creating and managing custom templates: https://docs.orangeqs.com/juice/core.

Add an example text field with `{% raw %}{{ text_field }}{% endraw %}`:

`{{ text_field }}`

Next we add a figure using the following macro:

* Import the macros `{% raw %}{% import 'macros.md.jinja' as macros %}{% endraw %}`

{% import 'macros.md.jinja' as macros %}

* Plot the figure we have created before using `{% raw %}{{ macros.show_mpl_figure(figures, 'system-memory-usage-over-time') }}{% endraw %}`

{{ macros.show_mpl_figure(figures, 'system-memory-usage-over-time') }}
{% endblock %}

Run the report from Python#

After setting up the custom reporting functions, placing custom_template.md.jinja in your lab package, and configuring template-engine.toml, generate the report from your own Python kernel using the make_report function from the reporting module. Specify the template name and the run_id for which you want to generate the report. To generate a report for the latest run, pass "latest" as the run_id:

from orangeqs.juice.reporting import make

make.make_report(
    "latest", # latest run_id or specific run_id to generate the report for
    "custom_template.md.jinja", # template name
    output_dir="/home/user", # output directory for the generated report and assets
)

Finally, download all generated assets (the figures, theme, and bokeh_figs folders) and open the Markdown or HTML report to see the rendered output.

Useful arguments:

  • mock_data=True to run the pipeline with mock inputs.

  • report_name="my_report" to change the output filename.

  • output_dir=... to control where markdown, HTML, and assets are written.

  • kwargs to pass additional arguments to the user-defined functions.

Advanced customization#

For custom macro implementation, advanced block patterns, or Jinja filter authoring details, use the upstream Jinja documentation as the source of truth.

CLI Usage#

To generate a report from the command line, run the following command:

python -m orangeqs.juice.reporting.make --template master_template_oqs_theme.md.jinja

All available CLI arguments are listed below:

python -m orangeqs.juice.reporting.make [-h] [--run-id RUN_ID] [--template TEMPLATE] [--mock-data] [--report-name REPORT_NAME] [--output-dir OUTPUT_DIR]
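When the CLI needs to be driven from a script, the argument list can be assembled programmatically. The sketch below uses only the flags listed in the usage line above; the helper name `build_report_command` is hypothetical, and it assumes that `--run-id` accepts "latest" in the same way as the Python function. The command is printed rather than executed so the snippet runs without Juice installed.

```python
import shlex
from typing import Optional


def build_report_command(
    template: str,
    run_id: Optional[str] = None,
    mock_data: bool = False,
    report_name: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> list[str]:
    """Assemble the argv list for the reporting CLI from the documented flags."""
    command = ["python", "-m", "orangeqs.juice.reporting.make", "--template", template]
    if run_id is not None:
        command += ["--run-id", run_id]
    if mock_data:
        command.append("--mock-data")
    if report_name is not None:
        command += ["--report-name", report_name]
    if output_dir is not None:
        command += ["--output-dir", output_dir]
    return command


command = build_report_command("custom_template.md.jinja", run_id="latest", mock_data=True)
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting problems when a report name or output directory contains spaces.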

Troubleshooting#

If you encounter issues with template rendering, check the following:

  • Ensure that your template-engine.toml is correctly configured with the right dotted paths and template directories.

  • Verify that your data processing functions (get_aggregate_input_data and generate_report_data) are returning the expected data structures and that any figures are being saved to the correct output directory.
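One quick check for the second point is to compare your functions against the signatures given earlier in this tutorial. The sketch below uses `inspect` from the standard library; `missing_parameters` is a hypothetical helper, treating a missing `**kwargs` as a problem is an assumption based on the documented signatures, and the stub function stands in for your own implementation.

```python
import inspect
from typing import Any, Callable, Iterable


def missing_parameters(func: Callable[..., Any], required: Iterable[str]) -> list[str]:
    """Return the required parameter names that func does not declare.

    A missing **kwargs catch-all is reported as "**kwargs".
    """
    parameters = inspect.signature(func).parameters
    missing = [name for name in required if name not in parameters]
    has_var_keyword = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()
    )
    if not has_var_keyword:
        missing.append("**kwargs")
    return missing


# Stub with the documented signature of get_aggregate_input_data.
def get_aggregate_input_data(mode: str, run_id: str, **kwargs: Any) -> None:
    return None


print(missing_parameters(get_aggregate_input_data, ["mode", "run_id"]))  # → []
```

For generate_report_data, the corresponding required names would be `aggregate_input_data` and `output_dir`.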