Quick Start

Development Approach

MaaFramework provides three integration solutions to meet different development scenarios:

Approach 1: Pure JSON Low-Code Programming

Applicable Scenarios: Quick start, simple logic implementation

Example:

jsonc
{
    "Click Start Button": {
        "recognition": "OCR",          // Text recognition engine
        "expected": "Start",           // Target text
        "action": "Click",             // Execute click action
        "next": ["Click Confirm Icon"] // Subsequent task chain
    },
    "Click Confirm Icon": {
        "recognition": "TemplateMatch",// Image template matching
        "template": "confirm.png",     // Matching asset path
        "action": "Click"
    }
}

Approach 2: JSON + Custom Logic Extension

Features:

  • Retains the low-code advantage of JSON; core flows remain visual and easy to edit
  • Hosts custom recognition/actions in the Agent process, making it easier to encapsulate advanced logic
  • Seamlessly integrates with the ⭐ boilerplate to provide scaffolding and examples
jsonc
{
    "Click Confirm Icon": {
        "next": ["Custom Processing Module"]
    },
    "Custom Processing Module": {
        "recognition": "Custom",
        "custom_recognition": "MyReco",  // Custom recognizer ID
        "action": "Custom",
        "custom_action": "MyAct"         // Custom action ID
    }
}

💡 The General UI automatically connects to your Agent process and invokes the registered recognition/action implementations when executing MyReco/MyAct.

python
# Python pseudo-code example
from maa.agent.agent_server import AgentServer

# Register a custom recognizer
@AgentServer.custom_recognition("MyReco")
class CustomReco:
    def analyze(self, ctx):
        return (10, 10, 100, 100)  # Return your own processed recognition result

# Register a custom action
@AgentServer.custom_action("MyAct")
class CustomAction:
    def run(self, ctx):
        ctx.controller.post_click(100, 10).wait()  # Execute click
        ctx.override_next(["TaskA", "TaskB"])       # Dynamically adjust the task flow

# Start the Agent service (sock_id is provided by the launcher at startup)
AgentServer.start_up(sock_id)

For a complete example, refer to the template commit.

Approach 3: Full-Code Development

NOTE

MaaFramework offers full multi-language APIs, but code-only workflows lose ecosystem tools (visual editor, visual debugger, General UI). In most cases, the custom extensions in Approach 2 already cover advanced requirements without sacrificing those capabilities.

Applicable Scenarios:

  • Deep customization requirements
  • Implementation of complex business logic
  • Need for flexible control over the execution flow
python
# Python pseudo-code example
def main():
    # Execute the predefined JSON task
    result = tasker.post_task("Click Start Button").wait().get()
    
    if result.completed:
        # Execute code-level operations
        tasker.controller.post_click(100, 100)
    else:
        # Get the current screenshot
        image = tasker.controller.cached_image
        # Register a custom action
        tasker.resource.register_custom_action("MyAction", MyAction())
        # Execute a mixed task chain
        tasker.post_task("Click Confirm Icon").wait()
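
The pseudo-code above assumes a connected controller and a bound tasker already exist. As a rough sketch of that setup with the Python binding (method names such as post_bundle reflect recent binding versions and may differ in yours):

python
# Minimal setup sketch for full-code development. Treat this as
# illustrative: exact method names can vary between binding versions.
from maa.toolkit import Toolkit
from maa.resource import Resource
from maa.controller import AdbController
from maa.tasker import Tasker

Toolkit.init_option("./")

# Pick the first ADB device found on this machine
device = Toolkit.find_adb_devices()[0]
controller = AdbController(adb_path=device.adb_path, address=device.address)
controller.post_connection().wait()

# Load the resource bundle prepared in the next section
resource = Resource()
resource.post_bundle("./my_resource").wait()

tasker = Tasker()
tasker.bind(resource, controller)
assert tasker.inited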

Resource Preparation

After you confirm your development approach, prepare the corresponding resource files. The example below uses the project boilerplate as a baseline.

TIP

  • If you use the project boilerplate, the TIP callouts below point you to a ready-made path.
  • If you choose full-code development, you still need resource files such as image assets and OCR models; otherwise related image-recognition features will be unavailable.

File Structure Specification

TIP

⭐ If you use the boilerplate, modify the existing folder directly.

tree
my_resource/
├── image/                # Image asset library
│   ├── my_button_ok.png
│   └── my_icon_close.png
├── model/
│   └── ocr/              # Text recognition models
│       ├── det.onnx
│       ├── keys.txt
│       └── rec.onnx
└── pipeline/             # Task pipelines
    ├── my_main.json
    └── my_subflow.json

You can modify the names of files and folders starting with "my_", but the others have fixed file names and should not be changed. Here's a breakdown:

Task Pipeline

The files in my_resource/pipeline contain the main script execution logic; MaaFramework recursively reads all JSON-format files in this directory.

Refer to the Task Pipeline Protocol when writing these files; a simple demo is also available for reference.

Recommended tools:

  • JSON Schema
  • VSCode Extension
    • Configures resources based on interface.json
    • Supports go-to-definition, find-references, rename, and completion for tasks, plus click-to-launch
    • Supports launching as MaaPiCli
    • Supports taking screenshots and cropping images once a device is connected
  • MaaPipelineEditor
    • No-code visual editor; supports drag-and-drop nodes and JSON import/export

Image Files

The files in my_resource/image are primarily template-matching and feature-detection images, plus any other images the pipeline requires. They are loaded according to the template and other fields specified in the pipeline.

Images must be cropped from lossless screenshots scaled to 720p. Unless you're very familiar with MaaFramework's image processing, use a capture tool, such as the VSCode extension's screencap-and-crop feature mentioned above, to obtain them.
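
If you prefer to script captures, here's a minimal sketch using the Python binding and OpenCV; it assumes an already-connected controller, and the crop coordinates are placeholders for your target region:

python
# Sketch: grab a screenshot through the controller and crop a template
# from it. `controller` is assumed to be connected already.
import cv2  # the binding returns screenshots as numpy arrays (BGR)

controller.post_screencap().wait()
frame = controller.cached_image       # full screenshot
template = frame[100:160, 200:360]    # rows (y1:y2), then columns (x1:x2)
cv2.imwrite("my_resource/image/my_button_ok.png", template)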

Text Recognition Model Files

TIP

⭐ If you use the boilerplate, just follow its documentation and run configure.py to deploy the model files automatically.

The files in my_resource/model/ocr are ONNX models converted from PaddleOCR.

You can use our pre-converted files: MaaCommonAssets. Choose the language you need and store them according to the directory structure above.

If needed, you can also fine-tune the official pre-trained models of PaddleOCR yourself (please refer to the official PaddleOCR documentation) and convert them to ONNX files for use. You can find conversion commands here.
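
Whichever route you take, an optional sanity check (outside MaaFramework itself) is to confirm the deployed files load cleanly with onnxruntime:

python
# Optional check: verify the ONNX models open without errors.
# onnxruntime is a third-party package (pip install onnxruntime).
import onnxruntime as ort

for name in ("det.onnx", "rec.onnx"):
    sess = ort.InferenceSession(f"my_resource/model/ocr/{name}")
    print(name, "inputs:", [i.name for i in sess.get_inputs()])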

Debug

After you finish preparing resources, you can start debugging.

NOTE

If you choose full-code development, some tools in this section may not work; consider writing your own debug helpers instead.

Development tools overview

Most tools generate a config/maa_option.json file in their own directory, containing the following options:

  • logging: Save the log and generate debug/maa.log. Default true.
  • save_draw: Save visualized image-recognition results during runtime. Default false.
  • stdout_level: Console log level. Default 2 (Error); set 0 to silence logs, or 7 to show all logs.
  • save_on_error: Save the current screenshot when a task fails. Default true.
  • draw_quality: JPEG quality for visualized image-recognition results (0-100). Default 85.

If you integrate MaaFramework yourself, you can enable these debugging options through the Toolkit.init_option / MaaToolkitConfigInitOption interface; the generated JSON file is the same as above.
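
For example, with the Python binding (a minimal sketch; the argument is whichever directory you want config/ to live under):

python
from maa.toolkit import Toolkit

# Creates config/maa_option.json under "./" on first run; later runs pick
# up any values you have edited in that file.
Toolkit.init_option("./")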

Run

NOTE

If you choose full-code development, the UI apps in this chapter may not work; consider writing your own interaction UI.

General UI overview

We define a ProjectInterface protocol to describe the resource files and runtime configuration so that the General UI can correctly load and run your project.

In short, write an interface.json that tells the General UI where your resources are and which tasks can be executed, so it can run them for you.
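
As a rough illustration of the shape (field names follow the ProjectInterface protocol; consult the protocol document for the authoritative schema), a minimal interface.json could be emitted like this:

python
# Sketch of a minimal interface.json, written from Python so the shape is
# easy to see. Field values here are illustrative placeholders.
import json

interface = {
    "controller": [{"name": "ADB Default", "type": "Adb"}],
    "resource": [{"name": "Default", "path": ["{PROJECT_DIR}/my_resource"]}],
    "task": [{"name": "Start", "entry": "Click Start Button"}],
}

with open("interface.json", "w", encoding="utf-8") as f:
    json.dump(interface, f, ensure_ascii=False, indent=4)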

Best-practice references:

Full-Code Development

Please refer to the Integration Documentation and the Integrated Interface Overview.

Communication

Developers are welcome to join the official QQ group (595990173) for integration and development discussions. The group is reserved for engineering topics; product-usage support is not provided, and off-topic or spam accounts may be removed to keep the channel focused.