- Apple has released a new open-source artificial intelligence model called “MGIE” that can edit images based on instructions.
- The model is able to handle various editing aspects such as Photoshop-style modification, global photo optimization, and local editing.
- MGIE builds on powerful multimodal large language models (MLLMs), which can process both text and images, to enhance instruction-based image editing.
Apple has published a new open-source artificial intelligence model called MGIE that, the company says, can edit images based on natural language instructions. MGIE, which stands for “MLLM-Guided Image Editing,” uses multimodal large language models (MLLMs) to interpret user commands and perform pixel-level manipulations. The model is able to handle various editing aspects such as Photoshop-style modification, global photo optimization, and local editing. MGIE is the result of a collaboration between Apple and researchers from the University of California, Santa Barbara. The model was presented in a paper accepted at the International Conference on Learning Representations (ICLR) 2024, one of the most important venues for artificial intelligence research. The paper demonstrates the effectiveness of MGIE on automatic metrics and in human evaluation while maintaining competitive inference efficiency.
MGIE is built on powerful MLLMs that can process both text and images to enhance instruction-based image editing. MLLMs have demonstrated remarkable abilities in cross-modal understanding and visual-aware response generation, but they have not been widely applied to image editing tasks.
MGIE integrates MLLMs into the image editing process in two ways. First, it uses MLLMs to derive expressive instructions from user input. These instructions are short and concrete and provide clear guidance for the editing process. For example, given the input “make the sky bluer,” MGIE can produce an instruction such as “increase the saturation of the sky region by 20%.” Second, MLLMs are used to generate a visual imagination, a latent representation of the desired edit. This representation captures the essence of the edit and is used to guide the manipulation at the pixel level. MGIE uses a novel end-to-end training scheme that jointly optimizes the instruction derivation, visual imagination, and image editing modules.
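The two-stage flow described above can be sketched in code. Note that this is a minimal conceptual illustration, not MGIE's actual API: every function name here (`derive_instruction`, `visual_imagination`, `edit_image`, `mgie_pipeline`) is hypothetical, and the real model components are neural networks rather than lookup tables.

```python
# Hypothetical sketch of MGIE's two-stage editing flow (not the real API).
# Stage 1: an MLLM rewrites a terse user command into an expressive instruction.
# Stage 2: a latent "visual imagination" of the desired edit guides the
#          pixel-level editing module.

def derive_instruction(user_command: str) -> str:
    """Stand-in for the MLLM instruction-derivation step."""
    # A real MLLM would ground the command in the image content; here we
    # hard-code the one example given in the article.
    examples = {
        "make the sky bluer": "increase the saturation of the sky region by 20%",
    }
    return examples.get(user_command, user_command)

def visual_imagination(instruction: str, image: str) -> dict:
    """Stand-in for the latent representation of the desired edit."""
    return {"instruction": instruction, "source": image}

def edit_image(image: str, latent: dict) -> dict:
    """Stand-in for the pixel-level editing module guided by the latent."""
    return {"edited_from": image, "guided_by": latent["instruction"]}

def mgie_pipeline(image: str, user_command: str):
    instruction = derive_instruction(user_command)   # stage 1: expressive instruction
    latent = visual_imagination(instruction, image)  # stage 2: latent guidance
    return edit_image(image, latent), instruction

edited, instruction = mgie_pipeline("sky_photo.png", "make the sky bluer")
print(instruction)  # the expressive instruction derived from the terse command
```

In the real system these three modules are trained jointly end to end, so the instruction derivation and the editing module learn to cooperate rather than being fixed, separate steps as in this sketch.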
MGIE can handle a wide range of editing scenarios, from simple color adjustments to complex object manipulations. The model can also make global and local adjustments depending on the user’s preference.
Some features of MGIE:
Impressive instruction-based editing: MGIE is able to produce concise, clear instructions that effectively guide the editing process. This improves not only the quality of the edits but also the overall user experience.
Photoshop-like changes: MGIE can perform common Photoshop-style editing such as cropping, resizing, rotating, flipping and adding filters. The model can also apply more advanced edits, such as changing the background, adding or removing objects, and blending images.
Global photo optimization: MGIE can optimize the overall quality of a photo, such as brightness, contrast, sharpness and color balance. The model can also apply artistic effects such as sketching, painting and cartooning.
Local editing: MGIE can edit specific regions or objects in an image, such as faces, eyes, hair, clothing, and accessories. The model can also change the attributes of these regions or objects, such as shape, size, color, texture, and style.
MGIE is currently available as an open-source project on GitHub, where users can find the code, data, and pre-trained models. The project also offers a demo notebook that demonstrates how to use MGIE for various editing tasks. Users can also try MGIE online through a web demo hosted on Hugging Face Spaces, a platform for sharing and collaborating on machine learning (ML) projects.
MGIE was designed to be easy to use and have flexible customization options. With MGIE, users can provide natural language instructions to edit images, and MGIE can generate edited images with derived instructions. Users can also provide feedback to MGIE to improve edits in the application or request different editing options. MGIE can also be integrated with other applications or platforms that require image editing functionality.
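The interaction pattern described above (provide an instruction, inspect the result, refine) amounts to an iterative feedback loop. The sketch below illustrates only that pattern; `apply_edit` and `edit_with_feedback` are hypothetical names and do not correspond to functions in the MGIE repository.

```python
# Hypothetical sketch of the user feedback loop described above (not MGIE's API):
# the user supplies an instruction, inspects the result, and issues refinements,
# each of which is applied on top of the previous result.

def apply_edit(image: str, instruction: str) -> str:
    """Stand-in for a single instruction-guided editing call."""
    return f"{image} -> edited with '{instruction}'"

def edit_with_feedback(image: str, instructions: list[str]) -> str:
    """Apply an initial instruction, then each successive refinement in turn."""
    result = image
    for instruction in instructions:
        result = apply_edit(result, instruction)
    return result

final = edit_with_feedback(
    "portrait.png",
    ["brighten the background",      # initial instruction
     "make it slightly warmer"],     # refinement after user feedback
)
print(final)
```

Because each refinement operates on the previous output rather than the original image, the user can converge on the desired result without restating the full edit each time.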
MGIE is a breakthrough in instruction-based image editing, a challenging and important task for both artificial intelligence and human creativity. MGIE demonstrates the potential of using MLLMs to improve image editing and opens up new possibilities for cross-modal interaction and communication. MGIE can help users create, modify, and optimize images for personal or professional purposes such as social media, e-commerce, education, entertainment, and art; it can also enable users to express their ideas and emotions through images and inspire them to explore their creativity.
For Apple, MGIE is an initiative that also highlights the company’s growing prowess in artificial intelligence research and development. The consumer tech giant has rapidly expanded its machine learning capabilities in recent years, and MGIE is perhaps the most impressive demonstration of how AI can improve everyday creative tasks. Although MGIE represents a major breakthrough, experts say there is still much work to be done to develop multi-modal AI systems. However, advances in this field are accelerating rapidly. The excitement generated by the launch of MGIE may soon make this helpful AI application an indispensable assistant.
Compiled by: Burçin Bağatur