What is Apple's MGIE? An introduction to how the image editing AI model works and what it can do!


MGIE is an image editing AI model announced by Apple on February 8, 2024. Using text prompts (instructions), it can perform a wide range of editing tasks, such as resizing, cropping, and adjusting contrast. The model and its possible uses have been attracting attention.

In this article, we explain how Apple's MGIE works, what it can do, and what to keep in mind when using it. We hope you will find it useful as you introduce and operate generative AI services.

What is “MGIE,” Apple's latest AI image editing technology?
“MGIE” stands for “MLLM-Guided Image Editing,” that is, image editing guided by a multimodal large language model. It is a cutting-edge image editing AI model developed by Apple in collaboration with the University of California, Santa Barbara. Through natural language prompts, it can perform many editing functions, such as changing image brightness and saturation, cropping, and resizing.

MGIE employs a multimodal large language model (MLLM). The model interprets the text entered by the user and edits the image at the pixel level. Without any specialized knowledge, users can instruct MGIE in plain text to perform anything from simple to complex editing tasks.

MGIE is open source and the code is currently available for download on GitHub. You can also try out the demo online at Hugging Face.

What is a multimodal large language model (MLLM)?
A multimodal large language model (MLLM) is an AI model that can process multiple types of information, including text, images, and video. By combining visual information such as videos and images with textual information in natural language, MLLMs can handle a variety of real-world tasks and operations.

A typical task for an MLLM is image captioning, in which the model understands the content of an image and generates an appropriate natural language description (caption) based on that information.

Other MLLMs from Apple include Ferret.

How MGIE Works
Let's look at how MGIE handles tasks. MGIE uses MLLMs in two ways: first, it learns to interpret the text prompt entered by the user; second, it imagines how the image should be edited and identifies where changes are needed.

The specific flow from entering a text instruction to editing an image in MGIE is as follows.

1. The user enters an instruction (prompt) in natural language describing how to edit the image.
2. MGIE analyzes the user's intent to understand the requested change.
3. MGIE analyzes the image together with the prompt, identifying its components and how they relate to one another.
4. MGIE combines its understanding of the language with its understanding of the visual information to produce the most natural result for the prompt.

MGIE recognizes elements such as objects and colors, along with their interrelationships. During the editing phase, known as guided editing, it also takes the context of the image into account so that the output looks as natural as possible.
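
The flow above can be sketched as a toy pipeline. This is only an illustration of the data flow, not MGIE's actual code: the function names, the `EditPlan` type, and the keyword table that stands in for the MLLM are all hypothetical, and the real system relies on learned models at every stage.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the MLLM: maps terse wording to a more
# explicit instruction. MGIE learns this mapping; here it is hard-coded.
EXPRESSIVE_HINTS = {
    "brighter": "increase the brightness of the whole image",
    "healthier": "replace the food with vegetables",
    "rain": "darken the sky and add falling rain",
}


@dataclass
class EditPlan:
    instruction: str  # the expanded, explicit instruction
    targets: list     # image components the edit touches


def derive_explicit_instruction(prompt: str) -> str:
    """Read the prompt and make the user's intent explicit."""
    lowered = prompt.lower()
    for keyword, expanded in EXPRESSIVE_HINTS.items():
        if keyword in lowered:
            return expanded
    return prompt  # nothing to expand; use the prompt as written


def find_targets(instruction: str, components: list) -> list:
    """Match whole words of the instruction against components in the image."""
    words = set(instruction.lower().split())
    return [c for c in components if c in words]


def plan_edit(prompt: str, components: list) -> EditPlan:
    """Combine the language side and the visual side into one edit plan."""
    instruction = derive_explicit_instruction(prompt)
    return EditPlan(instruction, find_targets(instruction, components))


# The component list would come from analyzing the image; it is given by hand here.
plan = plan_edit("Make the food healthier", ["plate", "food", "table"])
print(plan.instruction)
print(plan.targets)
```

The point of the sketch is the separation of concerns: the vague prompt is first turned into an explicit instruction, and only then matched against what is actually in the image.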

What MGIE can do
MGIE offers many image editing options. This section describes the main tasks that can be performed in MGIE.

Flexible editing with text instructions
MGIE allows image editing with simple text instructions. For example, you can enter specific requests such as “change the color of the flowers in this image to red” or “remove the table from this image.”
It has also been confirmed that it can handle abstract instructions to some extent, such as “Please make the food on the plate in this image healthier.” Given a request to “change this image to a picture of rain,” it can turn a sunny scene into a rainy one in a short time. The system interprets natural language smoothly and produces the edited image with little effort.

Photoshop-Style Modification
MGIE can perform Photoshop-style modifications. It efficiently handles basic image editing such as cropping, rotating, flipping, and adding filters. It also supports more advanced and complex editing tasks such as changing backgrounds, adding objects, and merging images.
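
To make the basic operations concrete, here is a minimal sketch of cropping, flipping, and rotating on a tiny grayscale image stored as a list of rows. It shows the classical operations that such prompts correspond to, not how MGIE computes its output; the function names are our own.

```python
def crop(image, top, left, height, width):
    """Keep the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_clockwise(image):
    """Rotate 90 degrees clockwise: each column, read bottom to top, becomes a row."""
    return [list(column) for column in zip(*image[::-1])]


image = [
    [1, 2, 3],
    [4, 5, 6],
]
print(crop(image, 0, 1, 2, 2))   # [[2, 3], [5, 6]]
print(flip_horizontal(image))    # [[3, 2, 1], [6, 5, 4]]
print(rotate_clockwise(image))   # [[4, 1], [5, 2], [6, 3]]
```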

With MGIE, you can create the images you want without having to work in specialized software.

Optimization of Photo Quality
MGIE lets you improve the overall quality of your images through basic adjustments to brightness, contrast, saturation, and color balance. For example, a simple instruction such as “make it brighter” is enough to change the brightness of an image.
It also supports style changes, such as sketch, painting, and animation styles, so you can apply more artistic effects with just a simple prompt.
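
As a rough illustration of what a brightness or contrast adjustment means at the pixel level, the sketch below applies the classical formulas to 8-bit grayscale values. It is an assumption-laden teaching example: MGIE produces its edits with a learned model rather than these fixed formulas, and the `adjust` function and its parameters are our own.

```python
def clamp(value):
    """Round to an integer and keep it inside the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def adjust(image, brightness=0, contrast=1.0):
    """Scale each pixel's distance from mid-gray (128), then shift it."""
    return [
        [clamp((pixel - 128) * contrast + 128 + brightness) for pixel in row]
        for row in image
    ]


image = [
    [0, 100],
    [200, 255],
]
brighter = adjust(image, brightness=40)   # "make it brighter"
punchier = adjust(image, contrast=1.5)    # "increase the contrast"
print(brighter)  # [[40, 140], [240, 255]]
print(punchier)  # [[0, 86], [236, 255]]
```

Brightness shifts every pixel by the same amount, while contrast pushes values away from mid-gray; both are clamped so they stay valid 8-bit values.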

Local and Global Adjustments
In addition to adjusting the entire image, MGIE lets users make partial changes and area-specific adjustments. Fine adjustments to objects within a specific area make it possible to reach the desired result more efficiently.

For example, you can single out elements within an image, such as a person's face, eyes, hair, or accessories, and restrict edits to them.
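
The difference between a global and a local adjustment can be sketched with a rectangular region. This is a simplified illustration under our own assumptions: MGIE locates the region from the prompt itself, whereas here the hypothetical `region` tuple is supplied by hand.

```python
def apply_in_region(image, region, edit):
    """Apply edit to pixels inside region = (top, left, bottom, right).

    bottom and right are exclusive; pixels outside the region are unchanged.
    """
    top, left, bottom, right = region
    return [
        [
            edit(pixel) if top <= y < bottom and left <= x < right else pixel
            for x, pixel in enumerate(row)
        ]
        for y, row in enumerate(image)
    ]


def brighten(pixel, amount=50):
    """Raise a pixel's value, capped at the 8-bit maximum."""
    return min(255, pixel + amount)


image = [
    [10, 10, 10],
    [10, 10, 10],
    [10, 10, 10],
]
global_edit = apply_in_region(image, (0, 0, 3, 3), brighten)  # whole image
local_edit = apply_in_region(image, (1, 1, 3, 3), brighten)   # lower-right area only
print(global_edit)  # [[60, 60, 60], [60, 60, 60], [60, 60, 60]]
print(local_edit)   # [[10, 10, 10], [10, 60, 60], [10, 60, 60]]
```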

How to use MGIE
Apple's MGIE is open source, and anyone can download it and incorporate it into their own tools. The source code is available on GitHub, but if you are unsure how to integrate or operate it, we recommend using the demo available online.

The demo on Hugging Face Spaces lets you experience MGIE's functionality firsthand: visit the website, upload the image you want to edit, and enter a prompt.

The source code on GitHub can be downloaded for free, so any interested user can run MGIE by obtaining the code and configuring it.

Points to note when using MGIE
Because MGIE is open source, it can be used without contracts or payment of usage fees, but there are some caveats. Here are some points to keep in mind when using MGIE.

CC-BY-NC license does not allow commercial use
Since MGIE is released under the “CC-BY-NC License,” please note that images generated by MGIE cannot be used for commercial purposes. MGIE is also built on LLaVA (Large Language and Vision Assistant), which uses a large language model (LLM).

Use of MGIE must therefore also comply with the LLaVA license, which likewise prohibits commercial use. Use for personal projects or research purposes is not a problem, but because LLaVA was developed using several AI models, such as LLaMA and GPT-4, MGIE must be used in accordance with their respective terms and conditions as well.

Future Prospects for MGIE
To make MGIE's image editing capabilities easier to use and more accessible, Apple intends to keep improving the model and expanding its range of applications. The company has stated that it is focusing on generative AI, and in February 2024 it emphasized its commitment to AI development.

In the future, the company plans to extend the approach to editing video as well as images, and says it also plans to develop technology that can produce video from text instructions alone. Apple's research and development will continue with the aim of closing the gap between what people intend and what image editing technology delivers, and of creating a system that lets more people edit images freely.

Conclusion
MGIE is an open source, state-of-the-art image editing AI model released by Apple Inc. It not only understands natural language but also recognizes an image's content and context before responding to instructions, so images can be edited automatically from text-based instructions. Optimization and partial adjustment of properties such as brightness and contrast are also easy to carry out.

Note, however, that due to licensing restrictions, the generated images cannot be used for commercial purposes. A demo version of MGIE is available on Hugging Face Spaces, so why not give it a try?


End! Thank you!
