[Intro] Image Generation with the OpenAI API (DALL-E) in LangChain

Hello, I'm sakasai, CTO.

Every month, D-make distributes free custom emoji for use in Slack and Mattermost.

https://www.d-make.co.jp/blog/category/custom-emoji-download/

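Each month our employees come up with and create new emoji, and we collect them into the set distributed there.

It would be handy to create a few of these emoji automatically, and these days "creating something automatically" means generative AI, so this time I tried generating them with OpenAI's DALL-E.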

Note: The updates announced at OpenAI DevDay on 2023-11-07 (released the same day) have made parts of this article outdated.
→ 2023-11-09: Pinned the library installs (pip install) to versions that still work.

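To make the process automatable, I generate the images through the API rather than in ChatGPT.

I used LangChain for the implementation, both to keep it simple and so that it can be extended to generative models other than DALL-E.

I also use LangSmith to visualize what LangChain is doing.

(Detailed explanations of DALL-E, LangChain, LangSmith, etc. are omitted.)

Following along requires signing up in advance and registering for paid features, and running the code incurs charges.
If you try it, you do so at your own risk. (No guarantees of any kind are provided.)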

Implementation

Since this was just a trial, I ran everything on Google Colab.

Tutorial

I used the following page as a reference.

https://python.langchain.com/docs/integrations/tools/dalle_image_generator

First, I'll follow it as-is (with a brief explanation of each step).

Installing the required libraries

This step follows the LangChain Quickstart and the page above.


!pip install langchain==0.0.332
!pip install openai==0.28
!pip install opencv-python scikit-image

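(Because the libraries are installed from within Colab, which is a notebook, each command is prefixed with !.)

A few warnings appear, but if the output shows "Successfully installed", the installation finished normally.

Setting environment variables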


import os

os.environ["OPENAI_API_KEY"] = "sk-xxxxxxxxxx" # Your OpenAI API key

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "ls__xxxxxxxxxx" # Your LangChain API key
os.environ["LANGCHAIN_PROJECT"] = "default"

The reference page sets only the OpenAI API key; here I also set the LangSmith environment variables.
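The keys above are hardcoded, which makes them easy to leak when the notebook is shared (as it is on GitHub at the end of this article). A minimal sketch of one alternative, assuming you prefer to be prompted for any key that isn't already set: `set_if_missing` is a hypothetical helper built on the standard library's `getpass`, which does not echo what you type.

```python
import os
from getpass import getpass
from typing import Callable


def set_if_missing(name: str, ask: Callable[[str], str] = getpass) -> str:
    """Hypothetical helper: keep an existing environment variable, else prompt for it."""
    value = os.environ.get(name)
    if not value:
        # getpass does not echo the input, so the key stays out of the notebook output
        value = ask(f"{name}: ")
        os.environ[name] = value
    return value
```

In the notebook you would call `set_if_missing("OPENAI_API_KEY")` and `set_if_missing("LANGCHAIN_API_KEY")` instead of assigning the keys directly; the non-secret LangSmith settings can stay as they are.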

With that done, I'll implement the part that defines the generation.

Importing the required libraries


from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.utilities.dalle_image_generator import DallEAPIWrapper
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

Defining the LLM


llm = OpenAI(temperature=0.9)

The LLM is defined with the OpenAI class.

With this definition, the model is the default, text-davinci-003.

Defining the prompt


prompt = PromptTemplate(
    input_variables=["image_desc"],
    template="Generate a detailed prompt to generate an image based on the following description: {image_desc}",
)

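PromptTemplate defines the variable that represents the user's input, along with the template.

The template asks the LLM to "generate a detailed prompt to generate an image" from the given description; in other words, it instructs the LLM to write an image-generation prompt based on the user's input.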

Creating the chain


chain = LLMChain(llm=llm, prompt=prompt)

LLMChain builds a chain from the LLM and the prompt defined above.

Generating the image


image_url = DallEAPIWrapper(n=1, size="512x512").run(chain.run("halloween night at a haunted museum"))

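This runs the image generation.

DallEAPIWrapper sets the number and size of the images to generate. The reference page leaves these at their defaults, but I used the settings above for now.

Inside run, the chain is executed with the user input "halloween night at a haunted museum".

Combined with the template, the text sent to the LLM becomes:

Generate a detailed prompt to generate an image based on the following description: halloween night at a haunted museum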
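Because PromptTemplate uses Python f-string style formatting by default, the combined text above is what plain `str.format` produces. A LangChain-free sketch for checking what the LLM will receive; `TEMPLATE` copies the notebook's template string and `build_llm_input` is a hypothetical name.

```python
# Copy of the template string used in the notebook's PromptTemplate
TEMPLATE = (
    "Generate a detailed prompt to generate an image based on the "
    "following description: {image_desc}"
)


def build_llm_input(image_desc: str) -> str:
    # Same substitution PromptTemplate performs for its single input variable
    return TEMPLATE.format(image_desc=image_desc)


print(build_llm_input("halloween night at a haunted museum"))
```

In the notebook itself, `prompt.format(image_desc="halloween night at a haunted museum")` should return the same string.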

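So the processing appears to be:

The LLM generates a prompt (text) for image generation from the user input
The generated prompt is passed to DALL-E, which generates the image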
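The one-liner in the notebook nests both steps, so the intermediate prompt is only visible in LangSmith. A sketch that separates them, with the LLM and image calls passed in as functions so it can run offline; `generate_image`, `fake_llm`, and `fake_dalle` are hypothetical names, and the fakes only stand in for the real APIs.

```python
from typing import Callable, Tuple


def generate_image(
    description: str,
    text_fn: Callable[[str], str],
    image_fn: Callable[[str], str],
) -> Tuple[str, str]:
    """Run the two steps separately and return (image_prompt, image_url)."""
    # Step 1: the LLM expands the short description into a detailed image prompt.
    # strip() drops surrounding whitespace; completion models may emit leading newlines.
    image_prompt = text_fn(description).strip()
    # Step 2: the image model turns that prompt into an image and returns its URL
    image_url = image_fn(image_prompt)
    return image_prompt, image_url


# Offline stand-ins so the sketch runs without API keys
def fake_llm(description: str) -> str:
    return f"\n\nA detailed painting of {description}."


def fake_dalle(prompt: str) -> str:
    return f"https://example.com/image?chars={len(prompt)}"


prompt_text, url = generate_image("halloween night at a haunted museum", fake_llm, fake_dalle)
print(prompt_text)
print(url)
```

With the real objects, the call would be `generate_image("halloween night at a haunted museum", chain.run, DallEAPIWrapper(n=1, size="512x512").run)`, which keeps the generated prompt around for logging or reuse.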

The image generation uses OpenAI's image generation API.

Judging from the page below and the Rate limits shown on my account page, the model in use appears to be DALL-E 2.

https://platform.openai.com/docs/models/dall-e
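If the model is DALL-E 2, the `n` and `size` given to DallEAPIWrapper have to fit that model: OpenAI's image API reference lists 1 to 10 images per request and sizes of 256x256, 512x512, or 1024x1024. A sketch that fails fast before any request is sent; `check_dalle2_params` is a hypothetical helper, not part of LangChain.

```python
DALLE2_SIZES = {"256x256", "512x512", "1024x1024"}


def check_dalle2_params(n: int, size: str) -> None:
    """Hypothetical helper: raise locally instead of getting an API error."""
    if not 1 <= n <= 10:
        raise ValueError(f"n must be between 1 and 10 for DALL-E 2, got {n}")
    if size not in DALLE2_SIZES:
        raise ValueError(f"size must be one of {sorted(DALLE2_SIZES)}, got {size!r}")
```

For example, `check_dalle2_params(n=1, size="512x512")` passes, while DALL-E 3 sizes such as "1792x1024" are rejected.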

Displaying the generated image


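import cv2
from google.colab.patches import cv2_imshow  # for image display
from skimage import io

image = io.imread(image_url)  # skimage returns pixels in RGB(A) order
# cv2_imshow expects OpenCV's BGR(A) order; convert so red and blue aren't swapped
code = cv2.COLOR_RGBA2BGRA if image.shape[-1] == 4 else cv2.COLOR_RGB2BGR
cv2_imshow(cv2.cvtColor(image, code))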

In Colab, the code above displays the image. (Based on the implementation in the reference page.)

The following image was generated.

Looking in LangSmith at the prompt generated for this image, it was as follows:

It's Halloween night and you find yourself wandering through a creaky old museum. The place is deserted but filled with macabre artifacts of long forgotten eras. The dimly lit corridors echo with the spooky sounds of the night. As you make your way through the museum, you start to feel a chill in the air as if something sinister is lurking in the shadows, ready to pounce at any moment. In the center of the museum is a large room with a massive chandelier hanging from the ceiling. You can see in the shadows the ghostly figures of long dead inhabitants, their eyes glowing in the darkness. As you make your way to the exit, you notice a strange display in the corner that you had not noticed before. It's a dollhouse, filled to the brim with odd little trinkets and curios. You soon realize that the dollhouse is the portal to a halloween night of terror and suspense.

In places it's unclear whether this is really an image-generation prompt, but this is what it produced.

Now that you mention it, the generated image does sort of look the part...

Generating emoji images

Now for the main goal: generating emoji images.

I run it with the following user input (今月の絵文字 means "this month's emoji"):


image_url = DallEAPIWrapper(n=1, size="512x512").run(chain.run("今月の絵文字"))

text-davinci-003

The following image was generated.

The prompt it was given was:

Paint a picture of a smiling sun with open arms, wearing a bow tie and a top hat, surrounded by colorful stars and planets in the night sky.
Paint a picture of a bright, sunny day filled with a blue sky and puffy white clouds. In the center of the sky is a smiling, joyful sun with open arms wearing a black bow tie and a tall top hat. Rays of yellow light are emanating from the sun's smile. Around the sun are colorful stars and planets of varying sizes and colors, adding a sparkle and twinkle to the night sky.

Out of curiosity, I also generated the prompt with GPT-3.5-turbo and GPT-4.

GPT-3.5-turbo

Change the LLM definition as follows.


llm = ChatOpenAI(model_name="gpt-3.5-turbo")

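Looking at the source code, the OpenAI (BaseOpenAI) class doesn't appear to support gpt-3.5-turbo and similar chat models, so I define the LLM with the ChatOpenAI class.

Also, perhaps because of the model change, the generated prompt sometimes exceeded the character limit the image generation API accepts, so I added a length limit to the prompt template.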
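Asking the LLM for 500 characters is a request, not a guarantee, so a hard cap before the DALL-E call is a reasonable safety net. OpenAI's API reference gives a 1000-character maximum prompt length for dall-e-2; `fit_prompt` is a hypothetical helper that trims to that limit, preferring a word boundary.

```python
DALLE2_MAX_PROMPT_CHARS = 1000  # per OpenAI's image API reference for dall-e-2


def fit_prompt(prompt: str, limit: int = DALLE2_MAX_PROMPT_CHARS) -> str:
    """Hypothetical helper: trim a prompt to at most `limit` characters."""
    prompt = prompt.strip()
    if len(prompt) <= limit:
        return prompt
    cut = prompt[:limit]
    space = cut.rfind(" ")
    # Cut at the last space if there is one; otherwise (e.g. Japanese text) hard-cut
    return cut[:space].rstrip() if space > 0 else cut
```

It would wrap the chain output, e.g. `DallEAPIWrapper(n=1, size="512x512").run(fit_prompt(chain.run("今月の絵文字")))`. The hard-cut fallback matters because, as in the GPT-3.5-turbo run, the output can be Japanese, which has no spaces to cut at.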


prompt = PromptTemplate(
    input_variables=["image_desc"],
    template="Generates a detailed prompt in 500 characters to generate an image based on the following description: {image_desc}",
)

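Generated image

Prompt

[Translated from the Japanese output]
The emoji to be drawn shows an abstract silhouette expressing a fleeting emotion, floating over a vividly colored background. The background is a deep blue that conveys strong energy. The silhouette is drawn with extreme minimalism and an exquisite balance of curves and angles. Because the emoji expresses the emotion of a single moment, it comes in many variations depending on the situation and the kind of emotion. For example, to express affection, two abstract shapes overlap each other, drawn in soft pink. To express joy or surprise, it is drawn in bright yellow, with curved shapes spreading outward. Because this emoji empathizes with the user's feelings and situation and conveys emotion in an instant, it is widely used on social media and messaging apps.

The prompt was generated in Japanese.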

GPT-4

Change the LLM definition to model_name="gpt-4" and run it again.
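Since the same description is run through three models in this article, the comparison can be scripted. A sketch in which a factory function supplies each model's prompt generator; `compare_prompts` and `make_text_fn` are hypothetical, and the commented factory only suggests how it might map onto the notebook's OpenAI, ChatOpenAI, and LLMChain objects (that part is not executed here).

```python
from typing import Callable, Dict, Iterable

# Hypothetical type: given a model name, return a function description -> image prompt
TextFnFactory = Callable[[str], Callable[[str], str]]


def compare_prompts(
    description: str,
    models: Iterable[str],
    make_text_fn: TextFnFactory,
) -> Dict[str, str]:
    """Generate an image prompt from the same description with each model."""
    results: Dict[str, str] = {}
    for model in models:
        text_fn = make_text_fn(model)  # e.g. an LLMChain's run method for this model
        results[model] = text_fn(description)
    return results


# In the notebook, the factory might look like this (not executed here):
# def make_text_fn(model):
#     if model == "text-davinci-003":
#         llm = OpenAI(temperature=0.9)        # completion model
#     else:
#         llm = ChatOpenAI(model_name=model)   # chat models such as gpt-4
#     return LLMChain(llm=llm, prompt=prompt).run
#
# compare_prompts("今月の絵文字", ["text-davinci-003", "gpt-3.5-turbo", "gpt-4"], make_text_fn)
```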

Generated image

Prompt

Create an image that represents the concept of 'This Month's Emoji'. The image should feature a large, stylized calendar hanging on a wall with the current month displayed. The dates should be marked with various emojis that are popular or trending in the current month. On one side, include images of people of various ages and backgrounds, looking at the calendar with expressions of surprise, joy, or confusion. The overall tone of the image should be playful and colorful.

For prompts that instruct image generation, GPT-4 does seem to be the best.

The images pasted above all look broadly similar, but over several runs, the more precise prompts seemed to produce plausible images more consistently.

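Trying it in ChatGPT

So generating images through the API looks workable. As an experiment, I'll now try the same thing in ChatGPT by doing the following:

Generate the image generation prompt with GPT-4 (using the same prompt as before to generate it)
Generate the image with DALL-E 3

First, GPT-4 generates the image generation prompt.

Generated prompt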

Certainly! When you are asking to generate an image based on “今月の絵文字,” it translates to “this month’s emoji.” Assuming you’re looking for an image that encapsulates the concept of an emoji representing the current month, here’s a detailed prompt that could be used to create such an image:

Prompt for Image Generation:

Create a digital illustration of a unique emoji that represents the essence of this month. The emoji should reflect the current season’s characteristics and any major events or holidays that are typically celebrated during this month.

Key elements to include:

Seasonal Attributes: Depending on the hemisphere, incorporate elements like blooming flowers and a bright sun for spring or northern hemisphere; for the southern hemisphere, consider autumn leaves and a harvest moon.

Color Palette: Use a color scheme that corresponds with the current month’s seasonal tones—pastels for spring, rich oranges and browns for autumn, etc.

Facial Expression: The emoji should have a joyful and festive expression, embodying the mood of celebration and seasonal change.

Holiday Symbols: If there are holidays this month, subtly integrate symbols associated with them, such as a small shamrock for St. Patrick’s Day, a heart for Valentine’s Day, or fireworks for national holidays.

Interactive Element: Add an interactive element that brings the emoji to life, such as animated petals falling for spring or leaves blowing in the wind for autumn.

Cultural Relevance: Ensure that the design is inclusive and can resonate with a global audience, possibly by blending elements from different cultures’ celebrations of the month.

Stylistic Notes:

The emoji should be in a clear, high-resolution circular format suitable for digital use.

It should have a playful and engaging look, with exaggerated features for expressiveness.

Shadows and highlights should be used judiciously to give a three-dimensional effect, making the emoji pop.

Technical Specifications:

The final image should be a PNG file with a transparent background.

The dimensions should be 512x512 pixels to ensure it is detailed enough for different uses.

Maintain a good contrast between the elements and the background to ensure visibility on various devices.

This detailed prompt can help guide the creation of an image that is both representative of the specific time and engaging as an emoji for digital communication.

[Prompt used to generate this]
Generate a detailed prompt to generate an image based on the following description: 今月の絵文字

Excerpting just the image-generation instructions gives everything from "Create a digital illustration of a unique emoji that represents the essence of this month." through the Technical Specifications, leaving out ChatGPT's introduction and closing remark.
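When automating this step, the same excerpting would have to happen in code, since the model wraps the prompt in an introduction and a closing remark. A rough sketch; it assumes the response contains a "Prompt for Image Generation:" line and a closing "This detailed prompt" sentence like the output above, which the model does not guarantee, so `extract_image_prompt` (a hypothetical helper) simply skips any marker it can't find.

```python
def extract_image_prompt(
    response: str,
    start_marker: str = "Prompt for Image Generation:",
    end_marker: str = "This detailed prompt",
) -> str:
    """Return the text between the two markers, skipping any marker that is absent."""
    start = response.find(start_marker)
    body = response[start + len(start_marker):] if start != -1 else response
    end = body.find(end_marker)
    if end != -1:
        body = body[:end]
    return body.strip()
```

Asking the model to reply with the prompt only would be a sturdier alternative to parsing.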

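Using the original English prompt, I generate the image with DALL-E 3.

Generated image

Whether it works as an emoji aside, the image itself is far better with DALL-E 3.

Rather than having the model write the image-generation prompt, refining the image through conversation looks like a way to get something usable as an emoji.

I tried continuing the conversation.

The result looks like it could plausibly be used.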

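Wrapping up

I tried generating images with the OpenAI API through LangChain.

As shown, the implementation itself is very simple.

There are various things still to consider, but automating image generation to some degree looks feasible.

It depends on the use case, but for practical use I'm looking forward to DALL-E 3 becoming available.

AI-generated images may eventually find their way into the custom emoji we distribute for free.

The source code used here (a Google Colab notebook) is also available on GitHub.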

https://github.com/msakasai/google-colab/blob/main/langchain_image_generation.ipynb

Bonus

Charges incurred while trying out this image generation (OpenAI API)