Implementing a Multimodal Retriever with the Pinecone Vector DB

anpigon (71) in #kr • 2 years ago

Loading Stable Diffusion data for testing

We use the DiffusionDB dataset, which contains 2 million images. From it we download only the 2m_first_1k subset (the first 1,000 images).

from datasets import load_dataset

dataset = load_dataset(
    path="poloclub/diffusiondb", 
    name="2m_first_1k", 
    split='train',
)
print(len(dataset))

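Creating the Pinecone index

Create an index in Pinecone. The dimension must match the size of the embedding vectors (512 for the CLIP model used below), and cosine similarity is used as the metric.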

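from pinecone import Pinecone, ServerlessSpec

# Pinecone client (reads the API key from the PINECONE_API_KEY environment variable)
pc = Pinecone()

# Create the Pinecone index
index_name = "llm-multimodal"
pc.create_index(
    name=index_name,
    dimension=512,  # must match the CLIP projection size
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index(index_name)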
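Calling `create_index` again after the index already exists raises an error, which is easy to hit when re-running a notebook. A minimal sketch of an idempotent variant, assuming the pinecone client's `list_indexes().names()` and `describe_index(...).status["ready"]` as used in Pinecone's quickstart; `ensure_index` is a hypothetical helper name and `spec` is expected to be a `ServerlessSpec` like the one above.

```python
import time


def ensure_index(pc, name, dimension, spec, metric="cosine", poll_seconds=1.0):
    """Create the index only if it is missing, wait until it is ready, return a handle.

    `pc` is assumed to be a pinecone.Pinecone client and `spec` a ServerlessSpec.
    """
    # Skip creation when an index with this name already exists
    if name not in pc.list_indexes().names():
        pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    # A freshly created serverless index needs a moment before it accepts requests
    while not pc.describe_index(name).status["ready"]:
        time.sleep(poll_seconds)
    return pc.Index(name)
```

Usage would look like `index = ensure_index(pc, "llm-multimodal", 512, ServerlessSpec(cloud="aws", region="us-east-1"))`.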

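Loading the embedding model

For embeddings we use the openai/clip-vit-base-patch32 model. CLIP is trained on image-text pairs so that an image and its matching text map to nearby points in a shared embedding space. For this model the projected embeddings are 512-dimensional, which is why the index was created with dimension=512.

Next, we convert the prompt texts in the dataset into embedding vectors.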
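Because the index uses `metric="cosine"`, the score Pinecone reports is the cosine of the angle between two vectors. The sketch below uses plain NumPy with toy 4-dimensional vectors (an assumption standing in for 512-dimensional CLIP embeddings) to show that cosine similarity ignores vector length, so it equals a plain dot product once both vectors are L2-normalized.

```python
import numpy as np


def cosine_similarity(a, b):
    """cos(theta) = dot(a, b) / (|a| * |b|)"""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def l2_normalize(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)


# Toy 4-dimensional vectors standing in for 512-dimensional CLIP embeddings
text_vec = [0.2, 0.9, 0.1, 0.3]
image_vec = [0.25, 0.8, 0.0, 0.4]

sim = cosine_similarity(text_vec, image_vec)
# Rescaling a vector does not change its cosine similarity
sim_scaled = cosine_similarity(text_vec, [5 * x for x in image_vec])
# After L2 normalization, a plain dot product gives the same score
dot_normalized = float(l2_normalize(text_vec) @ l2_normalize(image_vec))
```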

import torch
from tqdm.auto import trange
from transformers import AutoTokenizer, CLIPTextModelWithProjection

text_model = CLIPTextModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32")

tokens = tokenizer(dataset["prompt"], padding=True, return_tensors="pt", truncation=True)

batch_size = 16
text_embeddings = []
for start_idx in trange(0, len(dataset), batch_size):
    with torch.no_grad():
        outputs = text_model(
            input_ids=tokens["input_ids"][start_idx : start_idx + batch_size],
            attention_mask=tokens["attention_mask"][start_idx : start_idx + batch_size],
        )
        text_emb_tmp = outputs.text_embeds
    text_embeddings.append(text_emb_tmp)
text_embeddings = torch.cat(text_embeddings, dim=0)
text_embeddings.shape  # (1000, 512)

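Storing the text embedding vectors in the Pinecone index

We store the prompt embedding vectors in Pinecone. Each record has a string id, the embedding values, and the prompt text as metadata.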

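# Build the vector records to store in Pinecone
input_data = []
# The loop variable is named i so it does not overwrite the Pinecone `index` object
for i, (embedding, prompt) in enumerate(zip(text_embeddings.tolist(), dataset["prompt"])):
    input_data.append(
        {
            "id": str(i),
            "values": embedding,
            # Stored under "prompt", the key that appears in the query results below
            "metadata": {"prompt": prompt},
        }
    )

# Store the records in Pinecone
index.upsert(vectors=input_data)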
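The code above sends all 1,000 records in a single `upsert` call. Pinecone limits the size of each request, so for larger datasets a common pattern is to upsert in batches. A minimal sketch; `batched` and `upsert_in_batches` are hypothetical helpers, the batch size of 100 is an assumption, and `index` can be any object with an `upsert(vectors=...)` method, such as the Pinecone `Index` created earlier.

```python
def batched(records, batch_size):
    """Yield consecutive slices of `records` with at most `batch_size` items each."""
    for start in range(0, len(records), batch_size):
        yield records[start : start + batch_size]


def upsert_in_batches(index, records, batch_size=100):
    """Upsert `records` one batch at a time and return how many were sent.

    `index` is assumed to be a Pinecone Index (anything with .upsert(vectors=...)).
    """
    sent = 0
    for batch in batched(records, batch_size):
        index.upsert(vectors=batch)
        sent += len(batch)
    return sent
```

With the records built above, this would be called as `upsert_in_batches(index, input_data, batch_size=100)`.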

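Searching with text embeddings

We search using a prompt embedding. Similar prompts are returned in order of similarity, with the best match at the top. Here the query is the embedding of prompt 882 itself, so the top hit is entry 882 with a score of about 1.

search_query_vector = text_embeddings[882].tolist()  # embedding of prompt 882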
search_results = index.query(
    vector=search_query_vector,
    top_k=3,
    include_values=False,
    include_metadata=True,
)
search_results

{'matches': [{'id': '882', 'metadata': {'prompt': 'a beautiful fashion blond woman, ' "playerunknown's battlegrounds concept, " 'revealing outfit, symmetrical, ' 'maximalist, lily frame, art by ilya ' 'kuvshinov, rossdraws, sharp focus, art ' 'by wlop and artgerm, extreme detail, ' 'detailed drawing, hyper detailed face '}, 'score': 1.00090718, 'values': []}, {'id': '976', 'metadata': {'prompt': 'a beautiful military fashion blond ' "woman, playerunknown's battlegrounds " 'concept, revealing outfit, symmetrical, ' 'maximalist, lily frame, art by ilya ' 'kuvshinov, rossdraws, sharp focus, art ' 'by wlop and artgerm, extreme detail, ' 'detailed drawing, hyper detailed face '}, 'score': 0.982770383, 'values': []}, {'id': '457', 'metadata': {'prompt': 'indonesian princess, art by artgerm and ' 'greg rutkowski and magali villeneuve, ' 'highly detailed, digital painting, ' 'trending on artstation, concept art, ' 'sharp focus, illustration '}, 'score': 0.646178424, 'values': []}], 'namespace': '', 'usage': {'read_units': 6}}
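The ordering above can be reproduced with an exhaustive search: score every stored vector against the query with cosine similarity and sort in descending order. The sketch below does this in NumPy; `top_k_cosine` is a hypothetical helper, and because a vector database may use approximate search, its results should broadly agree with, but are not guaranteed to match, `index.query`.

```python
import numpy as np


def top_k_cosine(query, matrix, k=3):
    """Exhaustive cosine-similarity search.

    Returns (row_index, score) pairs for the k rows of `matrix` most similar
    to `query`, highest score first.
    """
    q = np.asarray(query, dtype=np.float64)
    m = np.asarray(matrix, dtype=np.float64)
    # Cosine similarity of every row with the query
    scores = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    # Row indices sorted by descending score, truncated to k
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

With the embeddings from this post, a local cross-check would be `top_k_cosine(text_embeddings[882].numpy(), text_embeddings.numpy(), k=3)`.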

Querying prompts with an image embedding

Below is an example of querying the Pinecone vector DB with an image instead of text.

Example code: loading the image

from IPython.display import display

original_image = dataset[882]['image']
display(original_image)

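Creating the image embedding and querying the vector DB

We embed the image and query the Pinecone vector DB with that embedding to find similar prompts. Because CLIP maps images and text into the same embedding space, an image embedding can be compared directly with the stored prompt embeddings, and the most relevant prompts appear at the top. Note that these cross-modal scores (about 0.33 to 0.35 in the output below) are much lower than the text-to-text scores above, so results are best compared by rank rather than by absolute score.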

import torch
from transformers import AutoProcessor, CLIPVisionModelWithProjection

vision_model = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")
processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Preprocess the image (resize, crop, normalize) into pixel tensors
inputs = processor(images=original_image, return_tensors="pt")

# Compute the image embedding in CLIP's shared 512-dimensional space
with torch.no_grad():
    outputs = vision_model(**inputs)
image_embeddings = outputs.image_embeds

search_results = index.query(
    vector=image_embeddings[0].tolist(),
    top_k=3,
    include_values=False,
    include_metadata=True,
)
print(search_results)

{'matches': [{'id': '976', 'metadata': {'prompt': 'a beautiful military fashion blond ' "woman, playerunknown's battlegrounds " 'concept, revealing outfit, symmetrical, ' 'maximalist, lily frame, art by ilya ' 'kuvshinov, rossdraws, sharp focus, art ' 'by wlop and artgerm, extreme detail, ' 'detailed drawing, hyper detailed face '}, 'score': 0.349931747, 'values': []}, {'id': '882', 'metadata': {'prompt': 'a beautiful fashion blond woman, ' "playerunknown's battlegrounds concept, " 'revealing outfit, symmetrical, ' 'maximalist, lily frame, art by ilya ' 'kuvshinov, rossdraws, sharp focus, art ' 'by wlop and artgerm, extreme detail, ' 'detailed drawing, hyper detailed face '}, 'score': 0.343622893, 'values': []}, {'id': '478', 'metadata': {'prompt': 'artistic portrait of a cool and bored ' 'looking northern blond lady with small ' 'horns in black leather mask, beautiful ' 'blue eyes and red lips, art by artgerm '

...

'concept art, sharp focus, illustration '}, 'score': 0.330030233, 'values': []}], 'namespace': '', 'usage': {'read_units': 6}}

Generating images with the `FLUX.1-dev` model

Using the retrieved prompts, we generated images with the black-forest-labs/FLUX.1-dev model.
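A minimal sketch of this step with Hugging Face diffusers' `FluxPipeline`, assuming a CUDA GPU, the `accelerate` package, access to the gated FLUX.1-dev weights, and that `search_results` can be indexed like the dict printed above. The helper names `pick_prompt`, `load_flux_pipeline`, and `generate_from_match` are hypothetical, and generation settings are left to the caller.

```python
def pick_prompt(search_results, rank=0):
    """Return the prompt text stored in the metadata of the rank-th match (0 = best)."""
    return search_results["matches"][rank]["metadata"]["prompt"]


def load_flux_pipeline(model_id="black-forest-labs/FLUX.1-dev"):
    """Load FLUX.1-dev with diffusers (needs torch, diffusers, accelerate and a GPU)."""
    # Imported lazily so the other helpers work without these heavy packages
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    # Moves submodules to the GPU only while they run, lowering peak GPU memory
    pipe.enable_model_cpu_offload()
    return pipe


def generate_from_match(pipe, search_results, rank=0, **pipe_kwargs):
    """Generate one image from the rank-th retrieved prompt; extra kwargs go to `pipe`."""
    prompt = pick_prompt(search_results, rank)
    return pipe(prompt=prompt, **pipe_kwargs).images[0]
```

For example, `generate_from_match(load_flux_pipeline(), search_results, guidance_scale=3.5, num_inference_steps=50)` would render the top-ranked prompt; `guidance_scale` and `num_inference_steps` are standard `FluxPipeline` arguments, and the values here are only illustrative.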

Describing the image with GPT-4o, then generating an image

The code below uses GPT-4o to write a description of the original image; that description is then used to generate a new image.

Code example: describing the image with GPT-4o

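import base64
import io

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable


def encode_image_to_base64(image):
    """Encode a PIL image as a base64 JPEG string (minimal helper; not shown in the original post)."""
    buffer = io.BytesIO()
    # JPEG has no alpha channel, and the data URL below declares image/jpeg
    image.convert("RGB").save(buffer, format="JPEG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


# Encode the image as base64
encoded_image = encode_image_to_base64(original_image)

# Ask GPT-4o to describe the image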
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system", 
            "content": "You are a bot that is good at analyzing images."
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the contents of this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
                },
            ],
        },
    ],
)

# Print the generated image description
print(response.choices[0].message.content)

Image description generated by GPT-4o
The image is a digital painting of a person with blonde hair, wearing a high-tech earpiece. The person is shown in a side profile, with a focused expression. They are dressed in a military-style outfit with detailed textures, featuring a high collar and metallic accents. The overall color scheme is a mix of cool and warm tones, with a soft, glowing effect highlighting the subject's features.

Generating an image from GPT-4o's description
We then generated a new image using the description produced by GPT-4o as the prompt.
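Chaining the two steps amounts to passing GPT-4o's reply text to the image pipeline as its prompt. A minimal sketch, assuming `response` is the chat completion object from the code above and `pipe` is an already-loaded diffusers `FluxPipeline` (or any callable returning an object with an `images` list); `regenerate_from_description` is a hypothetical helper.

```python
def regenerate_from_description(response, pipe, **pipe_kwargs):
    """Use a chat completion's text as the prompt for an image pipeline.

    Returns (description, image). `pipe` is assumed to be a diffusers FluxPipeline
    or any callable returning an object with an `.images` list.
    """
    # The description is the text of the first choice's message
    description = response.choices[0].message.content
    # Generate one image from the description; extra settings are passed through
    result = pipe(prompt=description, **pipe_kwargs)
    return description, result.images[0]
```

In diffusers' `FluxPipeline`, `prompt` feeds the CLIP text encoder, which truncates input at 77 tokens; unless `prompt_2` is given, the same text also feeds the T5 encoder, which accepts much longer input (up to `max_sequence_length`). So a multi-sentence description like the one above is not simply cut off at 77 tokens.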

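Conclusion

In this experiment, describing the original image with GPT-4o and then generating from that description produced the image closest to the original. Because the description is written from the image itself rather than retrieved from other dataset entries, it reproduces the original's style and features more faithfully than the retrieved prompts did.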
