如何将 GPT-4o API 开发调用于视觉和文本?
虽然 GPT-4o 是一个新模型,并且 API 可能仍在不断发展,但以下是与它交互的一般方法:
访问和身份验证:
- OpenAI 帐户:您可能需要一个 OpenAI 帐户才能访问 API。这可能涉及注册免费帐户或使用付费套餐(如果存在不同的访问级别)。
- API 密钥:拥有帐户后,获取 API 密钥。此密钥可验证您对 GPT-4o API 的请求。
安装必要的库
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-undefined">pip install openai</code></span></span>
导入 openai 库和身份验证
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-cpp"><strong><span style="color:#bb9af7">import</span></strong> openai
openai.api_key = <span style="color:#9ece6a">"<Your API KEY>"</span></code></span></span>
完成聊天
代码:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-lua">response = openai.chat.completions.<span style="color:#e0af68">create</span>(
model=<span style="color:#9ece6a">"gpt-4o"</span>,
messages=[
{<span style="color:#9ece6a">"role"</span>: <span style="color:#9ece6a">"system"</span>, <span style="color:#9ece6a">"content"</span>: <span style="color:#9ece6a">"You are a helpful assistant."</span>},
{<span style="color:#9ece6a">"role"</span>: <span style="color:#9ece6a">"user"</span>, <span style="color:#9ece6a">"content"</span>: <span style="color:#9ece6a">"Who won the world series in 2020?"</span>},
{<span style="color:#9ece6a">"role"</span>: <span style="color:#9ece6a">"assistant"</span>, <span style="color:#9ece6a">"content"</span>: <span style="color:#9ece6a">"The Los Angeles Dodgers won the World Series in 2020."</span>},
{<span style="color:#9ece6a">"role"</span>: <span style="color:#9ece6a">"user"</span>, <span style="color:#9ece6a">"content"</span>: <span style="color:#9ece6a">"Where was it played?"</span>}
]
)</code></span></span>
输出:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-scss"><span style="color:#e0af68">print</span>(response.choices[<span style="color:#ff9e64">0</span>].message.content)</code></span></span>
对于图像处理
代码:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-lua">response = openai.chat.completions.<span style="color:#e0af68">create</span>(
model=<span style="color:#9ece6a">"gpt-4o"</span>,
messages=[
{
<span style="color:#9ece6a">"role"</span>: <span style="color:#9ece6a">"user"</span>,
<span style="color:#9ece6a">"content"</span>: [
{<span style="color:#9ece6a">"type"</span>: <span style="color:#9ece6a">"text"</span>, <span style="color:#9ece6a">"text"</span>: <span style="color:#9ece6a">"What’s in this image?"</span>},
{
<span style="color:#9ece6a">"type"</span>: <span style="color:#9ece6a">"image_url"</span>,
<span style="color:#9ece6a">"image_url"</span>: {
<span style="color:#9ece6a">"url"</span>: <span style="color:#9ece6a">"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"</span>,
},
},
],
}
],
max_tokens=<span style="color:#ff9e64">300</span>,
)</code></span></span>
编辑
输出:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-scss"><span style="color:#e0af68">print</span>(response.choices[<span style="color:#ff9e64">0</span>])</code></span></span>
编辑
对于视频处理
导入必要的库:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-python"><strong><span style="color:#bb9af7">from</span></strong> IPython.display <strong><span style="color:#bb9af7">import</span></strong> display, Image, Audio
<strong><span style="color:#bb9af7">import</span></strong> cv2 <span style="color:#565f89"># We're using OpenCV to read video, to install !pip install opencv-python</span>
<strong><span style="color:#bb9af7">import</span></strong> base64
<strong><span style="color:#bb9af7">import</span></strong> time
<strong><span style="color:#bb9af7">from</span></strong> openai <strong><span style="color:#bb9af7">import</span></strong> OpenAI
<strong><span style="color:#bb9af7">import</span></strong> os
<strong><span style="color:#bb9af7">import</span></strong> requests
client = OpenAI(api_key=os.environ.get(<span style="color:#9ece6a">"OPENAI_API_KEY"</span>, <span style="color:#9ece6a">"<your OpenAI API key if not set as env var>"</span>))</code></span></span>
使用 GPT 的视觉功能获取视频描述
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-lua">video = cv2.VideoCapture(<span style="color:#9ece6a">"<Your Viedeo Address>"</span>)
base64Frames = []
<strong><span style="color:#bb9af7">while</span></strong> video.isOpened():
success, frame = video.<span style="color:#e0af68">read</span>()
<strong><span style="color:#bb9af7">if</span></strong> <strong><span style="color:#bb9af7">not</span></strong> success:
<strong><span style="color:#bb9af7">break</span></strong>
_, buffer = cv2.imencode(<span style="color:#9ece6a">".jpg"</span>, frame)
base64Frames.append(base64.b64encode(buffer).decode(<span style="color:#9ece6a">"utf-8"</span>))
video.release()
<span style="color:#e0af68">print</span>(<span style="color:#e0af68">len</span>(base64Frames), <span style="color:#9ece6a">"frames read."</span>)</code></span></span>
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-python">display_handle = display(<span style="color:#ff9e64">None</span>, display_id=<span style="color:#ff9e64">True</span>)
<strong><span style="color:#bb9af7">for</span></strong> img <strong><span style="color:#bb9af7">in</span></strong> base64Frames:
display_handle.update(Image(data=base64.b64decode(img.encode(<span style="color:#9ece6a">"utf-8"</span>))))
time.sleep(<span style="color:#ff9e64">0.025</span>)</code></span></span>
提供提示:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-makefile">PROMPT_MESSAGES = [
{
<span style="color:#9ece6a">"role"</span>: <span style="color:#9ece6a">"user"</span>,
<span style="color:#9ece6a">"content"</span>: [
<span style="color:#9ece6a">"These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video."</span>,
*map(lambda x: {<span style="color:#9ece6a">"image"</span>: x, <span style="color:#9ece6a">"resize"</span>: 768}, base64Frames[0::50]),
],
},
]
params = {
<span style="color:#9ece6a">"model"</span>: <span style="color:#9ece6a">"gpt-4o"</span>,
<span style="color:#9ece6a">"messages"</span>: PROMPT_MESSAGES,
<span style="color:#9ece6a">"max_tokens"</span>: 200,
}</code></span></span>
输出:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-lua">result = client.chat.completions.<span style="color:#e0af68">create</span>(**params)
<span style="color:#e0af68">print</span>(result.choices[<span style="color:#ff9e64">0</span>].message.content)</code></span></span>
对于音频处理
代码:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-makefile">from openai import OpenAI
client = OpenAI()
audio_file= open(<span style="color:#9ece6a">"/path/to/file/audio.mp3"</span>, <span style="color:#9ece6a">"rb"</span>)
transcription = client.audio.transcriptions.create(
model=<span style="color:#9ece6a">"whisper-1"</span>,
file=audio_file
)</code></span></span>
输出:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-scss"><span style="color:#e0af68">print</span>(transcription.text)</code></span></span>
用于图像生成
代码:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-makefile">from openai import OpenAI
client = OpenAI()
response = client.images.generate(
model=<span style="color:#9ece6a">"dall-e-3"</span>,
prompt=<span style="color:#9ece6a">"a man with big moustache and wearing long hat"</span>,
size=<span style="color:#9ece6a">"1024x1024"</span>,
quality=<span style="color:#9ece6a">"standard"</span>,
n=1,
)
image_url = response.data[0].url</code></span></span>
输出:
编辑
用于音频生成
代码:
<span style="color:#383838"><span style="background-color:#ffffff"><code class="language-python"><strong><span style="color:#bb9af7">from</span></strong> pathlib <strong><span style="color:#bb9af7">import</span></strong> Path
<strong><span style="color:#bb9af7">from</span></strong> openai <strong><span style="color:#bb9af7">import</span></strong> OpenAI
client = OpenAI()
speech_file_path = Path(__file__).parent / <span style="color:#9ece6a">"speech.mp3"</span>
response = client.audio.speech.create(
model=<span style="color:#9ece6a">"tts-1"</span>,
voice=<span style="color:#9ece6a">"alloy"</span>,
<span style="color:#e0af68">input</span>=<span style="color:#9ece6a">"Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from potentially noisy, structured, or unstructured data."</span>
)
response.stream_to_file(speech_file_path)</code></span></span>
GPT-4o API 的优势和应用
GPT-4o API 为每个人解锁了强大的 AI。要点如下:
- 在更短的时间内完成更多工作:自动执行任务、更快地分析数据并根据需要生成创意内容。
- 个性化体验:理解您的聊天机器人、适应的教育工具等等。
- 打破沟通障碍:实时翻译语言并为视障用户描述图像。
- 推动人工智能创新:研究人员可以利用 GPT-4o 的功能探索人工智能的新领域。
- 未来是开放的:期待 GPT-4o 的全新且令人兴奋的应用出现在各个领域。
简而言之,GPT-4o 是 AI 领域的变革者,它拥有多模态能力,可以理解文本、音频和视觉效果。它的 API 为开发人员和用户打开了大门,从制作自然对话到分析多媒体内容。借助 GPT-4o,任务可以自动化,体验可以个性化,沟通障碍可以打破。为未来做好准备,AI 将推动创新并改变我们与技术的互动方式!
开发人员申请GPT-4 API Key教程:轻松获取GPT4.0模型API Key并开发部署自己的ChatGPT聊天应用
在人工智能的浪潮中,OpenAI的GPT-4模型以其卓越的自然语言理解和生成能力引领着语言模型的新纪元。对于开发者而言,获取GPT-4 API Key并将其应用于自己的项目,如开发一个ChatGPT聊天应用,不仅是实践人工智能技术的绝佳机会,也能为用户带来前所未有的交互体验。本文将指导您如何轻松获取GPT-4 API Key,并提供一个简单的部署代码示例。