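Implementing Vector Search with Microsoft.Extensions.AI and VectorData
This article shows how to build a vector search system with the Microsoft.Extensions.AI and Microsoft.Extensions.VectorData libraries. The main steps are:
- Generate embedding vectors for a dataset
- Create and populate a vector store
- Convert the user's query into an embedding vector
- Run a similarity search and return the results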
Initialize the project
Create a console app and install the required dependencies:
dotnet new console -o VectorSearchDemo
cd VectorSearchDemo
dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
dotnet add package Microsoft.Extensions.VectorData.Abstractions
dotnet add package Microsoft.SemanticKernel.Connectors.InMemory --prerelease
dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
dotnet add package System.Linq.AsyncEnumerable
Define the data model
Create an entity class that holds the information for one cloud service:
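using Microsoft.Extensions.VectorData;

namespace VectorSearchDemo;

public class CloudServiceEntry
{
    // Unique key of the record in the vector store.
    [VectorStoreKey]
    public int Id { get; set; }

    // Plain data fields stored alongside the vector.
    [VectorStoreData]
    public string ServiceName { get; set; } = string.Empty;

    [VectorStoreData]
    public string ServiceDesc { get; set; } = string.Empty;

    // Embedding of ServiceDesc. Dimensions should match the length of the
    // vectors that the chosen embedding model returns.
    [VectorStoreVector(
        Dimensions: 384,
        DistanceFunction = DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float> VectorData { get; set; }
}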
Prepare the dataset
Initialize the list of cloud service descriptions:
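// Namespaces used by the Program.cs snippets in this article.
using System.ClientModel;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using Microsoft.SemanticKernel.Connectors.InMemory;
using OpenAI;
using VectorSearchDemo;

List<CloudServiceEntry> services = new()
{
    new() {
        Id = 0,
        ServiceName = "Azure App Service",
        ServiceDesc = "A fully managed service for hosting .NET, Java, and other applications, with automatic high availability and scaling"
    },
    new() {
        Id = 1,
        ServiceName = "Azure Service Bus",
        ServiceDesc = "An enterprise message broker that supports point-to-point and publish-subscribe patterns"
    },
    new() {
        Id = 2,
        ServiceName = "Azure Blob Storage",
        ServiceDesc = "A cloud file storage service that supports massive data volumes and a highly available architecture"
    }
};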
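Configure the embedding generator
Connect to the OpenAI service to generate embedding vectors. The API key and model name are read from user secrets under the keys OpenAIKey and ModelName: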
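// Read OpenAIKey and ModelName from user secrets and fail fast if either is missing.
var config = new ConfigurationBuilder()
    .AddUserSecrets<Program>()
    .Build();
string apiKey = config["OpenAIKey"]
    ?? throw new InvalidOperationException("User secret 'OpenAIKey' is not set.");
string modelName = config["ModelName"]
    ?? throw new InvalidOperationException("User secret 'ModelName' is not set.");

var embeddingGenerator = new OpenAIClient(new ApiKeyCredential(apiKey))
    .GetEmbeddingClient(model: modelName)
    .AsIEmbeddingGenerator();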
Build the vector store
Create an in-memory vector store and populate it with the embedded records:
var vectorDb = new InMemoryVectorStore();
var collection = vectorDb.GetCollection<int, CloudServiceEntry>("cloudServices");
await collection.EnsureCollectionExistsAsync();

// Embed each description, then insert or update the record in the collection.
foreach (var service in services)
{
    service.VectorData = await embeddingGenerator.GenerateVectorAsync(service.ServiceDesc);
    await collection.UpsertAsync(service);
}
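The loop above sends one embedding request per record. For larger datasets it may be preferable to embed all descriptions in a single call. The following is only a sketch under stated assumptions: PopulateAsync is a hypothetical helper name, the CloudServiceEntry class is the one defined earlier, and the code assumes that the batch overload of GenerateAsync returns embeddings in the same order as its inputs and that the collection accepts a batch UpsertAsync.

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;
using VectorSearchDemo;

// Hypothetical helper (not part of either library): embeds every description
// in one GenerateAsync call, then upserts all records in one UpsertAsync call.
static async Task PopulateAsync(
    IEmbeddingGenerator<string, Embedding<float>> generator,
    VectorStoreCollection<int, CloudServiceEntry> collection,
    IReadOnlyList<CloudServiceEntry> entries)
{
    // Assumption: embeddings[i] corresponds to entries[i].
    var embeddings = await generator.GenerateAsync(entries.Select(e => e.ServiceDesc));

    for (int i = 0; i < entries.Count; i++)
    {
        entries[i].VectorData = embeddings[i].Vector;
    }

    await collection.UpsertAsync(entries);
}
```

With the variables from this article, the call would be: await PopulateAsync(embeddingGenerator, collection, services);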
Run the vector search
Convert the query into an embedding vector and retrieve the closest matches:
string query = "What is the best service for storing Word documents?";
var queryVector = await embeddingGenerator.GenerateVectorAsync(query);

// SearchAsync returns an async stream of the closest records with their scores.
var matches = collection.SearchAsync(queryVector, top: 3);
await foreach (var match in matches)
{
    Console.WriteLine($"Service: {match.Record.ServiceName}");
    Console.WriteLine($"Score: {match.Score:F4}");
}
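The score returned with each match depends on the distance function configured on the vector property. As an illustration only, and assuming that the score for DistanceFunction.CosineSimilarity is the cosine of the angle between the two vectors, the computation can be sketched in plain C#. CosineSimilarity below is a hypothetical helper written for this example, not a call into either library.

```csharp
using System;

// Hypothetical helper: cosine similarity = dot(a, b) / (|a| * |b|).
// Returns NaN if either vector has zero length.
static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

float[] queryVec = { 1f, 0f };
float[] sameDirection = { 2f, 0f };
float[] orthogonal = { 0f, 3f };

Console.WriteLine(CosineSimilarity(queryVec, sameDirection)); // prints 1
Console.WriteLine(CosineSimilarity(queryVec, orthogonal));    // prints 0
```

Vectors that point in the same direction score 1 regardless of their length, and unrelated (orthogonal) vectors score 0, which is why a higher score indicates a closer match under this measure.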