
Implementing a Multilayer Perceptron: MNIST Handwritten Digit Recognition

Guest · Technology · June 28, 2026

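I. Overview

II. Network Architecture Design

III. Implementation Details

1. Importing Dependencies

2. Building the Neural Network Class

3. Training Procedure

4. Initializing the Parameter Matrices

5. Converting Between Matrices and Vectors

6. Gradient Descent Optimization

6.1 Computing the Loss

6.1.1 Forward Propagation

6.2 Backpropagation

7. Prediction

IV. Complete Code

V. Application: MNIST Handwritten Digit Recognition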

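I. Overview

This article walks through building a complete multilayer perceptron (MLP) and validating it on the MNIST handwritten digit dataset. Readers are expected to know the basics of neural networks, including forward propagation, backpropagation, and activation functions.

We implement a three-layer neural network that takes 784 input features (one 28×28-pixel image) and recognizes the handwritten digits 0-9, then evaluate the model on a test set.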

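II. Network Architecture Design

The network has three layers:

Input layer: 784 neurons (one per pixel of the 28×28 image)
Hidden layer: 25 neurons
Output layer: 10 neurons (one per digit, 0-9)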
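
As a quick sanity check on the layer sizes above, the following sketch derives the shape of each weight matrix and the total number of trainable parameters. The helper name `weight_shapes` is introduced here for illustration only; the one assumption, consistent with the code later in this article, is that each matrix carries one extra column for the bias term.

```python
# Layer sizes taken from the article: input, hidden, output.
layers = [784, 25, 10]

def weight_shapes(layers):
    """Return the (rows, cols) shape of each weight matrix, bias column included."""
    return [(layers[i + 1], layers[i] + 1) for i in range(len(layers) - 1)]

shapes = weight_shapes(layers)
total_parameters = sum(rows * cols for rows, cols in shapes)

print(shapes)            # [(25, 785), (10, 26)]
print(total_parameters)  # 19885
```

Almost all of the parameters (19,625 of 19,885) sit between the input and hidden layers, which is why the hidden layer size dominates training cost.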

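Key design considerations:

The input is a 784-dimensional feature vector
Dimensions of the weight matrices
The forward propagation computation
Applying the sigmoid activation function
Computing errors with backpropagation
Handling the bias terms
Defining the loss function
The weight update rule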

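Backpropagation error terms, written for a four-layer network (the three-layer network in this article only needs the output-layer and hidden-layer terms):

$\delta^{(4)} = a^{(4)} - y$
$\delta^{(3)} = (\Theta^{(3)})^{T} \delta^{(4)} \odot g'(z^{(3)})$
$\delta^{(2)} = (\Theta^{(2)})^{T} \delta^{(3)} \odot g'(z^{(2)})$
There is no $\delta^{(1)}$: layer 1 is the input layer, so it has no error term and nothing to update.
$g'$ is the gradient of the sigmoid function:
$g'(z) = g(z)(1 - g(z))$, where $g(z) = 1 / (1 + e^{-z})$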
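
The sigmoid and its gradient can be sketched in a few lines of NumPy. The article imports `sigmoid` and `sigmoid_gradient` from a utility module whose source is not shown, so the definitions below are an assumed equivalent rather than the author's code. The central-difference comparison at the end is only there to confirm the closed-form derivative numerically.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    """Derivative g'(z) = g(z) * (1 - g(z))."""
    g = sigmoid(z)
    return g * (1.0 - g)

# Compare the closed form against a central finite difference.
z = np.array([-2.0, 0.0, 2.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
max_difference = np.max(np.abs(numeric - sigmoid_gradient(z)))

print(sigmoid_gradient(0.0))  # 0.25
print(max_difference)         # tiny, limited by floating-point error
```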

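Gradient descent procedure (gradients are accumulated over the m training examples):

for i = 1 to m
  set $a^{(1)} = x^{(i)}$
  run forward propagation to compute $a^{(l)}$ for $l = 2, 3, \dots, L$
  use $y^{(i)}$ to compute $\delta^{(L)} = a^{(L)} - y^{(i)}$
  compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$
  accumulate $\Delta^{(l)}_{ij} := \Delta^{(l)}_{ij} + a^{(l)}_{j} \, \delta^{(l+1)}_{i}$
After the loop, dividing $\Delta^{(l)}$ by m gives the averaged gradient used in the weight update.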
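
The accumulation step is an outer product of the next layer's error and the current layer's activation. The sketch below uses small random arrays (made-up shapes, not MNIST data) to show that summing these outer products example by example gives the same result as one matrix product over the whole batch, which is a common way to vectorize this loop.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5                              # number of training examples
activations = rng.random((m, 4))   # a^(l) per example (bias included), one row each
deltas = rng.random((m, 3))        # delta^(l+1) per example, one row each

# Per-example accumulation: Delta += delta^(l+1) * (a^(l))^T
accumulated = np.zeros((3, 4))
for i in range(m):
    accumulated += np.outer(deltas[i], activations[i])

# Equivalent single matrix product over the whole batch.
vectorized = deltas.T @ activations

# Averaging over the examples yields the gradient estimate.
averaged_gradient = accumulated / m
print(np.allclose(accumulated, vectorized))  # True
```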

III. Implementation Details

1. Importing Dependencies

import numpy as np
from Neural_Network_Lab.utils.features import prepare_for_training
from Neural_Network_Lab.utils.hypothesis import sigmoid, sigmoid_gradient

The utility modules encapsulate the data preprocessing and the activation function implementations.
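
The source of `prepare_for_training` is not shown in this article. Because the weight matrices have `input_count + 1` columns and are multiplied directly with the processed data, the helper presumably prepends a bias column of ones, and the `[0]` index implies it returns a tuple whose first element is the processed data. The following is a hypothetical stand-in built on those two inferences; the function name, the extra return values, and the handling of constant columns are all assumptions.

```python
import numpy as np

def prepare_for_training_sketch(data, normalize_data=True):
    """Hypothetical stand-in: optionally standardize features, then prepend a bias column."""
    data_processed = np.asarray(data, dtype=float)
    features_mean = data_processed.mean(axis=0)
    features_deviation = data_processed.std(axis=0)
    if normalize_data:
        # Constant columns (std == 0, e.g. always-black border pixels) are only
        # centered, which avoids a division by zero.
        safe_deviation = np.where(features_deviation == 0, 1.0, features_deviation)
        data_processed = (data_processed - features_mean) / safe_deviation
    # Bias column of ones goes first, matching the weight layout used below.
    bias_column = np.ones((data_processed.shape[0], 1))
    data_processed = np.hstack((bias_column, data_processed))
    return data_processed, features_mean, features_deviation

sample = np.array([[0.0, 10.0], [0.0, 20.0], [0.0, 30.0]])
processed = prepare_for_training_sketch(sample)[0]
print(processed.shape)  # (3, 3)
```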

2. Building the Neural Network Class

class NeuralNetwork:
    def __init__(self, data, labels, layers, normalize_data=False):
        data_processed = prepare_for_training(data, normalize_data=normalize_data)[0]
        self.data = data_processed
        self.labels = labels
        self.layers = layers  # [784, 25, 10]
        self.normalize_data = normalize_data
        self.weights = NeuralNetwork.initialize_weights(layers)

3. Training Procedure

    def train(self, max_iterations=1000, learning_rate=0.1):
        # Flatten the weight matrices into a single vector for the optimizer
        flattened_weights = NeuralNetwork.flatten_weights(self.weights)
        optimized_weights, cost_history = NeuralNetwork.gradient_descent(
            self.data, self.labels, flattened_weights, self.layers, 
            max_iterations, learning_rate
        )
        self.weights = NeuralNetwork.reshape_weights(optimized_weights, self.layers)
        return self.weights, cost_history

4. Initializing the Parameter Matrices

    @staticmethod
    def initialize_weights(layers):
        num_layers = len(layers)
        weights = {}  # one weight matrix per layer transition, keyed by layer index
        for layer_idx in range(num_layers - 1):
            input_count = layers[layer_idx]
            output_count = layers[layer_idx + 1]
            # Small random values in [0, 0.05); the extra column holds the bias weights
            weights[layer_idx] = np.random.rand(output_count, input_count + 1) * 0.05
        return weights
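
Note that `np.random.rand` draws from [0, 1), so the weights above all start non-negative. A commonly used variant draws from an interval centered on zero instead. The sketch below shows that variant; it is not the article's method, and the function name, the `epsilon` default, and the fixed seed are illustrative assumptions.

```python
import numpy as np

def initialize_weights_symmetric(layers, epsilon=0.05, seed=0):
    """Assumed variant: uniform weights in [-epsilon, epsilon), bias column included."""
    rng = np.random.default_rng(seed)
    weights = {}
    for layer_idx in range(len(layers) - 1):
        shape = (layers[layer_idx + 1], layers[layer_idx] + 1)
        weights[layer_idx] = rng.uniform(-epsilon, epsilon, size=shape)
    return weights

weights = initialize_weights_symmetric([784, 25, 10])
print(weights[0].shape, weights[1].shape)  # (25, 785) (10, 26)
```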

5. Converting Between Matrices and Vectors

    @staticmethod
    def flatten_weights(weights):
        num_weight_layers = len(weights)
        flattened = np.array([])
        for layer_idx in range(num_weight_layers):
            flattened = np.hstack((flattened, weights[layer_idx].flatten()))
        return flattened

    @staticmethod
    def reshape_weights(flattened, layers):
        num_layers = len(layers)
        weights = {}
        shift = 0
        for layer_idx in range(num_layers - 1):
            input_count = layers[layer_idx]
            output_count = layers[layer_idx + 1]
            
            width = input_count + 1
            height = output_count
            volume = width * height
            
            start_idx = shift
            end_idx = shift + volume
            layer_flattened = flattened[start_idx:end_idx]
            weights[layer_idx] = layer_flattened.reshape((height, width))
            shift += volume
        
        return weights
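
Flattening and reshaping must be exact inverses, otherwise the optimizer would silently scramble the weights. The standalone sketch below re-implements both helpers as plain functions (so it runs without the class) and checks the round trip on a tiny made-up layer configuration.

```python
import numpy as np

def flatten_weights(weights):
    """Concatenate all weight matrices, in layer order, into one 1-D vector."""
    return np.concatenate([weights[i].ravel() for i in range(len(weights))])

def reshape_weights(flattened, layers):
    """Cut the 1-D vector back into per-layer matrices of shape (out, in + 1)."""
    weights, shift = {}, 0
    for i in range(len(layers) - 1):
        height, width = layers[i + 1], layers[i] + 1
        weights[i] = flattened[shift:shift + height * width].reshape((height, width))
        shift += height * width
    return weights

layers = [4, 3, 2]
original = {0: np.arange(15.0).reshape(3, 5), 1: np.arange(8.0).reshape(2, 4)}
flattened = flatten_weights(original)
restored = reshape_weights(flattened, layers)

print(flattened.shape)  # (23,)
```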

6. Gradient Descent Optimization

6.1 Computing the Loss

6.1.1 Forward Propagation

    @staticmethod
    def forward_propagation(data, weights, layers):
        num_layers = len(layers)
        num_examples = data.shape[0]
        current_activation = data  # input-layer activations (bias column already included)

        # Compute the activations layer by layer
        for layer_idx in range(num_layers - 1):
            weight = weights[layer_idx]
            next_activation = sigmoid(np.dot(current_activation, weight.T))
            # Prepend the bias unit
            next_activation = np.hstack((np.ones((num_examples, 1)), next_activation))
            current_activation = next_activation

        # Return the output layer without its bias column
        return current_activation[:, 1:]
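
To see the forward pass in isolation, here is a standalone sketch on a tiny made-up network with all-zero weights. With zero weights every pre-activation is 0, so every output must equal sigmoid(0) = 0.5, which makes the result easy to verify by hand. The `sigmoid` definition is an assumed equivalent of the imported helper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(data, weights, layers):
    """Batch forward pass; `data` already carries a leading bias column of ones."""
    activation = data
    for layer_idx in range(len(layers) - 1):
        activation = sigmoid(activation @ weights[layer_idx].T)
        # Prepend the bias unit so the next layer's matrix product lines up.
        activation = np.hstack((np.ones((activation.shape[0], 1)), activation))
    return activation[:, 1:]  # strip the bias column from the output layer

layers = [2, 3, 2]
weights = {0: np.zeros((3, 3)), 1: np.zeros((2, 4))}
data = np.array([[1.0, 0.5, -1.2], [1.0, 2.0, 0.3]])  # first column is the bias
output = forward_propagation(data, weights, layers)
print(output)  # every entry is 0.5 because sigmoid(0) = 0.5
```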

Loss function implementation:

    @staticmethod
    def calculate_cost(data, labels, weights, layers):
        num_layers = len(layers)
        num_examples = data.shape[0]
        num_labels = layers[-1]
        
        # Run forward propagation to get the predictions
        predictions = NeuralNetwork.forward_propagation(data, weights, layers)
        
        # Build one-hot encoded labels
        one_hot_labels = np.zeros((num_examples, num_labels))
        for example_idx in range(num_examples):
            one_hot_labels[example_idx][labels[example_idx][0]] = 1
            
        # Cross-entropy loss
        correct_cost = np.sum(np.log(predictions[one_hot_labels == 1]))
        incorrect_cost = np.sum(np.log(1 - predictions[one_hot_labels == 0]))
        cost = (-1/num_examples) * (correct_cost + incorrect_cost)
        return cost
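
A small worked example makes the cross-entropy formula concrete. The sketch below computes the same quantity in vectorized form, building the one-hot matrix with `np.eye` instead of a loop. The clipping of predictions away from 0 and 1 is an addition of this sketch (it guards against `log(0)`), not something the article's code does, and the function name is illustrative.

```python
import numpy as np

def cross_entropy_cost(predictions, labels, num_labels):
    """Mean cross-entropy over examples, summed over all output units."""
    num_examples = predictions.shape[0]
    one_hot = np.eye(num_labels)[labels.astype(int).ravel()]
    # Assumption of this sketch: clip to avoid log(0) on saturated outputs.
    clipped = np.clip(predictions, 1e-12, 1 - 1e-12)
    total = np.sum(one_hot * np.log(clipped) + (1 - one_hot) * np.log(1 - clipped))
    return -total / num_examples

predictions = np.array([[0.9, 0.1],
                        [0.2, 0.8]])
labels = np.array([[0], [1]])
cost = cross_entropy_cost(predictions, labels, num_labels=2)

# By hand: -(1/2) * (ln 0.9 + ln 0.9 + ln 0.8 + ln 0.8) = -(ln 0.9 + ln 0.8)
print(round(cost, 4))  # 0.3285
```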

6.2 Backpropagation

    @staticmethod
    def backpropagation(data, labels, weights, layers):
        num_layers = len(layers)
        (num_examples, num_features) = data.shape
        num_classes = layers[-1]
        
        # Initialize the gradient accumulators, one per weight matrix
        errors = {}
        for layer_idx in range(num_layers - 1):
            input_count = layers[layer_idx]
            output_count = layers[layer_idx + 1]
            errors[layer_idx] = np.zeros((output_count, input_count + 1))
        
        # Compute the error terms for each training example
        for example_idx in range(num_examples):
            layer_inputs = {}
            layer_activations = {}
            activation = data[example_idx, :].reshape((num_features, 1))
            layer_activations[0] = activation
            
            # Forward pass, recording the intermediate values of every layer
            for layer_idx in range(num_layers - 1):
                weight = weights[layer_idx]
                layer_input = np.dot(weight, activation)
                activation = np.vstack((np.array([[1]]), sigmoid(layer_input)))
                layer_inputs[layer_idx + 1] = layer_input
                layer_activations[layer_idx + 1] = activation
                
            output_activation = activation[1:, :]
            
            # Output-layer error
            delta = {}
            one_hot_label = np.zeros((num_classes, 1))
            one_hot_label[labels[example_idx][0]] = 1
            delta[num_layers - 1] = output_activation - one_hot_label
            
            # Propagate the error backwards through the hidden layers
            for layer_idx in range(num_layers - 2, 0, -1):
                weight = weights[layer_idx]
                next_delta = delta[layer_idx + 1]
                layer_input = layer_inputs[layer_idx]
                layer_input = np.vstack((np.array([[1]]), layer_input))
                # Multiply by the sigmoid derivative g'(z), not by g(z) itself
                delta[layer_idx] = np.dot(weight.T, next_delta) * sigmoid_gradient(layer_input)
                # Drop the bias row: the bias unit receives no error
                delta[layer_idx] = delta[layer_idx][1:, :]
            
            # Accumulate the gradients
            for layer_idx in range(num_layers - 1):
                layer_error = np.dot(delta[layer_idx + 1], layer_activations[layer_idx].T)
                errors[layer_idx] = errors[layer_idx] + layer_error
        
        # Average the gradients over all examples
        for layer_idx in range(num_layers - 1):
            errors[layer_idx] = errors[layer_idx] * (1/num_examples)
        
        return errors
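
A standard way to validate a backpropagation implementation is gradient checking: compare the analytic gradients with central finite differences of the cost. The sketch below does this for a tiny three-layer network on random data. It is a vectorized re-implementation written for this check, not the class above, and all names, sizes, and the seed are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, Y, W0, W1):
    """Cross-entropy cost of a 3-layer network; X has a leading bias column."""
    m = X.shape[0]
    hidden = np.hstack((np.ones((m, 1)), sigmoid(X @ W0.T)))
    output = sigmoid(hidden @ W1.T)
    return -np.sum(Y * np.log(output) + (1 - Y) * np.log(1 - output)) / m

def backprop(X, Y, W0, W1):
    """Analytic gradients of `cost` with respect to W0 and W1."""
    m = X.shape[0]
    z1 = X @ W0.T
    hidden = np.hstack((np.ones((m, 1)), sigmoid(z1)))
    output = sigmoid(hidden @ W1.T)
    delta_out = output - Y                                # output-layer error
    g = sigmoid(z1)
    # Hidden-layer error: skip the bias column of W1, multiply by g'(z1).
    delta_hidden = (delta_out @ W1[:, 1:]) * g * (1 - g)
    return delta_hidden.T @ X / m, delta_out.T @ hidden / m

def numerical_gradient(f, W, eps=1e-5):
    """Central finite differences of f() with respect to each entry of W (in place)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + eps
        plus = f()
        W[idx] = original - eps
        minus = f()
        W[idx] = original
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.hstack((np.ones((6, 1)), rng.normal(size=(6, 3))))  # 6 examples, 3 features
Y = np.eye(2)[rng.integers(0, 2, size=6)]                   # one-hot labels, 2 classes
W0 = rng.uniform(-0.5, 0.5, size=(4, 4))                    # 3 inputs + bias -> 4 hidden
W1 = rng.uniform(-0.5, 0.5, size=(2, 5))                    # 4 hidden + bias -> 2 outputs

grad_W0, grad_W1 = backprop(X, Y, W0, W1)
num_W0 = numerical_gradient(lambda: cost(X, Y, W0, W1), W0)
num_W1 = numerical_gradient(lambda: cost(X, Y, W0, W1), W1)
max_error = max(np.max(np.abs(grad_W0 - num_W0)), np.max(np.abs(grad_W1 - num_W1)))
print(max_error)  # should be very small if backprop is correct
```

If the hidden-layer error were multiplied by g(z) instead of g'(z), this check would fail with a clearly visible discrepancy, which is why it is worth running once on a small network before training on real data.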

Gradient descent step implementation:

    @staticmethod
    def gradient_step(data, labels, current_weights, layers):
        weights = NeuralNetwork.reshape_weights(current_weights, layers)
        gradients = NeuralNetwork.backpropagation(data, labels, weights, layers)
        flattened_gradients = NeuralNetwork.flatten_weights(gradients)
        return flattened_gradients

    @staticmethod
    def gradient_descent(data, labels, initial_weights, layers, max_iterations, learning_rate):
        optimized_weights = initial_weights
        cost_history = []
        
        for iteration in range(max_iterations):
            if iteration % 10 == 0:
                print(f"Iteration: {iteration}")
                
            cost = NeuralNetwork.calculate_cost(
                data, labels, 
                NeuralNetwork.reshape_weights(optimized_weights, layers), 
                layers
            )
            cost_history.append(cost)
            
            gradients = NeuralNetwork.gradient_step(data, labels, optimized_weights, layers)
            optimized_weights = optimized_weights - learning_rate * gradients
            
        return optimized_weights, cost_history
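
The update rule itself is independent of the network. The sketch below applies the same loop to a simple quadratic cost, where convergence is easy to verify; the target vector, the function names, and the generic `gradient_fn`/`cost_fn` signature are assumptions made for this illustration.

```python
import numpy as np

def gradient_descent(gradient_fn, cost_fn, initial_weights, max_iterations, learning_rate):
    """Plain batch gradient descent: w <- w - learning_rate * gradient(w)."""
    weights = initial_weights.copy()
    cost_history = []
    for _ in range(max_iterations):
        cost_history.append(cost_fn(weights))
        weights = weights - learning_rate * gradient_fn(weights)
    return weights, cost_history

# Toy problem: minimize ||w - target||^2, whose gradient is 2 * (w - target).
target = np.array([1.0, -2.0, 3.0])
cost_fn = lambda w: float(np.sum((w - target) ** 2))
gradient_fn = lambda w: 2.0 * (w - target)

final_weights, cost_history = gradient_descent(
    gradient_fn, cost_fn, np.zeros(3), max_iterations=50, learning_rate=0.1
)
print(final_weights)  # close to [1, -2, 3]
```

With a learning rate of 0.1 the distance to the target shrinks by a factor of 0.8 per step, so the recorded cost decreases at every iteration. A cost history that rises instead is usually a sign that the learning rate is too large.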

7. Prediction

    def predict(self, data):
        data_processed = prepare_for_training(data, normalize_data=self.normalize_data)[0]
        num_examples = data_processed.shape[0]
        predictions = NeuralNetwork.forward_propagation(data_processed, self.weights, self.layers)
        
        return np.argmax(predictions, axis=1).reshape((num_examples, 1))
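
The prediction is simply the index of the most strongly activated output unit. A minimal illustration with made-up output activations for three examples and three classes:

```python
import numpy as np

# Each row holds the output-layer activations for one example.
output_activations = np.array([
    [0.10, 0.70, 0.20],
    [0.80, 0.05, 0.15],
    [0.30, 0.30, 0.40],
])

# argmax along axis 1 picks the winning class per example;
# the reshape yields a column vector that lines up with the label column.
predicted_classes = np.argmax(output_activations, axis=1).reshape((-1, 1))
print(predicted_classes.ravel())  # [1 0 2]
```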

IV. Complete Code

import numpy as np
from Neural_Network_Lab.utils.features import prepare_for_training
from Neural_Network_Lab.utils.hypothesis import sigmoid, sigmoid_gradient

class NeuralNetwork:
    def __init__(self, data, labels, layers, normalize_data=False):
        data_processed = prepare_for_training(data, normalize_data=normalize_data)[0]
        self.data = data_processed
        self.labels = labels
        self.layers = layers  # [784, 25, 10]
        self.normalize_data = normalize_data
        self.weights = NeuralNetwork.initialize_weights(layers)

    def predict(self, data):
        data_processed = prepare_for_training(data, normalize_data=self.normalize_data)[0]
        num_examples = data_processed.shape[0]
        predictions = NeuralNetwork.forward_propagation(data_processed, self.weights, self.layers)
        
        return np.argmax(predictions, axis=1).reshape((num_examples, 1))

    def train(self, max_iterations=1000, learning_rate=0.1):
        flattened_weights = NeuralNetwork.flatten_weights(self.weights)
        optimized_weights, cost_history = NeuralNetwork.gradient_descent(
            self.data, self.labels, flattened_weights, self.layers, 
            max_iterations, learning_rate
        )
        self.weights = NeuralNetwork.reshape_weights(optimized_weights, self.layers)
        return self.weights, cost_history

    @staticmethod
    def gradient_descent(data, labels, initial_weights, layers, max_iterations, learning_rate):
        optimized_weights = initial_weights
        cost_history = []
        
        for iteration in range(max_iterations):
            if iteration % 10 == 0:
                print(f"Iteration: {iteration}")
                
            cost = NeuralNetwork.calculate_cost(
                data, labels, 
                NeuralNetwork.reshape_weights(optimized_weights, layers), 
                layers
            )
            cost_history.append(cost)
            
            gradients = NeuralNetwork.gradient_step(data, labels, optimized_weights, layers)
            optimized_weights = optimized_weights - learning_rate * gradients
            
        return optimized_weights, cost_history

    @staticmethod
    def gradient_step(data, labels, current_weights, layers):
        weights = NeuralNetwork.reshape_weights(current_weights, layers)
        gradients = NeuralNetwork.backpropagation(data, labels, weights, layers)
        flattened_gradients = NeuralNetwork.flatten_weights(gradients)
        return flattened_gradients

    @staticmethod
    def backpropagation(data, labels, weights, layers):
        num_layers = len(layers)
        (num_examples, num_features) = data.shape
        num_classes = layers[-1]
        
        errors = {}
        for layer_idx in range(num_layers - 1):
            input_count = layers[layer_idx]
            output_count = layers[layer_idx + 1]
            errors[layer_idx] = np.zeros((output_count, input_count + 1))
        
        for example_idx in range(num_examples):
            layer_inputs = {}
            layer_activations = {}
            activation = data[example_idx, :].reshape((num_features, 1))
            layer_activations[0] = activation
            
            for layer_idx in range(num_layers - 1):
                weight = weights[layer_idx]
                layer_input = np.dot(weight, activation)
                activation = np.vstack((np.array([[1]]), sigmoid(layer_input)))
                layer_inputs[layer_idx + 1] = layer_input
                layer_activations[layer_idx + 1] = activation
                
            output_activation = activation[1:, :]
            
            delta = {}
            one_hot_label = np.zeros((num_classes, 1))
            one_hot_label[labels[example_idx][0]] = 1
            delta[num_layers - 1] = output_activation - one_hot_label
            
            for layer_idx in range(num_layers - 2, 0, -1):
                weight = weights[layer_idx]
                next_delta = delta[layer_idx + 1]
                layer_input = layer_inputs[layer_idx]
                layer_input = np.vstack((np.array([[1]]), layer_input))
                # Multiply by the sigmoid derivative g'(z), not by g(z) itself
                delta[layer_idx] = np.dot(weight.T, next_delta) * sigmoid_gradient(layer_input)
                delta[layer_idx] = delta[layer_idx][1:, :]
            
            for layer_idx in range(num_layers - 1):
                layer_error = np.dot(delta[layer_idx + 1], layer_activations[layer_idx].T)
                errors[layer_idx] = errors[layer_idx] + layer_error
        
        for layer_idx in range(num_layers - 1):
            errors[layer_idx] = errors[layer_idx] * (1/num_examples)
        
        return errors

    @staticmethod
    def calculate_cost(data, labels, weights, layers):
        num_layers = len(layers)
        num_examples = data.shape[0]
        num_labels = layers[-1]
        
        predictions = NeuralNetwork.forward_propagation(data, weights, layers)
        
        one_hot_labels = np.zeros((num_examples, num_labels))
        for example_idx in range(num_examples):
            one_hot_labels[example_idx][labels[example_idx][0]] = 1
            
        correct_cost = np.sum(np.log(predictions[one_hot_labels == 1]))
        incorrect_cost = np.sum(np.log(1 - predictions[one_hot_labels == 0]))
        cost = (-1/num_examples) * (correct_cost + incorrect_cost)
        return cost

    @staticmethod
    def forward_propagation(data, weights, layers):
        num_layers = len(layers)
        num_examples = data.shape[0]
        current_activation = data

        for layer_idx in range(num_layers - 1):
            weight = weights[layer_idx]
            next_activation = sigmoid(np.dot(current_activation, weight.T))
            next_activation = np.hstack((np.ones((num_examples, 1)), next_activation))
            current_activation = next_activation

        return current_activation[:, 1:]

    @staticmethod
    def reshape_weights(flattened, layers):
        num_layers = len(layers)
        weights = {}
        shift = 0
        for layer_idx in range(num_layers - 1):
            input_count = layers[layer_idx]
            output_count = layers[layer_idx + 1]
            
            width = input_count + 1
            height = output_count
            volume = width * height
            
            start_idx = shift
            end_idx = shift + volume
            layer_flattened = flattened[start_idx:end_idx]
            weights[layer_idx] = layer_flattened.reshape((height, width))
            shift += volume
        
        return weights

    @staticmethod
    def flatten_weights(weights):
        num_weight_layers = len(weights)
        flattened = np.array([])
        for layer_idx in range(num_weight_layers):
            flattened = np.hstack((flattened, weights[layer_idx].flatten()))
        return flattened

    @staticmethod
    def initialize_weights(layers):
        num_layers = len(layers)
        weights = {}
        for layer_idx in range(num_layers - 1):
            input_count = layers[layer_idx]
            output_count = layers[layer_idx + 1]
            weights[layer_idx] = np.random.rand(output_count, input_count + 1) * 0.05
        return weights

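V. Application: MNIST Handwritten Digit Recognition

The MNIST subset used here (mnist-demo.csv) contains 10,000 handwritten digit samples; the full MNIST dataset has 70,000. Each image is 28×28 pixels. In the CSV file the first column is the label and the remaining 784 columns are the pixel values.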

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
from Neural_Network_Lab.Neural_Network import NeuralNetwork

# Load the data
data = pd.read_csv('../Neural_Network_Lab/data/mnist-demo.csv')

# Visualize a few samples
num_samples = 25
grid_size = math.ceil(math.sqrt(num_samples))
plt.figure(figsize=(10, 10))
for plot_idx in range(num_samples):
    digit = data[plot_idx:plot_idx+1].values
    label = digit[0][0]
    pixels = digit[0][1:]
    image_size = int(math.sqrt(pixels.shape[0]))
    image = pixels.reshape((image_size, image_size))
    plt.subplot(grid_size, grid_size, plot_idx+1)
    plt.imshow(image, cmap='Greys')
    plt.title(label)
plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.show()

# Split into training and test sets (80% / 20%)
train_data = data.sample(frac=0.8)
test_data = data.drop(train_data.index)

train_data = train_data.values
test_data = test_data.values

num_training_samples = 8000

X_train = train_data[:num_training_samples, 1:]
y_train = train_data[:num_training_samples, [0]]

X_test = test_data[:, 1:]
y_test = test_data[:, [0]]

# Network and training settings
layers = [784, 25, 10]
normalize_data = True
max_iterations = 500
learning_rate = 0.1

# Create and train the neural network
nn = NeuralNetwork(X_train, y_train, layers, normalize_data)
(weights, cost_history) = nn.train(max_iterations, learning_rate)

# Plot the loss curve
plt.plot(range(len(cost_history)), cost_history)
plt.xlabel('Iterations')
plt.ylabel('Loss')
plt.show()

# Make predictions
train_predictions = nn.predict(X_train)
test_predictions = nn.predict(X_test)

# Compute accuracy (percentage of correctly classified examples)
train_accuracy = np.sum(train_predictions == y_train) / y_train.shape[0] * 100
test_accuracy = np.sum(test_predictions == y_test) / y_test.shape[0] * 100

print(f"Training accuracy: {train_accuracy:.2f}%")
print(f"Test accuracy: {test_accuracy:.2f}%")

# Visualize the predictions
num_samples = 64
grid_size = math.ceil(math.sqrt(num_samples))
plt.figure(figsize=(15, 15))
for plot_idx in range(num_samples):
    true_label = y_test[plot_idx, 0]
    pixels = X_test[plot_idx, :]
    
    predicted_label = test_predictions[plot_idx][0]
    
    image_size = int(math.sqrt(pixels.shape[0]))
    image = pixels.reshape((image_size, image_size))
    plt.subplot(grid_size, grid_size, plot_idx+1)
    color_map = 'Greens' if predicted_label == true_label else 'Reds'
    plt.imshow(image, cmap=color_map)
    plt.title(predicted_label)
    plt.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False)

plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.show()

The experiment trains on 8,000 samples, tests on 2,000 samples, and runs 500 iterations.

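The current model's accuracy leaves room for improvement. Readers can try to raise it by adjusting the number of iterations, changing the network's layers, or adding more training data.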
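
Before tuning, it helps to see which digits the model confuses. One option is a confusion matrix, sketched below with NumPy on made-up labels; the function name is illustrative, and the column-vector label layout mirrors `y_test` and `test_predictions` from the script above.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    rows = true_labels.ravel().astype(int)
    cols = predicted_labels.ravel().astype(int)
    # np.add.at handles repeated (row, col) pairs correctly.
    np.add.at(matrix, (rows, cols), 1)
    return matrix

y_true = np.array([[0], [1], [2], [2], [1]])
y_pred = np.array([[0], [2], [2], [2], [1]])
matrix = confusion_matrix(y_true, y_pred, num_classes=3)

# Correct predictions lie on the diagonal.
accuracy = np.trace(matrix) / matrix.sum() * 100
print(matrix)
print(f"{accuracy:.2f}%")  # 80.00%
```

Large off-diagonal entries point to specific digit pairs that the network mixes up, which is more informative than a single accuracy figure when deciding what to change.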

Tags: multilayer perceptron, neural network, MNIST
