Determining Moore's Law with real data in NumPy

Scatter plot of MOS transistor count per microprocessor every two years as a demonstration of Moore's Law.

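The number of transistors reported for a given chip, plotted on a log-scale y-axis against the date of introduction on a linear-scale x-axis. The blue data points come from a transistor count table. The red line is an ordinary least squares prediction and the orange line is Moore's law.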

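What you'll do¶

In 1965, engineer Gordon Moore predicted that the number of transistors on a chip would double every two years over the coming decade [1]. You'll compare Moore's prediction against actual transistor counts in the 53 years following his prediction. You will determine the best-fit constants to describe the exponential growth of transistors on semiconductors compared to Moore's Law.

Skills you'll learn¶

Load data from a *.csv file
Perform linear regression and predict exponential growth using ordinary least squares
Compare exponential growth constants between models
Share your analysis in a file:
- as a NumPy zipped file, *.npz
- as a *.csv file
Assess the amazing progress semiconductor manufacturers have made in the last five decades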

What you'll need¶

1. These packages:

NumPy
Matplotlib

imported with the following commands

import matplotlib.pyplot as plt
import numpy as np

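2. Since this is an exponential growth law, you need a little background in doing math with natural logs and exponentials.

You'll use these NumPy and Matplotlib functions:

np.loadtxt: this function loads text into a NumPy array
np.log: this function takes the natural log of all elements in a NumPy array
np.exp: this function takes the exponential of all elements in a NumPy array
lambda: this is a minimal function definition for creating a function model
plt.semilogy: this function will plot x-y data onto a figure with a linear x-axis and $\log_{10}$ y-axis
plt.plot: this function will plot x-y data on linear axes
slicing arrays: view parts of the data loaded into the workspace, e.g. slice the array x[:10] for the first 10 values in the array x
boolean array indexing: to view parts of the data that match a given condition, use boolean operations to index an array
np.block: to combine arrays into 2D arrays
np.newaxis: to change a 1D vector to a row or column vector
np.savez and np.savetxt: these two functions will save your arrays in zipped array format and text format, respectively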

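Building Moore's law as an exponential function¶

Your empirical model assumes that the number of transistors per semiconductor follows an exponential growth,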

$\log(\text{transistor_count})= f(\text{year}) = A\cdot \text{year}+B,$

where $A$ and $B$ are fitting constants. You use semiconductor manufacturers' data to find the fitting constants.
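As a quick sanity check on this model, the sketch below builds a made-up series that doubles every two years and shows that its logarithm rises by a constant amount per year, which is what a straight line in log space means. The starting value and year range are illustrative assumptions, not the manufacturer data.

```python
import numpy as np

# Hypothetical series: starts at 2250 and doubles every two years.
years = np.arange(1971, 1981)
counts = 2250 * 2.0 ** ((years - 1971) / 2)

# On a log scale, exponential growth becomes a straight line:
# the year-to-year change in log(counts) is the constant slope A.
log_counts = np.log(counts)
slopes = np.diff(log_counts) / np.diff(years)

print(slopes)          # every entry is the same constant
print(np.log(2) / 2)   # the slope implied by doubling every two years
```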

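You determine these constants for Moore's law by specifying the growth factor for added transistors, 2, and giving an initial number of transistors for a given year.

You state Moore's law in exponential form as follows,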

$\text{transistor_count}= e^{A_M\cdot \text{year} +B_M}.$

where $A_M$ and $B_M$ are constants that double the number of transistors every two years and start at 2250 transistors in 1971,

$\dfrac{\text{transistor_count}(\text{year} +2)}{\text{transistor_count}(\text{year})} = 2 = \dfrac{e^{B_M}e^{A_M \text{year} + 2A_M}}{e^{B_M}e^{A_M \text{year}}} = e^{2A_M} \rightarrow A_M = \frac{\log(2)}{2}$
$\log(2250) = \frac{\log(2)}{2}\cdot 1971 + B_M \rightarrow B_M = \log(2250)-\frac{\log(2)}{2}\cdot 1971$

so Moore's law stated as an exponential function is

$\log(\text{transistor_count})= A_M\cdot \text{year}+B_M,$

where

$A_M=0.3466$

$B_M=-675.4$

Since the function represents Moore's law, define it as a Python function using lambda

A_M = np.log(2) / 2
B_M = np.log(2250) - A_M * 1971
Moores_law = lambda year: np.exp(B_M) * np.exp(A_M * year)

In 1971, there were 2250 transistors on the Intel 4004 chip. Use Moores_law to check the number of transistors Gordon Moore would expect in 1973.

ML_1971 = Moores_law(1971)
ML_1973 = Moores_law(1973)
print("In 1973, G. Moore expects {:.0f} transistors on Intels chips".format(ML_1973))
print("This is x{:.2f} more transistors than 1971".format(ML_1973 / ML_1971))

In 1973, G. Moore expects 4500 transistors on Intels chips
This is x2.00 more transistors than 1971

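Loading historical manufacturing data to your workspace¶

Now, make a prediction based upon the historical data for transistors per chip. The transistor count [3] for each year is in the transistor_data.csv file. Before loading a *.csv file into a NumPy array, it's a good idea to inspect the structure of the file first. Then, locate the columns of interest and save them to a variable. Save two columns of the file to the array, data.

Here, print out the first 10 rows of transistor_data.csv. The columns are

Processor	MOS transistor count	Date of Introduction	Designer	MOS process	Area
Intel 4004 (4-bit 16-pin)	2250	1971	Intel	"10,000 nm"	12 mm²
...	...	...	...	...	...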

! head transistor_data.csv

Processor,MOS transistor count,Date of Introduction,Designer,MOSprocess,Area
Intel 4004 (4-bit  16-pin),2250,1971,Intel,"10,000 nm",12 mm²
Intel 8008 (8-bit  18-pin),3500,1972,Intel,"10,000 nm",14 mm²
NEC μCOM-4 (4-bit  42-pin),2500,1973,NEC,"7,500 nm",?
Intel 4040 (4-bit  16-pin),3000,1974,Intel,"10,000 nm",12 mm²
Motorola 6800 (8-bit  40-pin),4100,1974,Motorola,"6,000 nm",16 mm²
Intel 8080 (8-bit  40-pin),6000,1974,Intel,"6,000 nm",20 mm²
TMS 1000 (4-bit  28-pin),8000,1974,Texas Instruments,"8,000 nm",11 mm²
MOS Technology 6502 (8-bit  40-pin),4528,1975,MOS Technology,"8,000 nm",21 mm²
Intersil IM6100 (12-bit  40-pin; clone of PDP-8),4000,1975,Intersil,,

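You don't need the columns that specify Processor, Designer, MOSprocess, or Area. That leaves the second and third columns, MOS transistor count and Date of Introduction, respectively.

Next, load these two columns into a NumPy array using np.loadtxt. The extra options below put the data in the desired format:

delimiter = ',': specify the delimiter as a comma ',' (the default is whitespace, so this option is required for a CSV file)
usecols = [1,2]: import the second and third columns from the csv
skiprows = 1: do not use the first row, because it's a header row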
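To see these three options in isolation, here is a minimal sketch that reads a small in-memory CSV instead of the real file. The four-column excerpt and the name csv_text are assumptions made for illustration; the actual transistor_data.csv has more columns and rows.

```python
import io

import numpy as np

# Hypothetical four-column excerpt; the real file has more columns and rows.
csv_text = """Processor,MOS transistor count,Date of Introduction,Designer
Intel 4004,2250,1971,Intel
Intel 8008,3500,1972,Intel
NEC uCOM-4,2500,1973,NEC
"""

sample = np.loadtxt(
    io.StringIO(csv_text),
    delimiter=",",   # split fields on commas
    usecols=[1, 2],  # keep only the count and year columns
    skiprows=1,      # skip the header row
)
print(sample.shape)  # (3, 2)
print(sample)
```

Because usecols selects only numeric columns, the text columns are never converted to floats.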

data = np.loadtxt("transistor_data.csv", delimiter=",", usecols=[1, 2], skiprows=1)

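You loaded the entire history of semiconductor manufacturing into a NumPy array named data. The first column is the MOS transistor count and the second column is the Date of Introduction as a four-digit year.

Next, make the data easier to read and manage by assigning the two columns to variables, year and transistor_count. Print out the first 10 values by slicing the year and transistor_count arrays with [:10]. Print these values to check that you have saved the data to the correct variables.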

year = data[:, 1]  # grab the second column and assign
transistor_count = data[:, 0]  # grab the first column and assign

print("year:\t\t", year[:10])
print("trans. cnt:\t", transistor_count[:10])

year:		 [1971. 1972. 1973. 1974. 1974. 1974. 1974. 1975. 1975. 1975.]
trans. cnt:	 [2250. 3500. 2500. 3000. 4100. 6000. 8000. 4528. 4000. 5000.]

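You are creating a function that predicts the transistor count given a year. You have an *independent variable*, year, and a *dependent variable*, transistor_count. Transform the dependent variable to log-scale,

$y_i = \log($ transistor_count[i] $),$

resulting in a linear equation,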

$y_i = A\cdot \text{year} +B$ .

yi = np.log(transistor_count)

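Calculating the historical growth curve for transistors¶

Your model assumes that yi is a function of year. Now, find the best-fit model that minimizes the difference between $y_i$ and $A\cdot \text{year} +B,$ as such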

$\min \sum|y_i - (A\cdot \text{year}_i + B)|^2.$

This sum of squares error can be written succinctly with arrays as

$\sum|\mathbf{y}-\mathbf{Z} [A,~B]^T|^2,$

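where $\mathbf{y}$ are the observations of the log of the number of transistors in a 1D array and $\mathbf{Z}=[\text{year}_i^1,~\text{year}_i^0]$ are the polynomial terms for $\text{year}_i$ in the first and zeroth orders. By creating this set of regressors in the $\mathbf{Z}$ matrix, you set up an ordinary least squares statistical model.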
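The matrix form can also be solved directly. The following sketch is one way to do it, not the route this tutorial takes: it assembles $\mathbf{Z}$ by hand and calls np.linalg.lstsq on made-up data that lies exactly on a line, so the recovered constants are known in advance. The names year_demo, y_demo, A_true and B_true are illustrative assumptions.

```python
import numpy as np

# Made-up data lying exactly on a line, so the answer is known.
year_demo = np.arange(1971.0, 1981.0)
A_true, B_true = 0.35, -680.0
y_demo = A_true * year_demo + B_true

# Columns of Z are year**1 and year**0, matching the vector [A, B]^T.
Z = np.column_stack([year_demo, np.ones_like(year_demo)])

# np.linalg.lstsq minimizes sum |y - Z @ [A, B]|^2.
(A_fit, B_fit), *_ = np.linalg.lstsq(Z, y_demo, rcond=None)
print(A_fit, B_fit)
```

Because the year values are large and close together, the columns of Z are nearly parallel. That is one reason a fitting routine that rescales the independent variable can be numerically more comfortable.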

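Z is a linear model with two parameters, i.e. a polynomial of degree 1. Therefore, we can represent the model with numpy.polynomial.Polynomial and use its fitting functionality to determine the model parameters: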

model = np.polynomial.Polynomial.fit(year, yi, deg=1)

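By default, Polynomial.fit performs the fit in a scaled and shifted domain determined by the independent variable (year in this case). The coefficients for the unscaled and unshifted model can be recovered with the convert method: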

model = model.convert()
model


The individual parameters $A$ and $B$ are the coefficients of our linear model:

B, A = model
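To make the role of convert concrete, the sketch below fits a made-up exact line and prints the coefficients before and after conversion. The data and the names fitted and unscaled are assumptions for illustration only.

```python
import numpy as np
from numpy.polynomial import Polynomial

x = np.arange(1971.0, 1981.0)
y = 0.35 * x - 680.0   # exact line with known slope and intercept

fitted = Polynomial.fit(x, y, deg=1)
print(fitted.domain)   # the x-range that was mapped onto the fit window
print(fitted.coef)     # coefficients in the scaled variable

unscaled = fitted.convert()
intercept, slope = unscaled.coef   # coefficients are ordered lowest degree first
print(slope, intercept)
```

The coefficients are ordered from lowest to highest degree, which is why the intercept is unpacked before the slope.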

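Did manufacturers double the transistor count every two years? You have the final formula,

$\dfrac{\text{transistor_count}(\text{year} +2)}{\text{transistor_count}(\text{year})} = \text{xFactor} = \dfrac{e^{B}e^{A( \text{year} + 2)}}{e^{B}e^{A \text{year}}} = e^{2A}$

where the increase in the number of transistors is $\text{xFactor}$, the number of years is 2, and $A$ is the best-fit slope on the semilog function.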

print(f"Rate of semiconductors added on a chip every 2 years: {np.exp(2 * A):.2f}")

Rate of semiconductors added on a chip every 2 years: 1.98
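The same slope can be read as a doubling time, $\log(2)/A$. The sketch below back-computes an approximate $A$ from the rounded factor 1.98 printed above, so its result is only as precise as that rounding. With the fitted A available in your session, you would use it directly.

```python
import numpy as np

two_year_factor = 1.98                   # rounded value from the output above
A_approx = np.log(two_year_factor) / 2   # invert xFactor = exp(2 * A)
doubling_time = np.log(2) / A_approx     # years needed for the count to double
print(f"Approximate doubling time: {doubling_time:.2f} years")
# Approximate doubling time: 2.03 years
```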

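According to your least squares regression model, the number of transistors on a chip increased by a factor of 1.98 every two years. You have a model that predicts the number of transistors each year. Now compare your model to the actual manufacturing reports. Plot the linear regression results and all of the transistor counts.

Here, use plt.semilogy to plot the number of transistors on a log scale and the year on a linear scale. You have defined three arrays to get to a final model

$y_i = \log(\text{transistor_count}),$

$y_i = A \cdot \text{year} + B,$

and

$\log(\text{transistor_count}) = A\cdot \text{year} + B,$

Your variables, transistor_count, year, and yi all have the same dimensions, (179,). NumPy arrays need the same dimensions to make a plot. The predicted number of transistors is now

$\text{transistor_count_predicted} = e^Be^{A\cdot \text{year}}.$

In the next plot, use the fivethirtyeight style sheet. The style sheet replicates elements of https://fivethirtyeight.com. Change the matplotlib style with plt.style.use.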

transistor_count_predicted = np.exp(B) * np.exp(A * year)
transistor_Moores_law = Moores_law(year)
plt.style.use("fivethirtyeight")
plt.semilogy(year, transistor_count, "s", label="MOS transistor count")
plt.semilogy(year, transistor_count_predicted, label="linear regression")


plt.plot(year, transistor_Moores_law, label="Moore's Law")
plt.title(
    "MOS transistor count per microprocessor\n"
    + "every two years \n"
    + "Transistor count was x{:.2f} higher".format(np.exp(A * 2))
)
plt.xlabel("year introduced")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5))
plt.ylabel("# of transistors\nper microprocessor")

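Scatter plot of MOS transistor count per microprocessor every two years, with a solid red line for the ordinary least squares prediction and a solid orange line for Moore's law.

The linear regression captures the yearly increase in the number of transistors per semiconductor. In 2015, semiconductor manufacturers claimed they could not keep up with Moore's law anymore. Your analysis shows that since 1971, the average increase in transistor count was x1.98 every two years, while Gordon Moore predicted it would be x2 every two years. That is an amazing prediction.

Consider the year 2017. Compare the data to your linear regression model and Gordon Moore's prediction. First, get the transistor counts from the year 2017. You can do this with a Boolean comparator,

year == 2017.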
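As a small illustration of how such a comparison selects entries, the sketch below uses made-up arrays. The names year_demo and count_demo and their values are assumptions, not the tutorial's data.

```python
import numpy as np

# Made-up values for illustration only.
year_demo = np.array([2016.0, 2017.0, 2017.0, 2018.0])
count_demo = np.array([1.0e9, 2.0e9, 4.0e9, 8.0e9])

mask = year_demo == 2017          # one True/False entry per element
print(mask)                       # [False  True  True False]
print(count_demo[mask])           # only the entries where mask is True
print(count_demo[mask].mean())    # 3000000000.0
```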

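Then, make a prediction for 2017 with Moores_law defined above and by plugging your best-fit constants into your function

$\text{transistor_count} = e^{B}e^{A\cdot \text{year}}.$

A good way to compare these measurements is to compare your prediction and Moore's prediction to the average transistor count, and to look at the range of reported values for that year. Use the plt.plot option alpha=0.2 to increase the transparency of the data. The more opaque a point appears, the more reported values lie on that measurement. The green $+$ is the average reported transistor count for 2017. Plot your predictions for $\pm\frac{1}{2}$ years.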

transistor_count2017 = transistor_count[year == 2017]
print(
    transistor_count2017.max(), transistor_count2017.min(), transistor_count2017.mean()
)
y = np.linspace(2016.5, 2017.5)
your_model2017 = np.exp(B) * np.exp(A * y)
Moore_Model2017 = Moores_law(y)

plt.plot(
    2017 * np.ones(np.sum(year == 2017)),
    transistor_count2017,
    "ro",
    label="2017",
    alpha=0.2,
)
plt.plot(2017, transistor_count2017.mean(), "g+", markersize=20, mew=6)

plt.plot(y, your_model2017, label="Your prediction")
plt.plot(y, Moore_Model2017, label="Moores law")
plt.ylabel("# of transistors\nper microprocessor")
plt.legend()

19200000000.0 250000000.0 7050000000.0

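The result is that your model is close to the mean, but Gordon Moore's prediction is closer to the maximum number of transistors per microprocessor produced in 2017. Even though semiconductor manufacturers thought that growth would slow, once in 1975 and now again approaching 2025, they are still producing semiconductors that nearly double the number of transistors every two years.

The linear regression model is much better at predicting the average than extreme values because it satisfies the condition to minimize $\sum |y_i - (A\cdot \text{year}_i+B)|^2$.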
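One way to see why the fit tracks the average: for a least squares line that includes an intercept, the residuals sum to zero. The sketch below checks this on synthetic scattered data. The slope, intercept, noise level and random seed are arbitrary assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.arange(1971.0, 2020.0)
# Synthetic log-counts: a line plus random scatter (illustrative only).
y = 0.34 * x - 666.0 + rng.normal(scale=1.0, size=x.size)

line = Polynomial.fit(x, y, deg=1)
residuals = y - line(x)

# The residuals of a least squares line with an intercept sum to ~0,
# so the line runs through the middle of the scatter while the largest
# observations sit above it.
print(residuals.sum())
print(residuals.max())
```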

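The final step is to share your findings. You created new arrays that represent a linear regression model and Gordon Moore's prediction. You started this process by importing a CSV file into a NumPy array using np.loadtxt. To save your model, use two approaches:

np.savez: save NumPy arrays for other Python sessions
np.savetxt: save a CSV file with the original data and your predicted data

Zipping the arrays into a file¶

Using np.savez, you can save thousands of arrays and give them names. The function np.load loads the arrays back into the workspace as a dictionary-like object. You'll save five arrays so the next user will have the year, transistor count, predicted transistor count, Gordon Moore's predicted count, and the fitting constants. Add one more variable that other users can use to understand the model, notes.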

notes = "the arrays in this file are the result of a linear regression model\n"
notes += "the arrays include\nyear: year of manufacture\n"
notes += "transistor_count: number of transistors reported by manufacturers in a given year\n"
notes += "transistor_count_predicted: linear regression model = exp({:.2f})*exp({:.2f}*year)\n".format(
    B, A
)
notes += "transistor_Moores_law: Moores law =exp({:.2f})*exp({:.2f}*year)\n".format(
    B_M, A_M
)
notes += "regression_csts: linear regression constants A and B for log(transistor_count)=A*year+B"
print(notes)

the arrays in this file are the result of a linear regression model
the arrays include
year: year of manufacture
transistor_count: number of transistors reported by manufacturers in a given year
transistor_count_predicted: linear regression model = exp(-666.33)*exp(0.34*year)
transistor_Moores_law: Moores law =exp(-675.38)*exp(0.35*year)
regression_csts: linear regression constants A and B for log(transistor_count)=A*year+B

np.savez(
    "mooreslaw_regression.npz",
    notes=notes,
    year=year,
    transistor_count=transistor_count,
    transistor_count_predicted=transistor_count_predicted,
    transistor_Moores_law=transistor_Moores_law,
    regression_csts=(A, B),
)

results = np.load("mooreslaw_regression.npz")

print(results["regression_csts"][1])

-666.3264063536233

! ls

_static				    tutorial-plotting-fractals.md
air-quality-data.csv		    tutorial-static_equilibrium.md
mooreslaw-tutorial.md		    tutorial-style-guide.md
mooreslaw_regression.npz	    tutorial-svd.md
save-load-arrays.md		    tutorial-x-ray-image-processing
transistor_data.csv		    tutorial-x-ray-image-processing.md
tutorial-air-quality-analysis.md    who_covid_19_sit_rep_time_series.csv
tutorial-deep-learning-on-mnist.md  x_y-squared.csv
tutorial-ma.md			    x_y-squared.npz
tutorial-plotting-fractals

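The benefit of np.savez is that you can save hundreds of arrays with different shapes and types. Here, you saved 4 arrays of double precision floating point numbers with shape (179,), one array that was text, and one array of double precision floating point numbers with shape (2,). This is the preferred method for saving NumPy arrays for use in another analysis.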
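The mix of shapes and types can be checked after loading. This minimal sketch writes a throwaway archive to a temporary directory and inspects what comes back. The file name demo.npz and the stored values are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.npz")   # hypothetical file name
    np.savez(
        path,
        notes="free-form description",     # stored as a 0-d string array
        values=np.arange(5.0),
        csts=(0.34, -666.0),               # the tuple is converted to an array
    )
    # np.load returns a dictionary-like object; .files lists the saved names.
    with np.load(path) as archive:
        names = sorted(archive.files)
        shapes = {name: archive[name].shape for name in names}
        kinds = {name: archive[name].dtype.kind for name in names}

print(names)
print(shapes)
print(kinds)   # 'f' marks floating point, 'U' marks unicode text
```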

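Creating your own comma separated value file¶

If you want to share data and view the results in a table, then you have to create a text file. Save the data using np.savetxt. This function is more limited than np.savez. Delimited files, like CSVs, need 2D arrays.

Prepare the data for export by creating a new 2D array whose columns contain the data of interest.

Use the header option to describe the data and columns of the file. Define another variable that contains file information as head.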

head = "the columns in this file are the result of a linear regression model\n"
head += "the columns include\nyear: year of manufacture\n"
head += "transistor_count: number of transistors reported by manufacturers in a given year\n"
head += "transistor_count_predicted: linear regression model = exp({:.2f})*exp({:.2f}*year)\n".format(
    B, A
)
head += "transistor_Moores_law: Moores law =exp({:.2f})*exp({:.2f}*year)\n".format(
    B_M, A_M
)
head += "year:, transistor_count:, transistor_count_predicted:, transistor_Moores_law:"
print(head)

the columns in this file are the result of a linear regression model
the columns include
year: year of manufacture
transistor_count: number of transistors reported by manufacturers in a given year
transistor_count_predicted: linear regression model = exp(-666.33)*exp(0.34*year)
transistor_Moores_law: Moores law =exp(-675.38)*exp(0.35*year)
year:, transistor_count:, transistor_count_predicted:, transistor_Moores_law:

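Build a single 2D array to export to CSV. Tabular data is inherently two-dimensional, so you need to organize your data to fit this 2D structure. Use year, transistor_count, transistor_count_predicted, and transistor_Moores_law as the first through fourth columns, respectively. Put the calculated constants in the header, since they do not fit the (179,) shape. The np.block function appends arrays together to create a new, larger array. Arrange the 1D vectors as columns using np.newaxis, e.g.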

>>> year.shape
(179,)
>>> year[:,np.newaxis].shape
(179,1)

output = np.block(
    [
        year[:, np.newaxis],
        transistor_count[:, np.newaxis],
        transistor_count_predicted[:, np.newaxis],
        transistor_Moores_law[:, np.newaxis],
    ]
)
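For a reduced view of what this step does, the sketch below applies the same pattern to two short made-up vectors and compares the result with np.column_stack, which is a shorter route to the same array. The names a, b, blocked and stacked are illustrative assumptions.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# np.newaxis turns each (3,) vector into a (3, 1) column;
# np.block then joins the columns side by side.
blocked = np.block([a[:, np.newaxis], b[:, np.newaxis]])
stacked = np.column_stack([a, b])   # builds the same 2D array in one call

print(blocked.shape)                      # (3, 2)
print(np.array_equal(blocked, stacked))   # True
```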

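Create mooreslaw_regression.csv with np.savetxt, using three options to produce the desired file format:

X = output: use the output block to write the data into the file
delimiter = ',': use commas to separate columns in the file
header = head: use the header head defined above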

np.savetxt("mooreslaw_regression.csv", X=output, delimiter=",", header=head)

! head mooreslaw_regression.csv

# the columns in this file are the result of a linear regression model
# the columns include
# year: year of manufacture
# transistor_count: number of transistors reported by manufacturers in a given year
# transistor_count_predicted: linear regression model = exp(-666.33)*exp(0.34*year)
# transistor_Moores_law: Moores law =exp(-675.38)*exp(0.35*year)
# year:, transistor_count:, transistor_count_predicted:, transistor_Moores_law:
1.971000000000000000e+03,2.250000000000000000e+03,1.130514785642591733e+03,2.249999999999916326e+03
1.972000000000000000e+03,3.500000000000000000e+03,1.590908400344571419e+03,3.181980515339620069e+03
1.973000000000000000e+03,2.500000000000000000e+03,2.238793840142739555e+03,4.500000000000097316e+03
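The header lines begin with "# ", which np.loadtxt treats as comments by default, so a file written this way can be read back without skiprows. The sketch below shows the round trip on a small made-up table written to a temporary directory. The file name demo.csv and the values are assumptions.

```python
import os
import tempfile

import numpy as np

table = np.array([[1971.0, 2250.0], [1972.0, 3500.0]])  # small made-up table

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")   # hypothetical file name
    np.savetxt(path, X=table, delimiter=",", header="year, transistor_count")
    # The header was written with a leading "# ", so np.loadtxt skips it
    # as a comment line and reads only the numeric rows.
    restored = np.loadtxt(path, delimiter=",")

print(restored.shape)
print(np.allclose(table, restored))
```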

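Wrapping up¶

In conclusion, you have compared historical data from semiconductor manufacturers to Moore's law and created a linear regression model to find the average factor by which the number of transistors on each microprocessor grew every two years. Gordon Moore predicted that the number of transistors would double every two years from 1965 through 1975, but the average growth has held steady at $\times 1.98 \pm 0.01$ every two years from 1971 through 2019. In 2015, Moore revised his prediction to say that Moore's law should hold until 2025 [2]. You can share these results as a zipped NumPy array file, mooreslaw_regression.npz, or as another CSV file, mooreslaw_regression.csv. The amazing progress in semiconductor manufacturing has enabled new industries and computational power. This analysis should give you a small insight into how incredible this growth has been over the last half-century.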