TinyRenderer

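There are many small details of the rendering pipeline that I have never fully understood, so I decided to work through the TinyRenderer tutorial, hoping that implementing this rasterizing renderer myself will give me a firmer grasp of the underlying concepts.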

Lesson 0 : getting started

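The tutorial writes its output to TGA files, which can be opened with XnView or Photoshop. In other words, this is offline rendering, just like the Ray Tracing in One Weekend project I did earlier.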

I won't go into how TGA output is implemented; here are the handful of operations we will use:

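//TGAColor is an RGBA color
const TGAColor white = TGAColor(255, 255, 255, 255);
const TGAColor red   = TGAColor(255, 0,   0,   255);

//TGAImage constructor; the third argument is one of TGAImage::RGB, TGAImage::RGBA or TGAImage::GRAYSCALE
TGAImage image(width, height, TGAImage::RGB);
//Set the color of a pixel: x and y are int, color is a TGAColor
image.set(x, y, color);
//Write the image to a TGA file; the argument is the output file name
image.write_tga_file("output.tga");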

Lesson 1 : Bresenham’s Line Drawing Algorithm

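The goal of the first lesson is to learn how to draw a line. We use Bresenham's algorithm. We could of course look up an implementation online, but to understand the algorithm properly we will implement it ourselves and run into every pitfall along the way.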

First Attempt

The algorithm:

//Simplest line algorithm. The step of t fixes the number of samples (about 100 here); if that is
//smaller than max(|x1 - x0|, |y1 - y0|), the segment will show visible gaps
void line1(int x0, int y0, int x1, int y1, TGAImage &image, TGAColor color) {
    //Essentially a lerp, written as a for loop over t in [0, 1)
    for (float t=0.; t<1.; t+=.01) {
        int x = x0 + (x1-x0)*t;
        int y = y0 + (y1-y0)*t;
        image.set(x, y, color);
    }
}

//Draw a white line from (13, 20) to (80, 40)
line1(13, 20, 80, 40, image, white);

The output of this algorithm:
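Beyond the picture, the gap problem is easy to put a number on. The following is only a sketch of my own: PixelSet is a hypothetical stand-in for TGAImage that just records which pixels were set, and line1Sampled is line1 with the step exposed as a parameter. Neither name comes from the tutorial.

```cpp
#include <set>
#include <utility>

// Hypothetical stand-in for TGAImage: it only remembers which pixels were set.
struct PixelSet {
    std::set<std::pair<int, int>> pixels;
    void set(int x, int y) { pixels.insert({x, y}); }
};

// Same sampling scheme as line1, with the step of t exposed as a parameter.
void line1Sampled(int x0, int y0, int x1, int y1, PixelSet& image, float step) {
    for (float t = 0.f; t < 1.f; t += step) {
        int x = static_cast<int>(x0 + (x1 - x0) * t);
        int y = static_cast<int>(y0 + (y1 - y0) * t);
        image.set(x, y);
    }
}

// Number of distinct pixels touched by one call.
int distinctPixels(int x0, int y0, int x1, int y1, float step) {
    PixelSet image;
    line1Sampled(x0, y0, x1, y1, image, step);
    return static_cast<int>(image.pixels.size());
}
```

A horizontal line from (0, 0) to (200, 0) needs 201 pixels. With a step of 0.01 there are only about 100 samples, so distinctPixels(0, 0, 200, 0, 0.01f) stays far below 201 and the line has holes. A step of 0.001 covers x = 0 through 199, at ten times the cost.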

Second Attempt

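The problem with the first algorithm is that it is inefficient and not very intuitive. t runs from 0 to 1, and its step size controls how many points get drawn. For now all we want is a continuous line, so we can rewrite algorithm 1 in another way: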

//Line algorithm 2: more intuitive than algorithm 1
//Still interpolation: t lies in [0, 1] and is used to interpolate the current y
void line2(int x0, int y0, int x1, int y1, TGAImage& image, TGAColor color) {
    for (int x = x0; x <= x1; x++) {
        float t = (x - x0) / (float)(x1 - x0);
        int y = y0 * (1. - t) + y1 * t;
        image.set(x, y, color);
    }
}

Does this guarantee that every line we draw is continuous... or does it?

line2(13, 20, 80, 40, image, white); 
line2(20, 13, 40, 80, image, red); 
line2(80, 40, 13, 20, image, red);

We try to draw these three lines, and the result is:

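The white line is continuous, the first red line is broken into separate dots, and the second red line, which should have covered the white one, does not appear at all. Why?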

Third Attempt

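The picture above comes out this way because we did not consider all the cases.

First, we only handled the case where |x1 - x0| is larger than |y1 - y0|. When the y difference is larger than the x difference, as with the first red line, we visit every x value but the line is still discontinuous in y, because y changes by more than one pixel per step in x. So we need to compare |x1 - x0| with |y1 - y0| and loop over whichever axis has the larger extent.

Second, we did not handle the case where x0 is greater than x1 (or y0 greater than y1). The loop condition x <= x1 is then false from the very start, which is why the third line above was not drawn at all. We need to compare the start and end coordinates and swap the endpoints when necessary.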
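Both failure modes can be checked by counting pixels. This is a small sketch of my own, again assuming a hypothetical PixelSet stand-in for TGAImage; line2Plain repeats the body of line2 against that stand-in and assumes x0 != x1.

```cpp
#include <set>
#include <utility>

// Hypothetical stand-in for TGAImage: it only remembers which pixels were set.
struct PixelSet {
    std::set<std::pair<int, int>> pixels;
    void set(int x, int y) { pixels.insert({x, y}); }
};

// Body of line2 from the Second Attempt, drawing into the stand-in (assumes x0 != x1).
void line2Plain(int x0, int y0, int x1, int y1, PixelSet& image) {
    for (int x = x0; x <= x1; x++) {
        float t = (x - x0) / (float)(x1 - x0);
        int y = static_cast<int>(y0 * (1. - t) + y1 * t);
        image.set(x, y);
    }
}

// Number of distinct pixels produced by one call.
int pixelsDrawn(int x0, int y0, int x1, int y1) {
    PixelSet image;
    line2Plain(x0, y0, x1, y1, image);
    return static_cast<int>(image.pixels.size());
}
```

pixelsDrawn(13, 20, 80, 40) gives 68 pixels for 68 columns, so the white line is fine. pixelsDrawn(20, 13, 40, 80) gives only 21 pixels for a line that spans 68 rows, hence the dots. pixelsDrawn(80, 40, 13, 20) gives 0 because the loop never runs.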

This leads to the third algorithm:

//Line algorithm 2, but looping over y: the caller passes x and y swapped, and set() swaps them back
void line2Transpose(int x0, int y0, int x1, int y1, TGAImage& image, TGAColor color) {
    for (int x = x0; x <= x1; x++) {
        float t = (x - x0) / (float)(x1 - x0);
        int y = y0 * (1. - t) + y1 * t;
        image.set(y, x, color);
    }
}

void line3(int x0, int y0, int x1, int y1, TGAImage& image, TGAColor color) {
    //Compare the x extent with the y extent; remember to take absolute values
    if (abs(x1 - x0) >= abs(y1 - y0)) {
        //Compare the two x values so the loop always runs from the smaller to the larger
        if (x1 >= x0) {
            line2(x0, y0, x1, y1, image, color);
        }
        else {
            line2(x1, y1, x0, y0, image, color);
        }

    }
    else {
        //Compare the two y values so the loop always runs from the smaller to the larger
        if (y1 >= y0) {
            line2Transpose(y0, x0, y1, x1, image, color);
        }
        else {
            line2Transpose(y1, x1, y0, x0, image, color);
        }
    }
}

The new line2Transpose function draws lines by looping over y.

The result:
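Independently of the rendered image, a small check can confirm that the four test lines used in this lesson come out without gaps. This is only a sketch: Pixels is a hypothetical stand-in that records pixels in drawing order, line2Rec folds line2 and line2Transpose into one function with a transposed flag, and the x1 == x0 guard is my own addition to avoid a division by zero.

```cpp
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

using Pixels = std::vector<std::pair<int, int>>;  // hypothetical stand-in for TGAImage

// line2 recording pixels in drawing order; transposed = true swaps the
// coordinates on output, which is what line2Transpose does.
void line2Rec(int x0, int y0, int x1, int y1, Pixels& out, bool transposed) {
    for (int x = x0; x <= x1; x++) {
        float t = (x1 == x0) ? 0.f : (x - x0) / (float)(x1 - x0);
        int y = static_cast<int>(y0 * (1. - t) + y1 * t);
        if (transposed) out.push_back({y, x});
        else            out.push_back({x, y});
    }
}

// Same dispatch as line3.
void line3Rec(int x0, int y0, int x1, int y1, Pixels& out) {
    if (std::abs(x1 - x0) >= std::abs(y1 - y0)) {
        if (x1 >= x0) line2Rec(x0, y0, x1, y1, out, false);
        else          line2Rec(x1, y1, x0, y0, out, false);
    } else {
        if (y1 >= y0) line2Rec(y0, x0, y1, x1, out, true);
        else          line2Rec(y1, x1, y0, x0, out, true);
    }
}

// True if consecutive pixels never differ by more than 1 in x or in y.
bool isGapFree(int x0, int y0, int x1, int y1) {
    Pixels p;
    line3Rec(x0, y0, x1, y1, p);
    for (std::size_t i = 1; i < p.size(); i++) {
        if (std::abs(p[i].first - p[i - 1].first) > 1) return false;
        if (std::abs(p[i].second - p[i - 1].second) > 1) return false;
    }
    return !p.empty();
}
```

The check only covers the lines it is called with; it is not a proof for every slope, since the float interpolation truncates toward zero.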

Timings : fourth attempt

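In fact the code above works great (the original author's words) and is quite readable. It has no bounds checking, though; in this tutorial we take responsibility for keeping the input valid and in range ourselves.

However, as noted earlier, it is inefficient. We are building an offline renderer, so that may not matter much, and on my desktop-class CPU I don't notice a problem for now. But what about real-time rendering? What if the renderer ran on a phone? Line drawing is the most basic building block of the whole renderer, so the faster it is the better. Let's try to optimize it.

Here is the original author's profiling output, copied as is: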

%   cumulative   self              self     total 
 time   seconds   seconds    calls  ms/call  ms/call  name 
 69.16      2.95     2.95  3000000     0.00     0.00  line(int, int, int, int, TGAImage&, TGAColor) 
 19.46      3.78     0.83 204000000     0.00     0.00  TGAImage::set(int, int, TGAColor) 
  8.91      4.16     0.38 207000000     0.00     0.00  TGAColor::TGAColor(TGAColor const&) 
  1.64      4.23     0.07        2    35.04    35.04  TGAColor::TGAColor(unsigned char, unsigned char, unsigned char, unsigned char) 
  0.94      4.27     0.04                             TGAImage::get(int, int)

We can see that about 70% of the time is spent inside line()! So line() is naturally where the optimization effort should go.
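The table above comes from a profiler. For a rough comparison between line variants without one, wall-clock timing is often enough. The sketch below is my own and not the author's setup: Framebuffer is a hypothetical stand-in for TGAImage, line2Fb is line2 retargeted at it, and timeLineCalls is a made-up helper name.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical grayscale framebuffer standing in for TGAImage (not part of the tutorial).
struct Framebuffer {
    int width;
    int height;
    std::vector<unsigned char> data;
    Framebuffer(int w, int h) : width(w), height(h), data(static_cast<std::size_t>(w) * h, 0) {}
    void set(int x, int y, unsigned char value) {
        if (x < 0 || y < 0 || x >= width || y >= height) return;  // ignore out-of-range writes
        data[static_cast<std::size_t>(y) * width + x] = value;
    }
};

// line2 from the Second Attempt, retargeted at the stand-in framebuffer.
void line2Fb(int x0, int y0, int x1, int y1, Framebuffer& fb) {
    for (int x = x0; x <= x1; x++) {
        float t = (x - x0) / (float)(x1 - x0);
        int y = static_cast<int>(y0 * (1. - t) + y1 * t);
        fb.set(x, y, 255);
    }
}

// Calls drawLine(13, 20, 80, 40, fb) `repeats` times and returns the elapsed milliseconds.
template <class DrawLine>
double timeLineCalls(DrawLine drawLine, int repeats) {
    Framebuffer fb(100, 100);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; i++) {
        drawLine(13, 20, 80, 40, fb);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeLineCalls(line2Fb, 1000000) gives a single number per variant rather than a per-function breakdown, so it complements the profile rather than replacing it. As a side note, the table itself is consistent with the test lines: 204,000,000 set() calls over 3,000,000 line() calls is 68 pixels per line, exactly the number of columns from x = 13 to x = 80.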

Fourth Attempt Continued

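In this part the author gives quite different code, closer in spirit to Bresenham. The original text sums up the change as "take divisor out of the loop", but I was stuck here for a long time and could not see why that description fits. Then I came across an issue on GitHub: Interesting Conclusion · Issue #30 · ssloy/tinyrenderer (github.com)

According to Issue #30:

That is exactly what I thought!

In the end I understood the modified code in a different way. First, the modified function: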

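//float version of bresenham
//steep was used without being declared; it is assumed here to be passed in by the caller:
//true when the caller swapped x and y because the line is steep
void line4(int x0, int y0, int x1, int y1, TGAImage& image, TGAColor color, bool steep) {
    int dx = x1 - x0;
    int dy = y1 - y0;
    //std::abs from <cmath> keeps the float; plain abs may pick the int overload and truncate
    float derror = std::abs(dy / (float)dx);
    float error = 0;
    int y = y0;
    for (int x = x0; x <= x1; x++) {
        if (steep) {
            image.set(y, x, color);
        }
        else {
            image.set(x, y, color);
        }
        error += derror;
        if (error > .5) {
            y += (y1 > y0 ? 1 : -1);
            error -= 1;
            //error = 0;
        }
    }
}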

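Here is another way to read it, taken from the Bilibili video [学习]bresenham算法绘制直线_哔哩哔哩_bilibili

On the true line, each time x increases by 1, the true y increases by slope. Let's start from x = x0 + 1: there, y = slope **(assuming x0 = y0 = 0)**. How do we decide whether to use y0 or y0 + 1? We compare against the midpoint between these two candidates.

At this point the true y is 1 * slope, and the midpoint between y0 and y0 + 1 is 0.5.

If 1 * slope > 0.5, the true y is closer to y0 + 1, so the next pixel should be (x0 + 1, y0 + 1).
If 1 * slope < 0.5, the true y is closer to y0, so the next pixel is (x0 + 1, y0). The code treats an exact tie the same way, since it tests error > .5.

In other words, we can use the slope to predict the position of the next point!

In the code, the slope corresponds to the quantity called derror.

Then why do we also need error? As I understand it, error accumulates the effect of the slope: we do not need to track any intermediate state, only keep adding the slope, and that is enough to estimate y at every x. When error exceeds 0.5 (think of the x0 + 1 case), y should increase by 1 and error itself should decrease by 1. The subtraction cancels out the effect of incrementing y: the 0.5 threshold is measured relative to the current y, so once y moves up by one, error has to come down by the same amount. That is the error - 1 after each y + 1.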
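To make the accumulation concrete, here is a minimal trace of my own. It assumes a line from (0, 0) to (dx, dy) with 0 <= dy <= dx and dx > 0, and returns the y chosen at each x; errorAccumulatedYs is a hypothetical helper name, not something from the tutorial.

```cpp
#include <cmath>
#include <vector>

// y chosen at each x = 0..dx for a line from (0, 0) to (dx, dy),
// assuming 0 <= dy <= dx and dx > 0.
std::vector<int> errorAccumulatedYs(int dx, int dy) {
    std::vector<int> ys;
    float derror = std::abs(dy / (float)dx);  // slope, added once per step in x
    float error = 0.f;                        // distance of the true line above the current y
    int y = 0;
    for (int x = 0; x <= dx; x++) {
        ys.push_back(y);
        error += derror;
        if (error > .5f) {  // the true line is now closer to y + 1
            y += 1;
            error -= 1.f;   // re-measure the distance relative to the new y
        }
    }
    return ys;
}
```

For dx = 5, dy = 2 the slope is 0.4, the true y values are 0, 0.4, 0.8, 1.2, 1.6, 2.0, and the function returns 0, 0, 1, 1, 2, 2, which is each true value rounded to the nearest integer.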

Timings: fifth and final attempt

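At last, the final attempt. We have arrived at what Bresenham's algorithm is really about: getting rid of division and floating-point arithmetic.

The idea is simple: the slope involves $\frac{\Delta y}{\Delta x}$, so multiply through by $\Delta x$; the threshold involves 0.5, so multiply through by 2. The author's final code: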

//Core loop only; it relies on the endpoints having been arranged beforehand
//so that x0 <= x1 and |dy| <= dx, with steep recording whether x and y were swapped
int dx = x1-x0;
int dy = y1-y0;
int derror2 = std::abs(dy)*2;
int error2 = 0;
int y = y0;
for (int x=x0; x<=x1; x++) {
    if (steep) {
        image.set(y, x, color);
    } else {
        image.set(x, y, color);
    }
    error2 += derror2;
    if (error2 > dx) {
        y += (y1>y0?1:-1);
        error2 -= dx*2;
    }
}
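The step from the float version to this integer version can be written out explicitly. This is my own summary, with $e$ standing for error and $e_2$ for error2, and it assumes $\Delta x > 0$ so that multiplying by $2\Delta x$ keeps the direction of the inequality:

```latex
\begin{aligned}
\text{float version:}\quad
  & e \leftarrow e + \frac{|\Delta y|}{\Delta x},
  && e > \tfrac{1}{2} \;\Rightarrow\; y \leftarrow y \pm 1,\quad e \leftarrow e - 1 \\
\text{with } e_2 = 2\,\Delta x\, e:\quad
  & e_2 \leftarrow e_2 + 2\,|\Delta y|,
  && e_2 > \Delta x \;\Rightarrow\; y \leftarrow y \pm 1,\quad e_2 \leftarrow e_2 - 2\,\Delta x
\end{aligned}
```

Every quantity in the second row is an integer, which is why the loop no longer needs a division or a float.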

And here is the final code I wrote:

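//standard bresenham algorithm
//steep is assumed to be passed in by the caller: true when x and y were swapped for a steep line
void line5(int x0, int y0, int x1, int y1, TGAImage& image, TGAColor color, bool steep) {
    //optimization for line3() : add abs()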
    int dx = abs(x1 - x0);
    int dy = abs(y1 - y0);
    int D = 2 * dy - dx;
    int y = y0;
    for (int x = x0; x <= x1; x++) {
        if (steep) {
            image.set(y, x, color);

        }
        else {
            image.set(x, y, color);
        }
        if (D > 0) {
            if (y1 > y0) {
                y++;
            }
            else {
                y--;
            }
            D -= 2 * dx;
        }
        D += 2 * dy;
    }
}

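The only difference is that I subtract dx right from the start. This is the standard Bresenham algorithm as I understand it; drawing in all cases is handled through the line3() function from the Third Attempt. Test cases: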

    line3(13, 20, 80, 40, image, white);
    line3(20, 13, 40, 80, image, red);
    line3(80, 40, 13, 20, image, red);
    line3(20, 80, 40, 40, image, red);

The result:
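Independently of the picture, it is possible to check that starting the decision variable at 2 * dy - dx really is the same algorithm as accumulating error2 from 0. The sketch below is my own: ysError2 and ysDecision are hypothetical names for the two update rules, reduced to lines from (0, 0) to (dx, dy) with 0 <= dy <= dx.

```cpp
#include <vector>

// Author's integer variant: accumulate 2*dy, compare against dx.
std::vector<int> ysError2(int dx, int dy) {
    std::vector<int> ys;
    int error2 = 0;
    int y = 0;
    for (int x = 0; x <= dx; x++) {
        ys.push_back(y);
        error2 += 2 * dy;
        if (error2 > dx) {
            y++;
            error2 -= 2 * dx;
        }
    }
    return ys;
}

// line5 variant: decision variable D starts at 2*dy - dx, compare against 0.
std::vector<int> ysDecision(int dx, int dy) {
    std::vector<int> ys;
    int D = 2 * dy - dx;
    int y = 0;
    for (int x = 0; x <= dx; x++) {
        ys.push_back(y);
        if (D > 0) {
            y++;
            D -= 2 * dx;
        }
        D += 2 * dy;
    }
    return ys;
}

// True if both variants pick the same y at every x for all 0 <= dy <= dx <= maxDx.
bool variantsAgree(int maxDx) {
    for (int dx = 1; dx <= maxDx; dx++) {
        for (int dy = 0; dy <= dx; dy++) {
            if (ysError2(dx, dy) != ysDecision(dx, dy)) return false;
        }
    }
    return true;
}
```

The agreement follows from D at the moment of its test always being equal to error2 - dx at the moment of its test, so "error2 > dx" and "D > 0" are the same condition.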

Wireframe rendering

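With Bresenham's line algorithm in hand, we can immediately do something rewarding: render a model, albeit in wireframe mode, meaning the whole model is drawn as individual line segments. The vector types come from geometry.h: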

#pragma once

/**/


#ifndef __GEOMETRY_H__
#define __GEOMETRY_H__

#include <cmath>
#include <iostream>

///

//Vec2 class template
template <class t> struct Vec2 {
	union {
		struct { t u, v; };
		struct { t x, y; };
		t raw[2];
	};
	Vec2() : u(0), v(0) {}
	Vec2(t _u, t _v) : u(_u), v(_v) {}
	inline Vec2<t> operator +(const Vec2<t>& V) const { return Vec2<t>(u + V.u, v + V.v); }
	inline Vec2<t> operator -(const Vec2<t>& V) const { return Vec2<t>(u - V.u, v - V.v); }
	inline Vec2<t> operator *(float f)          const { return Vec2<t>(u * f, v * f); }
	template <class > friend std::ostream& operator<<(std::ostream& s, Vec2<t>& v);
};

//Vec3 class template
template <class t> struct Vec3 {
	union {
		struct { t x, y, z; };
		struct { t ivert, iuv, inorm; };
		t raw[3];
	};
	Vec3() : x(0), y(0), z(0) {}
	Vec3(t _x, t _y, t _z) : x(_x), y(_y), z(_z) {}
	inline Vec3<t> operator ^(const Vec3<t>& v) const { return Vec3<t>(y * v.z - z * v.y, z * v.x - x * v.z, x * v.y - y * v.x); }
	inline Vec3<t> operator +(const Vec3<t>& v) const { return Vec3<t>(x + v.x, y + v.y, z + v.z); }
	inline Vec3<t> operator -(const Vec3<t>& v) const { return Vec3<t>(x - v.x, y - v.y, z - v.z); }
	inline Vec3<t> operator *(float f)          const { return Vec3<t>(x * f, y * f, z * f); }
	inline t       operator *(const Vec3<t>& v) const { return x * v.x + y * v.y + z * v.z; }
	float norm() const { return std::sqrt(x * x + y * y + z * z); }
	Vec3<t>& normalize(t l = 1) { *this = (*this) * (l / norm()); return *this; }
	template <class > friend std::ostream& operator<<(std::ostream& s, Vec3<t>& v);
};


//typedef aliases
typedef Vec2<float> Vec2f;
typedef Vec2<int>   Vec2i;
typedef Vec3<float> Vec3f;
typedef Vec3<int>   Vec3i;


//Overloaded stream output operator <<
template <class t> std::ostream& operator<<(std::ostream& s, Vec2<t>& v) {
	s << "(" << v.x << ", " << v.y << ")\n";
	return s;
}

template <class t> std::ostream& operator<<(std::ostream& s, Vec3<t>& v) {
	s << "(" << v.x << ", " << v.y << ", " << v.z << ")\n";
	return s;
}

#endif //__GEOMETRY_H__

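The author then provides a simple model parser (model.h) for loading the model in this section (it will be extended later), but does not explain how model.h and model.cpp work. Here is the source code together with my own understanding of it.

model.cpp: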

/* model.cpp
*/

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
#include "model.h"

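/*How istringstream works:
* It is initialized from a string:  std::istringstream iss(line.c_str());
* The >> operator reads the next token from iss:  iss >> trash
* Leading whitespace is skipped, and a read stops at the first character that cannot belong to the target type
* >> can be chained to read successive tokens into different variables:  iss >> idx >> trash >> itrash >> trash >> itrash
* What gets read depends on the type of the target: trash is a char and takes a slash; itrash and idx are ints and take the numbers
*/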


Model::Model(const char* filename) : verts_(), faces_() {
    std::ifstream in;
    //Open the file in read-only mode
    in.open(filename, std::ifstream::in);
    if (in.fail()) return;
    std::string line;
    while (!in.eof()) {
        //Read one line into line
        std::getline(in, line);
        std::istringstream iss(line.c_str());
        char trash; //throwaway char: absorbs the unneeded leading v or f, and later the slashes
        //compare(): checks whether the 2 characters starting at index 0 of this line are "v ", i.e. a vertex line
        //It returns 0 on a match, hence the negation
        if (!line.compare(0, 2, "v ")) {
            iss >> trash; //put the v into trash
            Vec3f v;
            for (int i = 0; i < 3; i++) iss >> v.raw[i]; //read x, y, z in turn
            verts_.push_back(v); //append to the vertex array
        }
        else if (!line.compare(0, 2, "f ")) {
            std::vector<int> f;
            int itrash, idx;
            iss >> trash;
            //Read from iss in order:
            //index (the vertex index we want) / (unused slash) texcoord (texture coordinate index, not needed yet, goes into itrash) / (unused slash) normal (normal index, not needed yet)
            while (iss >> idx >> trash >> itrash >> trash >> itrash) {
                idx--; // in wavefront obj all indices start at 1, not zero
                f.push_back(idx); //collect the indices into a vector<int>; Vec3 is not used because a face may have more than three indices (e.g. a quad)
            }
            faces_.push_back(f); //append to the face array
        }
    }
    std::cerr << "# v# " << verts_.size() << " f# " << faces_.size() << std::endl;
}

Model::~Model() {
}

int Model::nverts() {
    return (int)verts_.size();
}

int Model::nfaces() {
    return (int)faces_.size();
}

std::vector<int> Model::face(int idx) {
    return faces_[idx];
}

Vec3f Model::vert(int i) {
    return verts_[i];
}
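The face-parsing loop is the least obvious part of the file, so here it is isolated as a standalone sketch. parseFaceLine is a hypothetical helper of my own, not part of the author's Model class; it applies the same chained >> reads to a single line of the form "f v/vt/vn v/vt/vn ...".

```cpp
#include <sstream>
#include <string>
#include <vector>

// Extracts the zero-based vertex indices from one OBJ face line
// of the form "f v/vt/vn v/vt/vn ...".
std::vector<int> parseFaceLine(const std::string& line) {
    std::vector<int> indices;
    std::istringstream iss(line);
    char trash;       // absorbs the leading 'f' and the slashes
    int idx, itrash;  // idx: vertex index we keep; itrash: texcoord/normal indices we drop
    iss >> trash;     // the leading 'f'
    while (iss >> idx >> trash >> itrash >> trash >> itrash) {
        indices.push_back(idx - 1);  // OBJ indices start at 1
    }
    return indices;
}
```

parseFaceLine("f 24/1/24 25/2/25 26/3/26") yields 23, 24, 25. One limitation worth knowing: the read pattern expects all three slash-separated numbers, so a line such as "f 1 2 3" with bare vertex indices yields an empty face.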

model.h:

#pragma once
#ifndef __MODEL_H__
#define __MODEL_H__

#include <vector>
#include "geometry.h"

class Model {
private:
	std::vector<Vec3f> verts_;
	std::vector<std::vector<int> > faces_;
public:
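	Model(const char* filename); //constructor; this is also where the .obj file is parsed
	~Model();
	int nverts(); //number of vertices
	int nfaces(); //number of faces
	Vec3f vert(int i); //xyz position of the vertex with index i, as a Vec3f
	std::vector<int> face(int idx); //face with index idx, stored as a list of vertex indices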
};

#endif //__MODEL_H__

With these three files in place, we can finally implement the function that draws the model. The obj file is african_head.obj, provided by the original author:

//Draw the african_head model as a wireframe
void drawAfrican(Model* model, TGAImage& image, int width, int height) {
    for (int i = 0; i < model->nfaces(); i++) {
        //face = face(i), the vertex indices of the i-th face
        std::vector<int> face = model->face(i);
        for (int j = 0; j < 3; j++) {
            Vec3f v0 = model->vert(face[j]); //position of the vertex with index face[j]
            //Position of the next vertex of the face. The % 3 handles the boundary: at j = 2, j + 1 = 3 would index past the end,
            //so it wraps back to 0, which also joins the last vertex to the first and closes the triangle
            Vec3f v1 = model->vert(face[(j + 1) % 3]);
            //Every * width / 2 and * height / 2 below is the viewport transformation
            int x0 = (v0.x + 1.) * width / 2.;
            int y0 = (v0.y + 1.) * height / 2.;
            int x1 = (v1.x + 1.) * width / 2.;
            int y1 = (v1.y + 1.) * height / 2.;
            line3(x0, y0, x1, y1, image, white);
        }
    }
}
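The viewport step is easy to try in isolation. In this sketch of my own, toPixel is a hypothetical helper that applies the same formula as above to one coordinate, and toPixelInside is a variant I made up (it is not from the tutorial) that scales by size - 1 instead.

```cpp
// Same mapping as in drawAfrican: a coordinate in [-1, 1] to a pixel position
// along an axis that is `size` pixels long.
int toPixel(float ndc, int size) {
    return static_cast<int>((ndc + 1.) * size / 2.);
}

// Hypothetical variant (not from the tutorial): scale by size - 1 so that
// ndc = 1 lands on the last valid pixel index.
int toPixelInside(float ndc, int size) {
    return static_cast<int>((ndc + 1.) * (size - 1) / 2.);
}
```

With size = 800, toPixel maps -1 to 0 and 0 to 400, but maps 1 to 800, one past the last valid index 799. Whether that matters depends on how the image class treats out-of-range writes and on whether the model actually has a vertex exactly on the boundary; toPixelInside(1, 800) gives 799.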

Call this function from main:

 //Draw african_head in wireframe mode
    Model* model = new Model("obj/african_head.obj");
    drawAfrican(model, image, width, height);

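And with that we have finally succeeded... not quite. On my machine at least, the build fails at this point with:

fatal error LNK1107: invalid or corrupt file: cannot read at 0x31270

The fix I found online is to mark the obj file as "Excluded From Build". My guess is that this happens because I added the obj file directly to the project's resource files, so Visual Studio hands it to the linker as if it were a compiled object file, which uses the same .obj extension. In any case, the fix is as follows: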

Now we really are done. Run the program, open the output tga file, and we see the result we were after:

End of Lesson 1.
