Linux - Threads - CFANZ Programming Community

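Traditionally, many functions return 0 on success and -1 on failure, and set the global variable errno to indicate the error.
pthreads functions do not set errno when they fail (unlike most other POSIX functions). Instead, they return the error code directly as the return value.
pthreads still provides a per-thread errno so that other code relying on errno keeps working. For errors from pthreads functions, check the return value; reading it is also cheaper than reading the thread-local errno.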

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <pthread.h>

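/* Thread routine: prints a message once per second, forever. */
void* rout(void* arg)
{
	(void)arg; /* unused */
	for (; ; ) {
		printf("I am thread 1\n");
		sleep(1);
	}
	return NULL; /* not reached */
}
int main(void)
{
	pthread_t tid;
	int ret;
	/* pthread_create returns the error code directly; errno is not used. */
	if ((ret = pthread_create(&tid, NULL, rout, NULL)) != 0)
	{
		fprintf(stderr, "pthread_create : %s\n", strerror(ret));
		exit(EXIT_FAILURE);
	}
	for (; ; )
	{
		printf("I am main thread\n");
		sleep(1);
	}
}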
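
As a complement to the program above, here is a minimal sketch of a pthreads call that actually fails, so the returned error code can be inspected. It assumes that a 1-byte stack size is rejected as too small (POSIX requires EINVAL for sizes below PTHREAD_STACK_MIN). The function name provoke_pthread_error is hypothetical, and pthread_attr_setstacksize is used here only as a convenient call that fails on demand.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Deliberately trigger a pthreads error and return its error code.
 * Hypothetical helper for illustration only. */
int provoke_pthread_error(void)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* A 1-byte stack is below PTHREAD_STACK_MIN, so this call fails.
     * The error comes back as the return value, not through errno. */
    rc = pthread_attr_setstacksize(&attr, 1);
    if (rc != 0)
        fprintf(stderr, "pthread_attr_setstacksize: %s\n", strerror(rc));

    pthread_attr_destroy(&attr);
    return rc;
}
```

The returned code can be passed to strerror directly, exactly as the pthread_create example does.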

3.3 Process IDs and Thread IDs

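On Linux, the current thread implementation is the Native POSIX Thread Library (NPTL). Under this implementation a thread is also called a lightweight process (LWP). Every user-space thread corresponds to one scheduling entity in the kernel and has its own process descriptor (a task_struct structure).
Before threads existed, a process corresponded to exactly one process descriptor in the kernel and one process ID. With threads, a single user process contains N user-space threads, and each thread, as an independent scheduling entity, has its own process descriptor in the kernel. The relationship between a process and its kernel descriptors therefore becomes 1:N. Yet POSIX requires getpid to return the same process ID from every thread in a process. How is this resolved?
The Linux kernel introduced the concept of a thread group.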

struct task_struct 
{
    ...
    pid_t pid;
    pid_t tgid;
    ...
    struct task_struct *group_leader;
    ...
    struct list_head thread_group;
    ...
};

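A multithreaded process is also called a thread group. Each thread in the group has its own process descriptor (task_struct) in the kernel. The pid field of the descriptor looks like a process ID, but it actually holds the thread ID. The tgid field (Thread Group ID) holds what user space sees as the process ID.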

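The -L option of the ps command displays per-thread information, listing each thread's ID in the LWP column.

For a process named threadpool, the output shows that it is multithreaded: the process ID is 157397 and the process contains three threads, with thread IDs 157397, 157414, and 157415.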

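Linux provides the gettid system call to return the calling thread's ID. For a long time glibc did not wrap this system call in a public function for programmers to use (a gettid wrapper was only added in glibc 2.30). If you need the thread ID on an older glibc, invoke the system call directly: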

#include <sys/syscall.h>
#include <unistd.h> /* declares syscall() */
pid_t tid;
tid = (pid_t)syscall(SYS_gettid);

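In the threadpool example, the process ID is 157397 and one of its threads also has the ID 157397. This is no coincidence. The first thread in a thread group is called the main thread in user space and the group leader in the kernel. When the kernel creates the first thread, it sets the thread group ID to that thread's thread ID, and the group_leader pointer points to the thread itself, that is, to the main thread's process descriptor. Every thread group therefore contains exactly one thread whose thread ID equals the process ID, and that thread is the main thread.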

/* Thread group ID equals thread ID; group_leader points to itself */
p->tgid = p->pid;
p->group_leader = p;
INIT_LIST_HEAD(&p->thread_group);

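The IDs of the other threads in the group are allocated by the kernel, and their thread group ID always matches that of the main thread. This holds whether the main thread creates the thread directly or a previously created thread creates it.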

if (clone_flags & CLONE_THREAD)
    p->tgid = current->tgid;
if (clone_flags & CLONE_THREAD) {
    p->group_leader = current->group_leader;
    list_add_tail_rcu(&p->thread_group, &p->group_leader->thread_group);
}

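One point deserves emphasis: threads differ from processes here. A process has a parent process, but within a thread group all threads are peers.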
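
The relationship between the two IDs can be observed from user space. The following is a minimal sketch for Linux: a new thread records both getpid() and the kernel thread ID obtained through syscall(SYS_gettid). The names thread_ids, kernel_tid, and ids_in_new_thread are hypothetical helpers introduced for this illustration.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* IDs observed from inside one thread (hypothetical helper type). */
struct thread_ids {
    pid_t pid; /* getpid(): the thread group ID (tgid in task_struct) */
    pid_t tid; /* gettid: the kernel thread ID (pid in task_struct) */
};

/* Kernel thread ID of the calling thread, via the raw system call. */
pid_t kernel_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Thread routine: store both IDs into the struct passed as argument. */
static void *record_ids(void *arg)
{
    struct thread_ids *ids = arg;
    ids->pid = getpid();
    ids->tid = kernel_tid();
    return NULL;
}

/* Run record_ids in a new thread and wait for it.
 * Returns 0 on success or a pthreads error code. */
int ids_in_new_thread(struct thread_ids *out)
{
    pthread_t thread;
    int rc = pthread_create(&thread, NULL, record_ids, out);
    if (rc != 0)
        return rc;
    return pthread_join(thread, NULL);
}
```

In the new thread, pid matches the value getpid() returns in the main thread, while tid is different. Calling kernel_tid() from the main thread should return the same value as getpid(), since the main thread is the group leader.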

3.4 Thread IDs and the Process Address Space Layout

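The pthread_create function produces a thread ID and stores it at the address given by its first argument. This thread ID is not the same thing as the thread ID discussed earlier.
The thread ID discussed earlier belongs to process scheduling. Because a thread is a lightweight process and the smallest unit the operating system scheduler handles, the kernel needs a number that uniquely identifies it.
The first argument of pthread_create points to a location in virtual memory, and the value stored there is the ID of the newly created thread. This ID belongs to the NPTL thread library, and all later operations of the thread library act on the thread through it.
NPTL provides the pthread_self function, which returns the calling thread's own ID: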
```
pthread_t pthread_self(void);
```
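What type is pthread_t, exactly? That depends on the implementation. In the current NPTL implementation on Linux, a thread ID of type pthread_t is essentially an address in the process address space.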
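
As a small sketch of how the library-level ID is used, the code below checks that the ID stored by pthread_create is the same one the new thread gets from pthread_self. Since pthread_t is an opaque type, the comparison uses pthread_equal rather than ==. The struct id_check and the function self_matches_created_id are hypothetical names, and the mutex is an addition of this sketch, used only to make sure the new thread reads the ID after pthread_create has stored it.

```c
#include <pthread.h>

/* Shared state for the comparison (hypothetical helper type). */
struct id_check {
    pthread_mutex_t lock;
    pthread_t created; /* ID written by pthread_create */
    int equal;         /* 1 if the IDs compared equal */
};

static void *compare_ids(void *arg)
{
    struct id_check *check = arg;
    /* Block until the creator has finished storing the ID. */
    pthread_mutex_lock(&check->lock);
    check->equal = pthread_equal(check->created, pthread_self()) != 0;
    pthread_mutex_unlock(&check->lock);
    return NULL;
}

/* Returns 1 if both IDs match, 0 if they differ, -1 on failure. */
int self_matches_created_id(void)
{
    struct id_check check = { .equal = 0 };
    int rc;

    if (pthread_mutex_init(&check.lock, NULL) != 0)
        return -1;

    /* Hold the lock while pthread_create writes check.created. */
    pthread_mutex_lock(&check.lock);
    rc = pthread_create(&check.created, NULL, compare_ids, &check);
    pthread_mutex_unlock(&check.lock);

    if (rc == 0)
        rc = pthread_join(check.created, NULL);
    pthread_mutex_destroy(&check.lock);
    return rc == 0 ? check.equal : -1;
}
```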

3.5 Thread Termination

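To terminate a single thread without terminating the whole process, there are three options: return from the thread function (this does not apply to the main thread, where returning from main is equivalent to calling exit), call pthread_exit from the thread itself, or have another thread in the process call pthread_cancel on it.

The pthread_exit function

Note that the pointer passed to pthread_exit or returned with return must point to memory that is global or allocated with malloc. It must not point into the thread function's stack, because by the time another thread receives the pointer, the thread function has already exited.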
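
The rule about the returned pointer can be sketched as follows. The worker allocates its result with malloc and hands it to pthread_exit; the joining thread reads the value and frees it. The names square_worker and square_in_thread are hypothetical, and reporting ENOMEM when the worker could not allocate is a choice made for this sketch.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Thread routine: square the int pointed to by arg and return the
 * result in heap memory, which outlives this thread's stack. */
static void *square_worker(void *arg)
{
    int input = *(int *)arg;
    int *result = malloc(sizeof *result);
    if (result == NULL)
        pthread_exit(NULL);
    *result = input * input;
    pthread_exit(result);
}

/* Compute input * input in a separate thread.
 * Returns 0 and stores the result in *out, or returns an error code. */
int square_in_thread(int input, int *out)
{
    pthread_t thread;
    void *ret = NULL;

    /* Passing &input is safe here because this function does not
     * return until the worker has been joined. */
    int rc = pthread_create(&thread, NULL, square_worker, &input);
    if (rc != 0)
        return rc;
    rc = pthread_join(thread, &ret);
    if (rc != 0)
        return rc;
    if (ret == NULL)
        return ENOMEM; /* worker could not allocate its result */

    *out = *(int *)ret;
    free(ret); /* the joiner owns the heap block after the join */
    return 0;
}
```

Had square_worker returned the address of a local variable instead, the pointer would refer to a stack that no longer exists once the thread has terminated.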

The pthread_cancel function

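3.6 Waiting for Threads

3.6.1 Why is waiting for a thread necessary?

The thread that calls pthread_join is suspended until the thread identified by thread terminates. The termination status obtained through pthread_join depends on how that thread terminated: the return value if it returned from its thread function, the argument passed to pthread_exit if it called pthread_exit, or the constant PTHREAD_CANCELED if it was cancelled by another thread.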

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
void *thread1(void *arg)
{
    printf("thread 1 returning ... \n");
    int *p = (int *)malloc(sizeof(int));
    *p = 1;
    return (void *)p;
}
void *thread2(void *arg)
{
    printf("thread 2 exiting ...\n");
    int *p = (int *)malloc(sizeof(int));
    *p = 2;
    pthread_exit((void *)p);
}
void *thread3(void *arg)
{
    while (1)
    { // loop until cancelled; sleep() is a cancellation point
        printf("thread 3 is running ...\n");
        sleep(1);
    }
    return NULL;
}
int main(void)
{
    pthread_t tid;
    void *ret;
    // thread 1: terminates by returning from its thread function
    pthread_create(&tid, NULL, thread1, NULL);
    pthread_join(tid, &ret);
    printf("thread return, thread id %lX, return code:%d\n", (unsigned long)tid, *(int *)ret);
    free(ret);
    // thread 2: terminates by calling pthread_exit
    pthread_create(&tid, NULL, thread2, NULL);
    pthread_join(tid, &ret);
    printf("thread return, thread id %lX, return code:%d\n", (unsigned long)tid, *(int *)ret);
    free(ret);
    // thread 3: cancelled by another thread
    pthread_create(&tid, NULL, thread3, NULL);
    sleep(3);
    pthread_cancel(tid);
    pthread_join(tid, &ret);
    if (ret == PTHREAD_CANCELED)
        printf("thread return, thread id %lX, return code:PTHREAD_CANCELED\n", (unsigned long)tid);
    else
        printf("thread return, thread id %lX, return code:NULL\n", (unsigned long)tid);
    return 0;
}

3.7 Detaching Threads

int pthread_detach(pthread_t thread);

A thread can be detached by another thread in the same thread group, or it can detach itself:

pthread_detach(pthread_self());

Joinable and detached are mutually exclusive: a thread cannot be both joinable and detached.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
void *thread_run(void *arg)
{
    pthread_detach(pthread_self());
    printf("%s\n", (char *)arg);
    return NULL;
}
int main(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, thread_run, (void *)"thread1 run...") != 0)
    {
        printf("create thread error\n");
        return 1;
    }
    int ret = 0;
    sleep(1); // important: give the thread time to detach itself before main tries to join it
    if (pthread_join(tid, NULL) == 0)
    {
        printf("pthread wait success\n");
        ret = 0;
    }
    else
    {
        printf("pthread wait failed\n");
        ret = 1;
    }
    return ret;
}
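
The rule that a thread is either joinable or detached, never both, is also visible in the thread attributes API, where the detach state is a single value. The sketch below is only an illustration: the pthread_attr_* functions are standard POSIX calls that do not appear in the text above, and the function name detach_state_roundtrip is hypothetical. It reads the default detach state of a freshly initialized attribute object, switches it to detached, and reads it back.

```c
#include <pthread.h>

/* Read the default detach state of a new attribute object into
 * *initial, set the state to detached, and read it back into
 * *after_set. Returns 0 on success or a pthreads error code. */
int detach_state_roundtrip(int *initial, int *after_set)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_getdetachstate(&attr, initial);
    if (rc == 0)
        rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_attr_getdetachstate(&attr, after_set);

    pthread_attr_destroy(&attr);
    return rc;
}
```

The initial value should be PTHREAD_CREATE_JOINABLE, which is the POSIX default, and the value read back after the change should be PTHREAD_CREATE_DETACHED. An attribute object configured this way can be passed as the second argument of pthread_create to start a thread that is detached from the beginning.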