Crowdsourcing

Outsourcing some tasks to a crowd -> Crowdsourcing
Improve the quality, timeliness and breadth of data

Key questions:

What computational problems can/should be solved?
Data augmenting, Data processing
What are the programming paradigms/platforms?
A programming paradigm is a classification, style, or way of programming: an approach to solving problems with programming languages.
How do we guarantee that the solution is accurate, efficient and economical?
Quality, cost and latency
How do we motivate participation and leverage the unique expertise and interests of workers?
How do we leverage the joint efforts of both automated and human computers as workers?

3 central aspects of crowdsourcing

What
- What tasks can be performed by machines
- Decompose the macro and micro tasks
Who
- Expertise of workers (how to model workers' expertise)
- Manage cultural aspects and language barriers
How
- How to design and execute tasks
- Aggregate noisy and complex output (this determines how intelligent the aggregation technique needs to be, e.g. hierarchical cluster-based aggregation)
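
As a point of reference for the aggregation step, here is a minimal sketch of plain majority voting, which is a much simpler technique than the hierarchical cluster-based aggregation named above. The task IDs, the answers, and the `majority_vote` helper are all hypothetical and exist only for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer and the share of workers who gave it."""
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)


# Hypothetical noisy worker answers for two labelling tasks.
task_answers = {
    "img_1": ["cat", "cat", "dog"],
    "img_2": ["dog", "dog", "dog"],
}

# One aggregated label (plus an agreement score) per task.
aggregated = {task: majority_vote(ans) for task, ans in task_answers.items()}
```

The agreement share gives a rough signal of how noisy a task's answers were. Weighting votes by modelled worker expertise would be a natural next step, but it is not shown here.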

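Arranging workers in parallel
- Operations & Control: multiple production lines run in parallel, at high cost
- Cost vs latency: high cost, low latency
Arranging workers sequentially
- Operations & Control: workers run one after another
- Cost vs latency: high latency, because each worker has to wait for the previous worker's result; however, if three workers are planned and two of them agree on the result, the remaining HIT does not need to be executed, which saves cost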
Operations & Control:
- Repetition
  You repeat the tasks until you are satisfied
- Selection
  You retrieve tasks using selection mechanisms
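
The cost saving of the sequential arrangement can be sketched as follows, assuming a plan of three workers and a stopping rule of two matching answers. The `ask_worker` callback and the example answers are hypothetical stand-ins for posting a HIT and waiting for its result.

```python
from collections import Counter


def sequential_vote(ask_worker, max_workers=3, needed=2):
    """Ask workers one at a time and stop as soon as `needed` of them agree.

    Returns (agreed_answer_or_None, number_of_HITs_actually_executed).
    """
    counts = Counter()
    for asked in range(1, max_workers + 1):
        answer = ask_worker(asked - 1)  # blocks until this worker replies
        counts[answer] += 1
        if counts[answer] >= needed:
            return answer, asked  # remaining HITs are never issued
    return None, max_workers  # no agreement within the budget


# Hypothetical answers, in the order the workers would be asked.
answers = ["yes", "yes", "no"]
result, hits_used = sequential_vote(lambda i: answers[i])
```

In this run the first two workers agree, so only two of the three planned HITs are paid for. The price is latency, since each request waits for the previous one to finish.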

Challenges

Goal

Assumptions

Latent Class models

Investment factors

Liquidity principle: the ability of the financial assets held to be converted into cash quickly
Safety principle: the value of the financial asset and the ability to bear losses caused by accident risk
Profit principle: the level of income earned from investing in the financial asset

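MPT (modern portfolio theory) is used to select investments so as to maximize their overall return within an acceptable level of risk.

It uses different sets of returns (intraday, closing, and adjusted closing) and correlations (within an industry and with other markets) to predict future returns.

Investors can choose the optimal mix of the two based on an assessment of their risk tolerance, and so obtain the best result. These optimal mixes form the efficient frontier, which is the cornerstone of MPT and the baseline indicating the portfolios that provide the highest return for the lowest risk.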
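
To make the risk/return trade-off concrete, here is a minimal sketch of the standard portfolio formulas: expected return is the weighted sum of asset returns, and variance is the sum of `w_i * w_j * cov[i][j]` over all asset pairs. The two assets, their mean returns, and the covariance matrix are invented numbers for illustration only.

```python
import math


def portfolio_stats(weights, mean_returns, cov):
    """Return (expected return, standard deviation) of a weighted portfolio."""
    n = len(weights)
    expected = sum(w * m for w, m in zip(weights, mean_returns))
    variance = sum(
        weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n)
    )
    return expected, math.sqrt(variance)


# Hypothetical assets: one risky with a high return, one safer with a low return.
mean_returns = [0.10, 0.04]
cov = [[0.04, 0.00],
       [0.00, 0.01]]

# Sweep the share held in the risky asset to trace candidate risk/return pairs.
candidates = []
for step in range(11):
    w = step / 10
    ret, risk = portfolio_stats([w, 1 - w], mean_returns, cov)
    candidates.append((w, ret, risk))

ret_half, risk_half = portfolio_stats([0.5, 0.5], mean_returns, cov)
```

The efficient frontier would be the subset of such candidates for which no other mix offers a higher return at the same or lower risk. Selecting that subset is left out of this sketch.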

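The EMH (efficient-market hypothesis) is a hypothesis in financial economics stating that asset prices reflect all available information. It holds that global financial markets are informationally efficient, meaning that stock prices reflect all information relevant to the company in question.

Social media carries discussion of, and information about, stocks.

Stock-net is a deep learning solution with a 3-layer architecture. Its base layer is a market information encoder that encodes tweets and stock price data. The model tries to learn stock movements from tweets, using event-based sentiment analysis for stock prediction.

It uses Twitter data more intensively.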

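Trading-day data for a stock
- Basic: date, open price, close price, high, low, adjusted close (the closing price after accounting for any corporate actions), volume (the number of shares traded during the trading day)
- More: Twitter data for stocks
Cleaning the data
Clean the Twitter data, keeping only the text
Data processing
- Handle special characters and unify the time format
- Merge stock prices and tweet text by trading day
  - Using the opening price requires assuming that tweets may come from any time of the day
  - The closing price makes the trend easier to see and helps determine whether the tweets had any effect on the stock
Trend representation
- Take the difference between the close price and the open price; label a positive trend 1 and a negative trend 0
Normalizing the dataset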
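
The merge, trend-labelling, and normalization steps above can be sketched with plain Python data structures. Everything here is an assumption made for illustration: the dates, prices, and tweets are invented, min-max scaling is one possible choice of normalization (the notes do not name one), and a day with zero price change is labelled 0.

```python
from collections import defaultdict

# Hypothetical daily prices: date -> (open, close).
prices = {
    "2021-01-04": (100.0, 102.0),
    "2021-01-05": (102.0, 101.0),
}

# Hypothetical cleaned tweets as (date, text) pairs.
tweets = [
    ("2021-01-04", "great earnings"),
    ("2021-01-04", "buying more"),
    ("2021-01-05", "weak guidance"),
    ("2021-01-09", "weekend chatter"),  # not a trading day, so it is dropped
]

# Group tweet text by calendar day.
tweets_by_day = defaultdict(list)
for day, text in tweets:
    tweets_by_day[day].append(text)

# Merge on trading day and label the trend from close minus open.
rows = []
for day in sorted(prices):
    open_price, close_price = prices[day]
    rows.append({
        "date": day,
        "text": " ".join(tweets_by_day.get(day, [])),
        "close": close_price,
        "trend": 1 if close_price - open_price > 0 else 0,
    })

# Min-max normalize the close price into [0, 1] (assumed normalization).
closes = [row["close"] for row in rows]
low, high = min(closes), max(closes)
for row in rows:
    row["close_norm"] = (row["close"] - low) / (high - low) if high > low else 0.0
```

Each row then pairs the day's concatenated tweet text with a binary trend label, indexed by date, which is the shape a text classifier would need.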

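The model uses tweet text to predict the trend, with time as the index.

Using LSTM/BiLSTM
BERT model
BERT stands for Bidirectional Encoder Representations from Transformers. It is based on the Transformer, a deep learning model in which every output element is connected to every input element and the weights between them are computed dynamically from their connection.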
dense neural network
Distilled BERT
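
The phrase "weights computed dynamically" in the BERT description refers to attention. Below is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification made for illustration: real BERT layers apply learned query/key/value projections and use multiple heads, while this sketch uses the toy token vectors directly as queries, keys, and values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, attention weights)."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score every input element against this query.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted mix of all value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Every row of `weights` sums to 1 and depends on the input itself, which is what distinguishes attention from the fixed weights of a dense layer.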