Learning Resources for Data Security and Intelligent Risk Control - CFANZ Programming Community

Last updated: 2022/2

Writing this took real effort, so follows, likes, and bookmarks are welcome!

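Introductory Overviews

Why Machine Learning Always Fails at Network Security Problems: On Feature Spaces

Why Machine Learning Always Fails at Network Security Problems: Fragile Systems Engineering

Why Machine Learning Always Fails at Network Security Problems: Unreasonable Evaluation Metrics

Why Machine Learning Always Fails at Network Security Problems: Machine Learning Is Not a Panacea

Useful Useless Models: Modeling Approaches for Complex Problems in Network Security

AI Application Defense: Using AI for Application Security Protection

OWASP Top 10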

https://salt.security/blog/whatistheowaspapisecuritytop10?

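API risk discovery systems:

Web link

Business risk:

Major 2021 release! 100 risk points in the banking industry and their prevention and control measures!

2021 Research and Trend Insight Report on the Black and Gray Market Industry

Du Yuejin: Basic Approach to Data Security Governance

API security governance approach: On the Importance of API Security and How to Govern It

White Paper on Composite Data Security Governance and Practice

Some interesting web attack and defense talks from Black Hat: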

https://www.blackhat.com/docs/asia17/materials/asia17DongBeyondTheBlacklistsDetectingMaliciousURLThroughMachineLearning.pdf

https://i.blackhat.com/briefings/asia/2018/asia18SimakovMarinaBreakingTheAttackGraph.pdf

https://i.blackhat.com/asia19/FriMarch29/bhasiaPhamAutomatedRESTAPIEndpoint.pdf

https://i.blackhat.com/asia20/Friday/asia20HaoAttackingAndDefendingMachineLearningApplicationsOfPublicCloud.pdf

https://i.blackhat.com/eu19/Wednesday/eu19KettleHTTPDesyncAttacksRequestSmugglingReborn.pdf

https://www.blackhat.com/docs/us17/wednesday/us17GilWebCacheDeceptionAttack.pdf

https://www.blackhat.com/docs/us17/wednesday/us17BurnettIchthyologyPhishingAsASciencewp.pdf

https://i.blackhat.com/us18/ThuAugust9/us18KettlePracticalWebCachePoisoningRedefiningUnexploitable.pdf

https://i.blackhat.com/USA19/Wednesday/us19ValentaMonstersInTheMiddleboxesBuildingToolsForDetectingHTTPSInterception.pdf

https://i.blackhat.com/USA20/Wednesday/us20KettleWebCacheEntanglementNovelPathwaysToPoisoning.pdf

Practical HTTP header smuggling: attacking AWS through reverse proxies

https://i.blackhat.com/USA20/Wednesday/us20KleinHTTPRequestSmugglingIn2020NewVariantsNewDefensesAndNewChallenges.pdf

https://i.blackhat.com/EU21/Wednesday/EU21ThatcherPracticalHTTPHeaderSmuggling.pdf

https://www.blackhat.com/docs/us15/materials/us15GavrichenkovBreakingHTTPSWithBGPHijackingwp.pdf

https://www.blackhat.com/docs/us16/materials/us16SivakornHTTPCookieHijackingInTheWildSecurityAndPrivacyImplicationswp.pdf

https://towardsdatascience.com/deeplearningforspecificinformationextractionfromunstructuredtexts12c5b9dceada

Malicious Account Registration

《Unveiling Fake Accounts at the Time of Registration: An Unsupervised Approach》

《DeepScan: Exploiting Deep Learning for Malicious Account Detection in Location-Based Social Networks》

风浪: GEM: How Ant Financial Uses GCN to Detect Maliciously Registered Accounts

Machine Learning for UEBA

《AI2: Training a big data machine to defend》

《Big Data Security Challenges: An Overview and Application of User Behavior Analytics》

《Adaptive Intrusion Detection System via Online Learning》

《A Multi-Model Approach to the Detection of Web-Based Attacks》

《McPAD: A Multiple Classifier System for Accurate Payload-based Anomaly Detection》

《Using Generalization and Characterization Techniques in the Anomaly-based Detection of Web Attacks》

《Anomaly-Based Web Attack Detection: A Deep Learning Approach》

《A Big Data Analysis Framework for Model-Based Web User Behavior Analytics》

《Anomalous Payload-based Network Intrusion Detection》

《Data mining for security at Google》

《User and Entity Behavior Analytics for Enterprise Security》

《A Comprehensive Approach to Intrusion Detection Alert Correlation》

《Traffic Anomaly Detection Using K-Means Clustering》

《Calculation of the Behavior Utility of a Network System: Conception and Principle》

《Spectrogram: A Mixture-of-Markov-Chains Model for Anomaly Detection in Web Traffic》

《User Profiling Technologies》

MLOps

MLOps: Best Practices for Building Production Machine Learning Systems

Intrusion Detection

https://www.usenix.org/system/files/conference/usenixsecurity13/sec13paper_nelms.pdf

http://www.ccs.neu.edu/home/alina/papers/MADE.pdf

Machine Learning in Practice at Major Internet Companies

The Cloudflare Blog

Malicious URL Detection

URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection

My AI Security Detection Study Notes (1) - 404 Not Found

《Compromised or Attacker-Owned: A Large Scale Classification and Study of Hosting Domains of Malicious URLs》

DDoS

GitHub - aviraonepiece/machine_learning: demos for learning to solve network security problems with machine learning

http://cdmd.cnki.com.cn/Article/CDMD900022007140546.htm

https://pdfs.semanticscholar.org/6363/b9f28a7e037abe626a2e88fac3393c04bfda.pdf Defending

Botnet Detection

https://i.blackhat.com/asia20/Friday/asia20XuWinThe0DayRacingGameAgainstBotnetInPublicCloud.pdf

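cdxy: DataCon2020 Botnet Tracking, Problem 1 Writeup

DGA Domain Detection

The Past and Present of DGA Domains: Origins, Detection, and Development

DGA Domain Detection Techniques - CyberSecurityBook - 博客园

Risk Control

Exploration and Practice of Unsupervised Algorithms in Risk Control at Huya

Building Douyu's Risk Control Algorithm System

Shumei's Innovation and Practice in Risk Control

Web Application Firewall_WAF_Website Firewall_Website Security Protection - Alibaba Cloud

Du Zhongwei: Identifying and Tracing Black and Gray Market Activity at Beike

Alimama: Sharing Risk Control Techniques for "Advertiser Arbitrage"

Bot Traffic Identification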

The Cloudflare Blog

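Crawler Identification

Common Problems in Crawling and Common Anti-Crawling Mechanisms_eli's blog-CSDN Blog_anti-crawler mechanisms

Pitfalls You May Run Into When Practicing Anti-Crawler Mechanisms_yib0y's blog-CSDN Blog

Time Baselines in Support of Security Risk Discovery

The author plans to write a dedicated blog post on this topic.

[Resources compiled by Liao Wenzhe](https://github.com/LiaoWenzhe/AiopsLearningResources)

gwave: [Time Series] Is Granger Causality Really Causality?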

Interpretations of Several Time Series Anomaly Detection Papers_Liao_Wenzhe's blog-CSDN Blog

Graph Data Mining

《A Practical Approach to Constructing a Knowledge Graph for Cybersecurity》

《Developing an Ontology for Cyber Security Knowledge Graphs》

《Towards a Relation Extraction Framework for CyberSecurity Concepts》

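Applications of Graph Neural Networks in Payment Risk Control

Application and Evolution of Graphs in Anomalous Traffic Identification

风浪: Graph-Based Anomaly Detection (1): OddBall

风浪: Graph-Based Anomaly Detection (2): LOCKINFER

风浪: Graph-Based Anomaly Detection (3): GraphRAD

Datasets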

1. Samples of Security Related Data

2. DARPA Intrusion Detection Data Sets

3. Stratosphere IPS Data Sets

4. Open Data Sets

5. Data Capture from National Security Agency

6. The ADFA Intrusion Detection Data Sets

7. [NSL-KDD Data Sets](https://github.com/defcom17/NSL_KDD)

8. Malicious URLs Data Sets

9. Multi-Source Cyber-Security Events

10. Malware Training Sets: A machine learning dataset for everyone

11. Collection of Security and Network Data Resources

12. Security Data Samples Repository

13. [Vulnbank_dataset](https://github.com/AnchoretY/AI_And_Web_Security_Library/tree/master/dataset/vulnbank_dataset): a KDD competition project whose main goal is to build an intrusion detector using machine learning. The intrusions it covers include DDoS, password brute forcing, buffer overflow, scanning, and other attack types.

Recommended Open Source Projects

https://github.com/LiaoWenzhe

https://github.com/yzhao062/pyod

https://github.com/yzhao062/anomalydetectionresources

[A Big Collection of Machine Learning for Network Security](https://github.com/jivoi/awesomemlforcybersecurity/blob/master/README_ch.md)

[The Definitive Security Data Science and Machine Learning Guide](http://www.covert.io/thedefinitivesecuritydatascienceandmachinelearningguide/)

[Machine Learning for Cyber Security](https://github.com/wtsxDev/MachineLearningforCyberSecurity#datasets)

[Collection compiled by 404](https://github.com/404notf0und/AIforSecurityLearning)

[AwesomeAISecurity](https://github.com/RandomAdversary/AwesomeAISecurity)

[awesomemlforcybersecurity](https://github.com/jivoi/awesomemlforcybersecurity#datasets)

https://github.com/0xMJ/AISecurityLearning

[WooYun](https://wooyun.x10sec.org/search?keywords=aa&content_search_by=by_bugs)

Ways of thinking:

[Coming up with good ideas and directions](https://mp.weixin.qq.com/s/jajNXjNxfAvV7SmLnVUAQ)

Liu Zhiyuan: Where Do Good Research Ideas Come From

MIT AI Lab: How to Do Research

Useful Tools

ReadPaper paper reading platform

arXiv

Google Scholar

Baidu RASP security detection tool: OpenRASP official documentation

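Recommended WeChat Official Accounts

安全学术圈 (Security Academic Circle)

Alibaba Security Response Center

Tencent Security Response Center

Baidu Security Response Center

DataFunTalk

FreeBuf

404 Not F0und

Alimama Tech

Recommended Books

《风控要略：互联网业务反欺诈之路》 (Essentials of Risk Control: The Road to Anti-Fraud for Internet Businesses)

《web安全之机器学习入门》 (Web Security: An Introduction to Machine Learning)

《web安全之深度学习实战》 (Web Security: Deep Learning in Practice)

《web安全之强化学习与Gan》 (Web Security: Reinforcement Learning and GAN)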

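Some Thoughts

1. Is it feasible for an AI-powered WAF to analyze API security?

A WAF (web application firewall) sits inline and is optimized for performance, which limits how much computationally expensive AI it can run. Because of this trade-off between performance and intelligence, an inline WAF will inevitably fall short of an out-of-band (bypass) API risk detection product in depth of analysis. The two can be combined: an inline WAF on the main path plus out-of-band risk analysis, forming a closed loop for application security risk analysis.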
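The closed loop described above can be sketched roughly as follows. This is a toy illustration, not a real WAF: the names `inline_check`, `analyze_out_of_band`, `BLOCKLIST`, and `MIRROR`, the `../` signature, and the "too many distinct paths" heuristic are all assumptions made for the example. The inline path does only cheap checks and mirrors traffic; the slower analysis runs off the request path and feeds its verdicts back.

```python
from collections import deque

BLOCKLIST = set()   # client IDs flagged by the out-of-band analyzer (hypothetical)
MIRROR = deque()    # mirrored requests waiting for offline analysis (hypothetical)

def inline_check(client_id: str, path: str) -> str:
    """Inline path: cheap checks only, then mirror the request for later analysis."""
    if client_id in BLOCKLIST:
        return "block"
    if "../" in path:  # toy signature rule standing in for a WAF rule set
        return "block"
    MIRROR.append((client_id, path))
    return "allow"

def analyze_out_of_band(max_distinct_paths: int = 3) -> None:
    """Out-of-band path: drain mirrored traffic and flag clients that touch
    more distinct paths than the (assumed) limit, e.g. endpoint enumeration."""
    seen = {}
    while MIRROR:
        client_id, path = MIRROR.popleft()
        seen.setdefault(client_id, set()).add(path)
    for client_id, paths in seen.items():
        if len(paths) > max_distinct_paths:
            BLOCKLIST.add(client_id)

# One client enumerates user IDs; another behaves normally.
for i in range(5):
    inline_check("scanner", f"/api/v1/users/{i}")
inline_check("alice", "/api/v1/profile")

analyze_out_of_band()  # in practice this would run asynchronously
print(inline_check("scanner", "/api/v1/users/9"))  # block
print(inline_check("alice", "/api/v1/profile"))    # allow
```

The point of the split is that the expensive analysis never adds latency to a request; its only effect on the inline path is a set lookup.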

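2. How can machine learning find high-value risks that rules cannot?

What are the weaknesses of rules? 1. They are fixed. 2. They depend on expert experience and often need manual tuning. 3. They are written for specific scenarios and cannot detect unknown risks. Machine learning can address each of these three weaknesses in turn.

1. Weakness 1: use machine-learned adaptive ("smart") thresholds, learning the threshold from large volumes of data.

2. Weakness 2: the approach is similar to the one for weakness 1.

3. Weakness 3: build an unknown-risk detection model that raises alerts, feed the operations team's review of those alerts back into the algorithm, and build dedicated models for the specific risks it uncovers. In other words, combine unknown-risk detection with custom risk detection.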
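As a minimal sketch of the "smart threshold" idea for weaknesses 1 and 2, the threshold below is learned from historical data instead of being hand-set. The choice of median plus a multiple of the median absolute deviation (MAD) is an assumption made for illustration, as are the function name `learn_threshold`, the multiplier `k`, and the sample counts; the original text does not prescribe a specific algorithm.

```python
import statistics

def learn_threshold(history, k=3.0):
    """Learn an upper alert threshold as median + k * scaled MAD.

    The factor 1.4826 makes the MAD comparable to a standard deviation
    when the data is roughly normal.
    """
    med = statistics.median(history)
    mad = statistics.median([abs(x - med) for x in history])
    return med + k * 1.4826 * mad

# Hypothetical requests-per-minute counts for one API endpoint.
history = [100, 104, 98, 101, 97, 103, 99, 102, 100, 96]
threshold = learn_threshold(history)

for observed in (105, 150):
    status = "alert" if observed > threshold else "ok"
    print(observed, status)
```

Re-running `learn_threshold` on a sliding window of recent data lets the threshold follow the traffic, which is what removes the manual tuning that fixed rules need.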

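3. How should risks be classified, so that each class can be handled by combining unknown-risk detection with custom risk detection?

1. Parameter anomaly risk: the attacker's request parameters differ from those of normal access, so detect anomalous parameters.

2. Behavioral anomaly risk: the attacker's behavior differs from that of normal users, so detect anomalous behavior.

3. Traditional attack detection: for classic problems such as DGA domains in C&C communication, botnet detection, and phishing email, machine learning can achieve better results.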
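The first category, parameter anomalies, can be illustrated with a very small profile learned from normal traffic. This is a sketch under stated assumptions: the features (character classes and length), the helper names `char_classes`, `fit_profile`, and `is_anomalous`, and the `length_slack` factor are all hypothetical choices for the example, not a method taken from the resources listed here.

```python
def char_classes(value: str) -> frozenset:
    """Return the set of coarse character classes that appear in a value."""
    classes = set()
    for ch in value:
        if ch.isdigit():
            classes.add("digit")
        elif ch.isalpha():
            classes.add("alpha")
        elif ch.isspace():
            classes.add("space")
        else:
            classes.add("symbol")
    return frozenset(classes)

def fit_profile(normal_values, length_slack=2.0):
    """Learn which character classes and lengths are normal for one parameter."""
    allowed = set()
    for v in normal_values:
        allowed |= char_classes(v)
    max_len = max(len(v) for v in normal_values)
    return {"allowed": allowed, "max_len": max_len * length_slack}

def is_anomalous(profile, value: str) -> bool:
    """Flag values with unseen character classes or an unusually long length."""
    unseen = char_classes(value) - profile["allowed"]
    return bool(unseen) or len(value) > profile["max_len"]

# Hypothetical normal values observed for a numeric `user_id` parameter.
profile = fit_profile(["1001", "1002", "20517", "88"])

for candidate in ("1003", "1 OR 1=1", "123456789012"):
    print(repr(candidate), is_anomalous(profile, candidate))
```

A profile like this is fitted per endpoint and per parameter, so a value that is normal for a free-text field can still be flagged when it shows up in an ID field.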

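4. How can data security + machine learning be put into practice when there is no prior experience to draw on?

1. Understand the business deeply, ideally with a dedicated threat intelligence collection team, then put yourself in the attacker's position and try the attacks yourself.

2. Decomposition: break the problem into smaller sub-problems and tackle them one by one, for example splitting models by type (crawlers, traditional attacks) and businesses by sector (government, internet, and so on).

3. Analogy: look for new ideas in recommender systems and AIOps. For example, the author has seen zombie-follower detection in recommender systems and link prediction in graph algorithms, both of which could be carried over to data security.

5. Some thoughts on modeling abnormal behavior analysis

6. Building a Data Security Traffic Risk Control System (CSDN)

References: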

https://github.com/LiaoWenzhe/dataRisk-detection-resources

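Writing this took real effort. If you found it useful, please like, bookmark, and follow!

The author has set up a group for discussing big data security technology, with members from Silicon Valley, Singapore, Tencent, Alibaba, Zhejiang University, and elsewhere. Like-minded readers are welcome to get in touch and join!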
