Abstract
As the post-pandemic era begins, demand for leisure travel keeps growing. With the sharing economy, all-region tourism, and "tourism+" gaining momentum and cross-industry partnerships multiplying, the accommodation industry faces new opportunities and challenges. Against this background and with the help of new technology, non-standard accommodation, mainly homestays and short-term rentals, has risen rapidly. Airbnb, a pioneer of non-standard accommodation, relies on the Internet to keep expanding its user base while using data analysis to pinpoint what users intend to book, thereby retaining loyal customers.
Travel destination prediction lets a homestay platform recommend homestays or short-term rentals in the likely destination country after a user registers but before the first booking. This improves the platform's ability to attract traffic and the user's experience of the software. This paper uses the Airbnb user registration data published on Kaggle and studies the booking behavior of registered users in order to predict the destination country of a new user's first booking. Data preprocessing cleans and corrects the dataset, and charts visualize the variables that influence the booking destination. Feature engineering selects the effective features, and an appropriate train/test split yields the final dataset. To improve predictive performance, the valid destinations are grouped into three classes: EU countries, Pacific countries, and other. Several models are then used to predict the test set, and the predictions are discussed from two perspectives: single models and model fusion.
Keywords: classification prediction, model fusion, multi-class classification, travel destination prediction, Boosting algorithms
Contents
2.2.1 Imputing first_affiliate_tracked
2.2.4 Imputing missing values with KNNImputer
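1. Introduction
As the post-pandemic era begins, demand for leisure travel keeps growing. Travel is inseparable from accommodation and transport: some travelers choose hotels, while others choose homestays. From the operator's perspective, correctly predicting a customer's destination makes it possible to mobilize local accommodation resources effectively and earn a higher return.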
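1.1 Problem Description
Drawing on existing data and taking the perspective of homestay operations and management, this paper first characterizes the users who register and book on the platform. It then analyzes the existing features and constructs new variables in order to predict each customer's destination, with the aim of improving the platform's business decisions.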
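1.2 Research Background and Significance
Against the backdrop of globalization and following the shock of the pandemic, people are traveling more often, and demand for accommodation has risen accordingly. As consumers' accommodation needs become more diverse, the market shifts to some extent, giving rise to non-traditional products such as homestays. Driven by the sharing economy, such products will become increasingly common, and new forms of tourism business will follow.
Generating revenue is the task operator platforms care about most. Acquiring customers is the foundation of revenue, and sustaining revenue over the long term additionally requires customer loyalty. How, then, can a platform broaden its appeal and improve quality so as to retain customers? The first step is precision marketing with targeted recommendations. From the user's side, this means a better browsing and usage experience. A good experience helps users decide to travel, which creates a virtuous cycle.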
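1.3 Research Methods
The dataset used in this paper is unevenly distributed over the destination-country label [8]: most countries have relatively few samples, which makes them hard to predict. The dataset is therefore handled as a multi-class classification problem. For destination-country prediction, given the class imbalance, tuning model parameters is the first optimization considered. For example, when the KNN model is used, it is tuned by trying different numbers of neighbors.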
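The neighbor tuning described above can be sketched as follows. This is a minimal illustration and not the paper's actual pipeline. It assumes scikit-learn is available, uses a synthetic imbalanced three-class dataset in place of the Airbnb data, and scores each setting with cross-validated macro-F1. The helper name `sweep_knn_neighbors`, the neighbor grid, and the class weights are all hypothetical choices made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def sweep_knn_neighbors(X, y, neighbor_grid=(1, 3, 5, 11, 21), seed=0):
    """Return {n_neighbors: mean cross-validated macro-F1} for each grid value."""
    # Stratified folds keep the minority classes represented in every fold.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = {}
    for k in neighbor_grid:
        # KNN is distance-based, so features are standardized inside the pipeline.
        model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
        # Macro-F1 weights all classes equally, which suits imbalanced labels.
        scores[k] = cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean()
    return scores


# Synthetic stand-in for three imbalanced destination groups (assumption:
# this is NOT the Airbnb data, and the 80/15/5 split is illustrative only).
X, y = make_classification(
    n_samples=1500,
    n_features=10,
    n_informative=6,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    weights=[0.8, 0.15, 0.05],
    random_state=0,
)

scores = sweep_knn_neighbors(X, y)
best_k = max(scores, key=scores.get)
print(scores)
print("best n_neighbors:", best_k)
```

Macro-F1 is used here instead of plain accuracy because, with one dominant class, accuracy can look high even when the minority classes are never predicted. Other metrics, such as balanced accuracy, would be equally reasonable.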