Scraping JD.com product detail data with Python: the JD API and the h5st signature (2023.08.20)

I. Principles and analysis

1. Target page

https://item.jd.com/6515029.html

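Open the page in Chrome, press F12 to open DevTools, and locate the request that returns the product detail data, as shown below:

(Screenshot: the product detail API request in the Chrome DevTools Network panel)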

2. Request URL:

https://api.m.jd.com/?appid=pc-item-soa&functionId=pc_detailpage_wareBusiness&client=pc&clientVersion=1.0.0&t=1692499380806&body=%7B%22skuId%22%3A6515029%2C%22cat%22%3A%221316%2C1381%2C1391%22%2C%22area%22%3A%2225_2258_0_0%22%2C%22shopId%22%3A%221000099941%22%2C%22venderId%22%3A1000099941%2C%22paramJson%22%3A%22%7B%5C%22platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22p0ppppppppppp1pppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%7D&h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e&x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX&loginType=3&uuid=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14

3. Request headers:


:authority: api.m.jd.com
:method: GET
:path: /?appid=pc-item-soa&functionId=pc_detailpage_wareBusiness&client=pc&clientVersion=1.0.0&t=1692499380806&body=%7B%22skuId%22%3A6515029%2C%22cat%22%3A%221316%2C1381%2C1391%22%2C%22area%22%3A%2225_2258_0_0%22%2C%22shopId%22%3A%221000099941%22%2C%22venderId%22%3A1000099941%2C%22paramJson%22%3A%22%7B%5C%22platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22p0ppppppppppp1pppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%7D&h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e&x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX&loginType=3&uuid=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14
:scheme: https
Accept: application/json, text/javascript, */*; q=0.01
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Cookie: shshshfpa=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; shshshfpx=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; __jdc=122270672; __jdu=16893052418291576334291; mba_muid=16893052418291576334291; wlfstk_smdl=4qftb0r6lu47t0sx6ovvi37no1pu4y49; 3AB9D23F7A4B3C9B=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; retina=0; appCode=msc588d6d5; webp=1; visitkey=8718662230147716920; sc_width=1536; wxa_level=1; cid=9; jxsid=16924405174098442434; __jdv=122270672%7Cdirect%7C-%7Cnone%7C-%7C1692440521537; equipmentId=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; fingerprint=ba1afe80c24e71237978e1b005ec6a48; deviceVersion=115.0.0.0; deviceOS=; deviceOSVersion=; deviceName=Chrome; warehistory="10072773656365,10072773656365,10072773656365,10072773656365,"; autoOpenApp_downCloseDate_autoOpenApp_autoPromptly=1692441025259_1; __wga=1692441027033.1692440547180.1691914712301.1691914712301.4.2; PPRD_P=UUID.16893052418291576334291-LOGID.1692441027044.644926152; __jd_ref_cls=MProductdetail_CouponFloorExpo; jsavif=1; __jda=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14; token=a4d78cd04f402b3f7ad6a29e8af8aa6f,2,940277; __tk=krazkYhsAcgzjrhtAuewjueDjufpArg5BVoz4zttAzG,2,940277; 3AB9D23F7A4B3CSS=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX; _gia_d=1; __jdb=122270672.2.16893052418291576334291|14.1692498368; shshshfpb=xbVnfPmoZnca-0u5O8YJzHQ; areaId=25; ipLoc-djd=25-2258-0-0
Origin: https://item.jd.com
Referer: https://item.jd.com/
Sec-Ch-Ua: "Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
X-Referer-Page: https://item.jd.com/6515029.html
X-Rp-Client: h5_1.0.0

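4. API response data:

The response contains the product image URLs, price, title and other details, which is exactly the data we need.

(The response is very large; only a small excerpt is shown.)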

{
    "extendWarrantyInfo": {
        "descUrl": "https://baozhang.jd.com/static/serviceDesc",
        "detailUrl": "https://b.jr.jd.com/service/serveIntroduce/#/introduce3?mainSkuId={mainSkuId}&brandId={brandId}&thirdCategoryId={cid3}&bindSkuId={bindSku}",
        "serviceItems": [
            {

5. Parameter analysis

(1) The body parameter


The body parameter in the URL carries the request details. It is URL-encoded; after decoding it reads:

{"skuId":6515029,"cat":"1316,1381,1391","area":"25_2258_0_0","shopId":"1000099941","venderId":1000099941,"paramJson":"{\"platform2\":\"1\",\"specialAttrStr\":\"p0ppppppppppp1pppppppppppp\",\"skuMarkStr\":\"00\"}","num":1,"bbTraffic":""}

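"skuId":6515029 is the product ID (SKU), and "shopId":"1000099941" is the store ID. The other parameters relate to the browsing environment (for example, area 25_2258_0_0 matches the ipLoc-djd location cookie 25-2258-0-0), and they can be kept fixed.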
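The decoding step can be reproduced with Python's standard library. The sketch below assumes that the captured request URL is available as a string. Here it is trimmed to the appid, functionId and body parameters of the URL above, and decode_body is a hypothetical helper name. Note that paramJson is itself a JSON document stored as a string, so it needs a second json.loads.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_body(url):
    """Return the JSON body parameter of a captured api.m.jd.com URL as a dict."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes each value
    return json.loads(query["body"][0])    # body is a JSON object serialized as a string


url = (
    "https://api.m.jd.com/?appid=pc-item-soa&functionId=pc_detailpage_wareBusiness"
    "&body=%7B%22skuId%22%3A6515029%2C%22cat%22%3A%221316%2C1381%2C1391%22%2C%22area%22"
    "%3A%2225_2258_0_0%22%2C%22shopId%22%3A%221000099941%22%2C%22venderId%22%3A1000099941"
    "%2C%22paramJson%22%3A%22%7B%5C%22platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr"
    "%5C%22%3A%5C%22p0ppppppppppp1pppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22"
    "%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%7D"
)
body = decode_body(url)
# paramJson is a nested JSON string, so it is decoded separately
print(body["skuId"], body["shopId"], json.loads(body["paramJson"])["platform2"])  # 6515029 1000099941 1
```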


(2) The appid parameter

Identifies the API family. Observed values:

appid=pc-item-soa  PC product detail data;

appid=item-v3  data version v3;


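(3) The functionId parameter

Identifies the function the endpoint performs:

functionId=pc_detailpage_wareBusiness   PC product detail page

functionId=pc_club_productCommentSummaries   PC review summary data

functionId=recDivinerApi   product-page related data

functionId=pctradesoa_getprice   price information


Each functionId expects a different set of fields in body.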
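Because only appid, functionId and body change between endpoints, the unsigned part of the query string can be generated from those three values. This is a minimal sketch under the following assumptions:

- client=pc and clientVersion=1.0.0 are copied from the captured pc_detailpage_wareBusiness URL and may differ for other functionIds.
- body is serialized as compact JSON, as seen in that URL.
- build_query is a hypothetical name.

h5st is deliberately left out because it has to come from the signer.

```python
import json
import time
from urllib.parse import quote, urlencode


def build_query(appid, function_id, body, t=None):
    """Build the unsigned query string for an api.m.jd.com request (no h5st)."""
    params = {
        "appid": appid,
        "functionId": function_id,
        "client": "pc",            # assumption: value from the captured PC request
        "clientVersion": "1.0.0",  # assumption: value from the captured PC request
        "t": t if t is not None else int(time.time() * 1000),  # millisecond timestamp
        # compact JSON (no spaces), as in the captured URL
        "body": json.dumps(body, separators=(",", ":"), ensure_ascii=False),
    }
    # quote with safe="" encodes { " : , as %7B %22 %3A %2C, matching the captured URL
    return urlencode(params, quote_via=quote)


print(build_query("pc-item-soa", "pc_detailpage_wareBusiness",
                  {"skuId": 6515029, "num": 1}, t=1692499380806))
```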


(4) The x-api-eid-token parameter

x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX

In the author's tests, the server did not validate this parameter. It can therefore be omitted without affecting data collection.
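To try the request without this parameter, it can be removed from a captured URL. This is a minimal sketch, and drop_param is a hypothetical helper. It re-encodes the remaining values with quote, so JSON and h5st values come back in the same %7B/%22/%3B form as in the captured URL. Whether the server accepts the shortened URL is the author's finding, not something this code checks.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def drop_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    # parse_qsl decoded the values; quote (safe="") re-encodes them like the original URL
    return urlunsplit(parts._replace(query=urlencode(kept, quote_via=quote)))


url = ("https://api.m.jd.com/?appid=pc-item-soa&body=%7B%22num%22%3A1%7D"
       "&x-api-eid-token=jdd03GZSZ6&loginType=3")
print(drop_param(url, "x-api-eid-token"))
```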


(5) The h5st parameter (request signature)

h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e

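h5st is JD's request-signature parameter, and every endpoint requires it. The server returns data reliably only when the signature is valid. Without a valid signature, only an occasional request out of many returns data.

To collect data, then, we must produce a correct h5st signature. The signing process is analyzed below.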
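After percent-decoding (%3B is `;`), h5st is a list of `;`-separated fields. Comparing the two h5st samples in this article suggests two things. First, the first field is the millisecond timestamp in the seventh field, formatted as yyyyMMddHHmmssSSS in China Standard Time (UTC+8). Second, the sixth field is the version string 4.1. Both points are observations from these samples, not documented behavior.

The sketch below splits the value and checks that timestamp relationship. The helper names are hypothetical, and the meaning of the other fields is not covered.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import unquote

CHINA_TZ = timezone(timedelta(hours=8))  # China Standard Time, UTC+8


def split_h5st(h5st):
    """Percent-decode an h5st value (%3B -> ;) and split it into its fields."""
    return unquote(h5st).split(";")


def timestamps_agree(fields):
    """True if field 0 equals field 6 (epoch ms) rendered as yyyyMMddHHmmssSSS in UTC+8."""
    epoch_ms = int(fields[6])
    local = datetime.fromtimestamp(epoch_ms // 1000, tz=CHINA_TZ)
    return fields[0] == local.strftime("%Y%m%d%H%M%S") + f"{epoch_ms % 1000:03d}"


# h5st from the captured URL, with the long last field truncated for brevity
sample = ("20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtys"
          "PWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569"
          "%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e")
fields = split_h5st(sample)
print(len(fields), fields[5], timestamps_agree(fields))  # 8 4.1 True
```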


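II. Analyzing the h5st signature

1. Locating the h5st signing code

Do a global search in DevTools (Ctrl+Shift+F) for getDataColor. We search for getDataColor because the h5st signing code sits close to this function.

Set a breakpoint there and reload the page:

(Screenshot: execution paused at the breakpoint near getDataColor)

The signing steps are clearly visible: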

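            try {
                var d = JSON.parse(JSON.stringify(r));    // deep copy of the request parameters r
                d.body = SHA256(s).toString(),            // body -> hex SHA-256 digest of the body string s
                window.PSign.sign(d).then(function(e) {   // asynchronous: sign() returns a Promise
                    r.h5st = encodeURI(e.h5st);           // write the signature back into r
                    //......................
                });
                //......................
            }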

Signing call: window.PSign.sign(d);

Result: r.h5st = encodeURI(e.h5st);

Signing is asynchronous: sign(d) returns a Promise, and the signature is only available inside then().

2. The signing inputs

(1) body

{"skuId":6515029,"cat":"1316,1381,1391","area":"25_2258_2261_6568","shopId":"1000099941","venderId":1000099941,"paramJson":"{\"platform2\":\"1\",\"specialAttrStr\":\"p0ppppppppppp1pppppppppppp\",\"skuMarkStr\":\"00\"}","num":1,"bbTraffic":""}

(2) d

{
    "appid": "pc-item-soa",
    "functionId": "pc_detailpage_wareBusiness",
    "client": "pc",
    "clientVersion": "1.0.0",
    "t": 1692498783586,
    "body": "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496"
}

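d.body = SHA256(s).toString(). In other words, the body passed to the signer is not the body itself but the 64-character hex SHA-256 digest of the body string.

Stepping in with F11 shows where SHA256 is defined. Extract that code:

(Screenshot: the SHA256 function reached by stepping in with F11)

(SHA256 implementation code, not reproduced here)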
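In Python, the extracted SHA256 function can likely be replaced by hashlib, assuming the page's SHA256 is standard SHA-256. The 64-hex-character digest in d.body is consistent with that assumption.

The sketch below hashes the body serialized the way JSON.stringify does by default: no spaces, and non-ASCII characters kept as-is. That serialization matches byte for byte only for simple bodies of strings and integers like the ones in this article; floats, for example, can be formatted differently. body_digest is a hypothetical name.

```python
import hashlib
import json


def body_digest(body):
    """Hex SHA-256 of body's compact JSON (hashlib standing in for the page's SHA256)."""
    # JSON.stringify emits no spaces and keeps non-ASCII characters unescaped,
    # hence separators=(",", ":") and ensure_ascii=False
    body_str = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(body_str.encode("utf-8")).hexdigest()


digest = body_digest({"skuId": 6515029, "num": 1})
print(len(digest), digest == hashlib.sha256(b'{"skuId":6515029,"num":1}').hexdigest())  # 64 True
```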


(3) t

t: a, where

a = (new Date).getTime()

t is a Unix timestamp in milliseconds.
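Putting the inputs together, the object d handed to window.PSign.sign(d) can be assembled in Python. This is a minimal sketch with the following assumptions:

- The field names and order are copied from the d shown above.
- client and clientVersion are the values from the captured PC request.
- build_sign_input is a hypothetical name.

It only prepares the signer's input and does not compute h5st.

```python
import hashlib
import json
import time


def build_sign_input(appid, function_id, body, now_ms=None):
    """Assemble the object d passed to window.PSign.sign(d) in the page."""
    body_str = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    return {
        "appid": appid,
        "functionId": function_id,
        "client": "pc",            # assumption: value from the captured PC request
        "clientVersion": "1.0.0",  # assumption: value from the captured PC request
        # Python equivalent of (new Date).getTime(): milliseconds since the epoch
        "t": now_ms if now_ms is not None else int(time.time() * 1000),
        # the signer receives the hex SHA-256 digest instead of the body itself
        "body": hashlib.sha256(body_str.encode("utf-8")).hexdigest(),
    }


d = build_sign_input("pc-item-soa", "pc_detailpage_wareBusiness",
                     {"skuId": 6515029}, now_ms=1692498783586)
print(json.dumps(d, indent=4))
```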


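With the signing inputs understood, the next step is to find the h5st signing algorithm itself.


3. The h5st signing algorithm

Set a breakpoint on window.PSign.sign(d) and press F11 to step into it:

(Screenshot: stepping into window.PSign.sign with F11)

This lands in the JS file that implements h5st signing. Save the whole file; it is named js_security_v3_0.1.4.js.

(Contents of js_security_v3_0.1.4.js, not reproduced here)

4. Object returned by the h5st signer: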

{
    "appid": "pc-item-soa",
    "functionId": "pc_detailpage_wareBusiness",
    "client": "pc",
    "clientVersion": "1.0.0",
    "t": 1692498783586,
    "body": "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496",
    "_stk": "appid,body,client,clientVersion,functionId,t",
    "_ste": 1,
    "h5st": "20230820131419818;9m99mz6itng955u3;fb5df;tk03w9d441cbf18nk990HQLMH0ehQyR5j8EBXtSrYlGtY8KzYUkKCoUctg6u1pqtBeAqYw-t1yFcromGuN17RlgILtyk;65001318ffed0d17ee21652afb01a996;4.1;1692508459818;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d627d6db4fde5974622b566cdace3d88a8999574369ad4a27c752e256a8a6d92a5fdfa8633dae1aa5d17f9ea6a859ed6b22c920d7881227b2f7f61f3bbf82c17afd340c42be154e8e3ad1d39c2d8ba94acb84c25299080b5545acc894168647303ed"
}

The h5st field is the one we need.
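In the returned object, _stk is "appid,body,client,clientVersion,functionId,t". These are exactly the keys of d, sorted alphabetically. That reading of _stk is an inference from this one sample.

The sketch below reproduces that observation and extracts h5st. Before putting h5st into a query string, it percent-encodes it so that `;` becomes %3B, as in the request URL captured earlier. The helper names are hypothetical.

```python
from urllib.parse import quote


def stk_of(d):
    """Comma-joined, alphabetically sorted keys of the sign input (matches the sample _stk)."""
    return ",".join(sorted(d))


def h5st_query_value(signed):
    """Percent-encode the returned h5st for use as a query value (';' -> %3B)."""
    return quote(signed["h5st"], safe="")


d = {"appid": "pc-item-soa", "functionId": "pc_detailpage_wareBusiness", "client": "pc",
     "clientVersion": "1.0.0", "t": 1692498783586, "body": "dddd4805..."}
signed = dict(d, _stk="appid,body,client,clientVersion,functionId,t", _ste=1,
              h5st="20230820131419818;9m99mz6itng955u3;fb5df")  # h5st truncated for brevity
print(stk_of(d) == signed["_stk"], h5st_query_value(signed))
```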


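III. Calling the signer from Python or other languages

js_security_v3_0.1.4.js contains the actual signing code, but it cannot be called directly from Python. Run outside a browser, it fails with an error because window is not defined. The missing browser environment therefore has to be emulated ("patching the environment").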
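Environment patching usually means prepending stub definitions for the browser globals the script touches before running it in a JavaScript runtime. The minimal sketch below uses PyExecJS (execjs), as the script later in this article does, with a Node.js runtime assumed.

The stubs are hypothetical placeholders. They are not the author's patched environment, which is not shown. A real signer file will likely need more globals, added one by one as errors appear.

```python
PRELUDE = """
var window = globalThis;                       // the signer expects a browser-style global named window
var document = { cookie: "" };                 // hypothetical minimal stub
var navigator = { userAgent: "Mozilla/5.0" };  // hypothetical minimal stub
"""


def build_patched_source(signer_js):
    """Prepend browser-global stubs to the signer's source text."""
    return PRELUDE + "\n" + signer_js


def load_signer(path):
    """Compile the patched signer with PyExecJS (needs Node.js or another JS runtime)."""
    import execjs  # third-party package: pip install PyExecJS
    with open(path, encoding="utf-8") as f:
        return execjs.compile(build_patched_source(f.read()))
```

A context returned by load_signer is used the same way as in the script below, for example ctx.call("sign", appid, functionId, body), provided the patched file defines such a function.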

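Technical support: byc6352

The Python code below signs the request and calls the API. It loads h5st.js, the environment-patched signing script, whose sign(appid, functionId, body) function returns the fully signed request URL: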


# -*- coding: UTF-8 -*-
import execjs    # PyExecJS: runs JavaScript through an external runtime such as Node.js
import requests


def savetofile(text, filename):
    """Write text to filename as UTF-8."""
    with open(filename, "w", encoding="utf-8") as file:
        file.write(text)


def jd(skuid):
    # Endpoint: appid item-v3 / functionId recDivinerApi (product-page related data)
    appid = 'item-v3'
    functionId = 'recDivinerApi'
    # recDivinerApi body; uuid and securityToken are session values captured in the browser
    body = {"lid": 27, "lim": 15, "ec": "utf-8", "uuid": "16900368971511636315768", "pin": "",
            "p": 902029, "sku": skuid, "ck": "pin,ipLocation,atw,aview",
            "c1": 1316, "c2": 1387, "c3": 11932, "securityToken": "iJJJBrR7BAxWWavOluQxmMQ",
            "clientChannel": "3", "clientPageId": "item.jd.com"}
    # h5st.js is the environment-patched signer; sign() returns the signed request URL
    with open("h5st.js", "r", encoding="utf-8") as js_file:
        exc = execjs.compile(js_file.read())
    url = exc.call("sign", appid, functionId, body)
    print('url=' + url)
    # ":authority" is an HTTP/2 pseudo-header; requests derives Host from the URL instead.
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        # "br" dropped: requests only decodes Brotli if the brotli package is installed
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        # session cookies copied from the browser; they expire, so replace them with your own
        "Cookie": "shshshfpb=i0ZU6VlHi9tt1RukWDDyR0w; 3AB9D23F7A4B3C9B=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; shshshfpa=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; shshshfpx=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; __jdc=122270672; __jdv=122270672|direct|-|none|-|1689305241830; __jdu=16893052418291576334291; areaId=25; ipLoc-djd=25-2258-2261-6568; token=7a3a5010c8ea7250057d9168270daacd,2,939221; __tk=be32047e11adf495830ad564f7c34cd6,2,939221; 3AB9D23F7A4B3CSS=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTRiDY9CRQSU93J9SUTiPmFy3PTP7N8itsNd7DLuiPzfoEjAAACXCBKUWUQMP7FMX; _gia_d=1; jsavif=1; __jda=122270672.16893052418291576334291.1689305242.1690550636.1690599310.7; __jdb=122270672.1.16893052418291576334291|7.1690599310",
        "Origin": "https://item.jd.com",
        "Referer": "https://item.jd.com/",
        "Sec-Ch-Ua": "\"Not.A/Brand\";v=\"8\", \"Chromium\";v=\"114\", \"Google Chrome\";v=\"114\"",
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": "\"Windows\"",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-site",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "X-Referer-Page": f"https://item.jd.com/{skuid}.html",
        "X-Rp-Client": "h5_1.0.0",
    }
    res = requests.get(url=url, headers=headers, timeout=10)
    print(res)
    text = res.text
    savetofile(text, "sku.txt")
    print(text)
    return text


if __name__ == '__main__':
    print('Latest h5st 4.1 signature, returning product details. Technical support: byc6352')
    jd(100019322424)

IV. Product details successfully returned in Python

(Screenshot: product detail JSON returned by the Python script)

Mission accomplished!
