A Script to Fix CSDN's Failed Remote Image Transfers

Posted by mkdir700 on 06-17,2021

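Preface

I usually write with Typora + PicGo, and every image I insert is stored on OSS. When I paste Markdown containing remote image URLs into the CSDN editor, some of the images routinely fail to transfer, even though my OSS bucket has no hotlink protection enabled.

Whenever that happens, I have to check by hand which images failed and upload them again (or paste the Markdown into the editor several more times).

So I decided to write a script that uploads every external image link to the CSDN servers ahead of time. Even if the same image is uploaded again later, the backend checks the file's MD5 and simply returns the existing link.

The script in this post is written in Python. A Tampermonkey userscript in JS would be more practical, and I may write one if I find the time, but JS is not my strong suit.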

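API analysis

Endpoint: https://bizapi.csdn.net/blog-console-api/v3/image/transfer

A look at the browser's developer tools shows that the request has two main parts: the signature and the parameters.

Request signature

The signature is carried in the request headers, as shown in the screenshot below.

This API is fairly simple, and a little analysis is enough to work it out. CSDN does no JS obfuscation or encryption here, which is refreshingly straightforward.

As for the header names, a global search with Ctrl+Shift+F finds them right away.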

X-Ca-Key

X-Ca-Nonce

A randomly generated UUID.

xCaNonce(_xCaNonce) {
  var nonce = _xCaNonce || null;
  if (nonce == null) {
    nonce = createUuid();
  }
  return nonce;
}
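A rough Python counterpart of the snippet above, as a minimal sketch. The name `make_nonce` is mine, and I am assuming that `createUuid()` yields an ordinary random UUID string, which the JS shown here does not confirm.

```python
import uuid


def make_nonce(nonce=None):
    """Return the caller's nonce, or a fresh random UUID string if none is given."""
    # Mirrors `_xCaNonce || null`: an empty value also triggers a new UUID.
    return nonce or str(uuid.uuid4())


print(make_nonce())          # a new value on every call
print(make_nonce("fixed"))   # an explicit nonce is passed through unchanged
```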

X-Ca-Signature

Computes the signature.

var xCaSignature = function xCaSignature(_ref) {
  var meth = _ref.meth,
      url = _ref.url,
      appSecret = _ref.appSecret,
      accept = _ref.accept,
      date = _ref.date,
      contentType = _ref.contentType,
      params = _ref.params,
      headers = _ref.headers;

  var textToSign = '';

  if (!params && url.indexOf('?') !== -1) {
    params = getParams(url);
    url = url.split('?')[0];
  } else if (!params) {
    params = {};
  }
  var md5 = '';
  textToSign += meth + '\n';
  textToSign += accept + '\n';
  textToSign += md5 + '\n';
  textToSign += contentType + '\n';
  textToSign += date + '\n';
  var signatureHeaders = headersToSign(headers);

  var sortedHeaderNames = __WEBPACK_IMPORTED_MODULE_2_babel_runtime_core_js_array_from___default()(__WEBPACK_IMPORTED_MODULE_1_babel_runtime_core_js_object_keys___default()(signatureHeaders)).sort();
  var _iteratorNormalCompletion2 = true;
  var _didIteratorError2 = false;
  var _iteratorError2 = undefined;

  try {
    for (var _iterator2 = __WEBPACK_IMPORTED_MODULE_0_babel_runtime_core_js_get_iterator___default()(sortedHeaderNames), _step2; !(_iteratorNormalCompletion2 = (_step2 = _iterator2.next()).done); _iteratorNormalCompletion2 = true) {
      var headerName = _step2.value;

      textToSign += headerName + ':' + signatureHeaders[headerName] + '\n';
    }
  } catch (err) {
    _didIteratorError2 = true;
    _iteratorError2 = err;
  } finally {
    try {
      if (!_iteratorNormalCompletion2 && _iterator2.return) {
        _iterator2.return();
      }
    } finally {
      if (_didIteratorError2) {
        throw _iteratorError2;
      }
    }
  }

  var reg = /^(?=^.{3,255}$)(http(s)?:\/\/)?(www\.)?[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.csdn\.net)/;

  var path = url.replace(reg, '');
  textToSign = textToSign + urlToSign(path, params);

  var hash = __WEBPACK_IMPORTED_MODULE_3_crypto_js___default.a.HmacSHA256(textToSign, appSecret);
  var signature = hash.toString(__WEBPACK_IMPORTED_MODULE_3_crypto_js___default.a.enc.Base64);
  return signature;
};

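The code above looks long, but a little analysis makes the logic clear. The method roughly works as follows:

  1. Concatenate the key information from the headers into the string textToSign

  2. Sign textToSign with appSecret using the HmacSHA256 algorithm

  3. Encode the signature as Base64

Since this is a well-known signing algorithm, it can be implemented quickly with Python's standard library.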

from hashlib import sha256
import hmac
import base64

def x_ca_signature(data, app_secret):
    # HMAC-SHA256 over the string to sign, keyed with the app secret, then Base64
    data = data.encode()
    app_secret = app_secret.encode()
    return base64.b64encode(hmac.new(app_secret, data, digestmod=sha256).digest()).decode()
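The string that gets signed follows the field order in the JS `xCaSignature`: method, accept, an empty MD5 field, content type, date, the signed headers sorted by name, and finally the URL path. Below is a minimal sketch of assembling it. `build_text_to_sign` and `sign` are names of my own, the key, nonce and secret are placeholders, and I am assuming the signed headers are just `x-ca-key` and `x-ca-nonce`. The JS `urlToSign` may also append query parameters; that part is left out here because this endpoint has none.

```python
import base64
import hmac
from hashlib import sha256
from urllib.parse import urlsplit


def build_text_to_sign(method, url, headers, accept="*/*",
                       content_type="application/json", date=""):
    """Assemble the string to sign in the same field order as the JS code."""
    # The third field is the Content-MD5 slot, which the JS leaves empty.
    lines = [method, accept, "", content_type, date]
    # Signed headers are appended as "name:value", sorted by header name.
    lines += [f"{name}:{headers[name]}" for name in sorted(headers)]
    # Only the path is signed; the scheme and host are dropped.
    lines.append(urlsplit(url).path)
    return "\n".join(lines)


def sign(text, app_secret):
    """HMAC-SHA256 the text with the secret and Base64-encode the digest."""
    digest = hmac.new(app_secret.encode(), text.encode(), digestmod=sha256).digest()
    return base64.b64encode(digest).decode()


text = build_text_to_sign(
    "POST",
    "https://bizapi.csdn.net/blog-console-api/v3/image/transfer",
    {"x-ca-nonce": "demo-nonce", "x-ca-key": "demo-key"},  # placeholder values
)
signature = sign(text, "demo-secret")  # placeholder secret
print(text)
print(signature)
```

Both empty fields (MD5 and date) still contribute a newline, so the blank lines in the string to sign matter.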

Request parameters

Now back to the parameters the API requires.

art_id: 117997265
url: "https://cdn.z2blog.com/utools/pic/1623920010550.png"
uuid: "img-io0w4WCL-1623920456798"

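art_id: article ID, optional

url: remote image URL, required

uuid: required

The uuid takes a little digging. The simplest route is a global search for art_id, which leads to the place where all three parameters are assigned.

The value of uuid comes from posId, so search for posId next.

Then look at getMd5Str: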

getMd5Str: function getMd5Str(length) {
  var curArr = new Uint32Array(length);
  crypto.getRandomValues(curArr);
  return curArr.cl_map(function (value) {
    return alphabet[value % radix];
  }).join('');
},

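It simply returns a random string; the uuid is there to keep the file name unique.

In Python, generating a random 8-character string is enough.

import secrets
uuid = secrets.token_hex(4)  # 4 random bytes -> 8 hex characters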
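The uuid captured earlier, `img-io0w4WCL-1623920456798`, looks like `img-`, eight random characters, and a millisecond timestamp. Here is a minimal sketch that produces values of the same shape and puts one into a request body. `random_str` and `make_image_uuid` are my own names, and the alphabet is a guess based on that one sample, since the real `alphabet` and `radix` are not shown in the JS.

```python
import secrets
import string
import time

# Assumed alphabet: letters and digits, inferred from the captured sample only.
ALPHABET = string.ascii_letters + string.digits


def random_str(length=8):
    """Rough counterpart of getMd5Str: `length` random characters."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def make_image_uuid():
    """Build an id shaped like img-<8 random chars>-<millisecond timestamp>."""
    return f"img-{random_str()}-{int(time.time() * 1000)}"


payload = {
    "url": "https://cdn.z2blog.com/utools/pic/1623920010550.png",
    "uuid": make_image_uuid(),
}
print(payload)
```

`art_id` is left out of the payload because it is optional.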

Result

https://cdn.z2blog.com/picgo/20210617195541.mp4

Code

# -*- coding: utf-8 -*-
"""
Created on 2021/6/17 19:09
---------
@summary: 
---------
@author: mkdir700
@email:  mkdir700@gmail.com
"""
import re

import json
import time
import uuid
import hashlib
from hashlib import sha256
import hmac
import base64

import click
import requests
from tqdm import tqdm

SESSION = requests.Session()
COOKIE = 'paste your own Cookie here'


def make_headers(app_key, nonce, signature):
    return {
        "accept": "*/*",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9",
        "cache-control": "no-cache",
        "content-type": "application/json",
        "cookie": COOKIE,
        "origin": "https://editor.csdn.net",
        "pragma": "no-cache",
        "referer": "https://editor.csdn.net/",
        "sec-ch-ua-mobile": "?0",
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-site",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
        "x-ca-key": app_key,
        "x-ca-nonce": nonce,
        "x-ca-signature": signature,
        "x-ca-signature-headers": "x-ca-key,x-ca-nonce",
    }


def get_signature(data, app_secret):
    data = data.encode()
    app_secret = app_secret.encode()
    return base64.b64encode(hmac.new(app_secret, data, digestmod=sha256).digest()).decode()


def upload_remote_pic(url):
    nonce = str(uuid.uuid4())
    app_key = "203803574"
    app_secret = "9znpamsyl2c7cdrr9sas0le9vbc3r6ba"
    data = f"""POST
*/*

application/json

x-ca-key:{app_key}
x-ca-nonce:{nonce}
/blog-console-api/v3/image/transfer"""
    signature = get_signature(data, app_secret)
    SESSION.headers = make_headers(app_key, nonce, signature)
    
    data = {
        "url": url,
        'uuid': f"img-{uuid.uuid4().hex[:8]}-{int(time.time() * 1000)}",  # 8 random hex chars + ms timestamp
    }
    resp = SESSION.post("https://bizapi.csdn.net/blog-console-api/v3/image/transfer", data=json.dumps(data))
    # print(json.dumps(resp.json(), sort_keys=True, indent=4, separators=(',', ':')))
    return resp.status_code


@click.command()
@click.argument('file_path')
def run(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    
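    # Match ![alt](http...) one image at a time, so several images on one line are all captured
    results = re.findall(r"!\[[^\]]*\]\((http[^)\s]+)", content)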
    for url in tqdm(results):
        upload_remote_pic(url)


if __name__ == '__main__':
    run()
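The URL extraction step can be tried on its own without touching the network. The sketch below is a variant of my own rather than the exact code in the script: `extract_image_urls` is a hypothetical helper, the pattern is stricter about the `![alt](` part so that two images on one line are both found, and duplicates are dropped so each image is uploaded once.

```python
import re

# ![alt](http...) with the URL ending at the first ")" or whitespace
IMAGE_URL = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+)")


def extract_image_urls(markdown):
    """Return remote image URLs in document order, without duplicates."""
    return list(dict.fromkeys(IMAGE_URL.findall(markdown)))


sample = (
    "![a](https://example.com/a.png) and ![b](https://example.com/b.png)\n"
    "![local](./c.png)\n"
    '![a again](https://example.com/a.png "title")\n'
)
urls = extract_image_urls(sample)
print(urls)
```

Local paths such as `./c.png` are skipped, since only remote images need to be transferred.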