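Recently, while writing a crawler, I ran into a website with a CSRF protection mechanism. The endpoint is a POST request, and testing with Postman showed that it only succeeds when the request carries both the Cookie header and a _token parameter in the form data. Neither can be left out, and both are valid for only one day. Because the crawler runs as a scheduled task at a fixed time every day, replacing the Cookie and token by hand each day is too much trouble. After some experimenting I found a way to get past this page's CSRF check:
Step 1: the Cookie expires, so fetch a fresh Cookie first:
Send any GET request to the site that succeeds, read the Set-Cookie values from the response, and put them into the CookieContainer used by the POST request.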
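// Requires: using System.IO; using System.Linq; using System.Net; using System.Text; using HtmlAgilityPack;
public static CookieContainer CookieContainer = new CookieContainer();
// Assumed field: Token is used below but was not declared in the original snippet
public static string Token;

public static void GetCookieAndToken()
{
    Encoding encoding = Encoding.UTF8;
    string requestUrlString = @"https://www.pproperties.com.hk/tc";

    // Send a GET request to the server
    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(requestUrlString);
    myRequest.ContentType = "application/x-www-form-urlencoded";
    // A CookieContainer must be assigned, otherwise myResponse.Cookies stays empty
    myRequest.CookieContainer = new CookieContainer();

    string htmlStr;
    using (HttpWebResponse myResponse = (HttpWebResponse)myRequest.GetResponse())
    using (StreamReader reader = new StreamReader(myResponse.GetResponseStream(), encoding))
    {
        // Keep the cookies from Set-Cookie for the later POST request
        CookieContainer.Add(myResponse.Cookies);
        htmlStr = reader.ReadToEnd();
    }

    // Parse the page and read the hidden CSRF token from its meta tag
    HtmlDocument html = new HtmlDocument();
    html.LoadHtml(htmlStr);
    HtmlNodeCollection node = html.DocumentNode.SelectNodes(@"//meta[@name='csrf-token']");
    if (node != null)
    {
        Token = node.First().Attributes["content"].Value;
    }
    else
    {
        // Token not found: retry (note that this recursion has no retry limit)
        GetCookieAndToken();
    }
}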
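HttpWebRequest is used for this request because the cookies returned by HttpWebResponse can be added directly to a CookieContainer. With HttpClient, the cookies come back as an IEnumerable<string>. The final POST request handles a lot of data and needs to be asynchronous, so HttpClient is used for it, and HttpClient takes its cookies through a CookieContainer.
So the flow is: first obtain the CookieContainer with HttpWebRequest, then attach that CookieContainer to an HttpClientHandler and send the POST with HttpClient. It is this roundabout because I have not yet found a way to convert IEnumerable<string> into a CookieContainer.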
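One possible way to close that gap is CookieContainer.SetCookies, which parses a raw Set-Cookie value for a given URI. The sketch below is only an illustration: the class name CookieHelper, the method name FromSetCookieHeaders, and the example.com URI are assumptions and not part of the original code, and it has not been tried against the site in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class CookieHelper
{
    // Hypothetical helper: builds a CookieContainer from raw Set-Cookie header values.
    public static CookieContainer FromSetCookieHeaders(Uri uri, IEnumerable<string> setCookieValues)
    {
        var container = new CookieContainer();
        foreach (string value in setCookieValues)
        {
            // SetCookies parses the header text and stores the cookie for this URI
            container.SetCookies(uri, value);
        }
        return container;
    }
}
```

With HttpClient, the raw values could be read with response.Headers.TryGetValues("Set-Cookie", out var values) and passed in together with the request URI. It is also worth knowing that HttpClientHandler.UseCookies defaults to true, in which case the handler stores response cookies in its own CookieContainer, so a GET and the following POST sent through the same handler share cookies without any conversion.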
Step 2: get the token (the code is inside the Cookie-fetching method above)
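The CSRF check can be bypassed because the token is hidden in the page itself.
To get the token, the code above selects the meta tag named csrf-token and reads its content attribute.
HtmlAgilityPack is used here to parse the HTML.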
Finally, request the data with a POST:
// CookieContainer and Token come from GetCookieAndToken(); postUrl and viewdata are defined by the caller
HttpClientHandler handler = new HttpClientHandler() { CookieContainer = CookieContainer };
using (HttpClient client = new HttpClient(handler))
{
    HttpRequestMessage message = new HttpRequestMessage(HttpMethod.Post, postUrl);
    message.Headers.Add("accept-language", "zh-CN,zh;q=0.9,en;q=0.8");
    // viewdata is the raw form-data string; the token value is appended to it
    viewdata += Token;
    var dic = FormDataToDictionary(viewdata);
    message.Content = new FormUrlEncodedContent(dic);
    // .Result blocks the calling thread until the request completes
    HttpResponseMessage response = client.SendAsync(message).Result;
    return response.Content.ReadAsStringAsync().Result;
}
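Since HttpClient was chosen for its asynchronous API, the same POST can also be written with async/await instead of blocking on .Result. This is a minimal sketch under assumptions: the class FormPoster and the methods BuildFormPost and PostFormAsync are hypothetical names, and the caller is expected to supply an HttpClient whose handler already holds the cookies.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class FormPoster
{
    // Hypothetical helper: builds the POST message with a form-urlencoded body.
    public static HttpRequestMessage BuildFormPost(string url, IDictionary<string, string> form)
    {
        var message = new HttpRequestMessage(HttpMethod.Post, url);
        message.Headers.Add("accept-language", "zh-CN,zh;q=0.9,en;q=0.8");
        message.Content = new FormUrlEncodedContent(form);
        return message;
    }

    // Sends the POST without blocking the calling thread and returns the response body.
    public static async Task<string> PostFormAsync(HttpClient client, string url, IDictionary<string, string> form)
    {
        using (HttpRequestMessage message = BuildFormPost(url, form))
        using (HttpResponseMessage response = await client.SendAsync(message))
        {
            // Throws HttpRequestException on a non-success status code
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

A caller would write something like string body = await FormPoster.PostFormAsync(client, postUrl, dic); which lets many requests run concurrently instead of one at a time.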
HttpContent types:
MultipartFormDataContent => multipart/form-data
FormUrlEncodedContent => application/x-www-form-urlencoded
StringContent => application/json, etc.
StreamContent => binary
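The mapping above can be checked by reading the Content-Type header that each HttpContent subclass carries. The sketch below is illustrative only: ContentTypeDemo, MediaTypeOf, and BinaryContent are hypothetical names. Note that StringContent defaults to text/plain unless a media type is passed, and StreamContent has no Content-Type until one is set.

```csharp
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

public static class ContentTypeDemo
{
    // Returns the media type of the content, or null when no Content-Type header is set.
    public static string MediaTypeOf(HttpContent content)
    {
        return content.Headers.ContentType?.MediaType;
    }

    // StreamContent carries no Content-Type by default, so set one explicitly for binary data.
    public static StreamContent BinaryContent(byte[] data)
    {
        var content = new StreamContent(new MemoryStream(data));
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        return content;
    }
}
```

For example, MediaTypeOf(new StringContent("{}", System.Text.Encoding.UTF8, "application/json")) yields application/json, while the same call on a StringContent created without a media type yields text/plain.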
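The key-value pairs passed to FormUrlEncodedContent would otherwise have to be added one at a time, so I wrote a helper method that converts the whole form-data string in one go: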
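// Convert a URL-encoded form-data string into a Dictionary
// Requires: using System; using System.Collections.Generic;
static Dictionary<string, string> FormDataToDictionary(string viewParse_Encode)
{
    Dictionary<string, string> dic = new Dictionary<string, string>();
    var arr = viewParse_Encode.Split(new char[] { '&' }, StringSplitOptions.RemoveEmptyEntries);
    foreach (var a in arr)
    {
        // Split on the first '=' only, so values that contain '=' stay intact
        var arr2 = a.Split(new char[] { '=' }, 2);
        // Decode after splitting, so an encoded '&' or '=' inside a value is not treated as a separator
        var key = System.Web.HttpUtility.UrlDecode(arr2[0], System.Text.Encoding.UTF8);
        var value = arr2.Length == 2
            ? System.Web.HttpUtility.UrlDecode(arr2[1], System.Text.Encoding.UTF8)
            : string.Empty;
        // Add throws on a duplicate key, as in the original version
        dic.Add(key, value);
    }
    return dic;
}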
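As an alternative to splitting the string by hand, the framework's HttpUtility.ParseQueryString can do the parsing and decoding. The following is a sketch of that swapped-in approach, not the method used in this post; FormDataParser and Parse are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Web;

public static class FormDataParser
{
    // Hypothetical helper: parses "a=1&b=2" style form data into a dictionary.
    public static Dictionary<string, string> Parse(string formData)
    {
        // ParseQueryString splits on '&' and '=' and URL-decodes names and values
        NameValueCollection parsed = HttpUtility.ParseQueryString(formData);
        var result = new Dictionary<string, string>();
        foreach (string key in parsed.AllKeys)
        {
            // Entries without '=' are stored under a null key; skip them here
            if (key == null) continue;
            // Repeated keys come back as one comma-joined value
            result[key] = parsed[key] ?? string.Empty;
        }
        return result;
    }
}
```

The resulting dictionary can be passed straight to new FormUrlEncodedContent(...), which encodes the values again when the request is sent.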
References:
How to simulate a login with HttpWebRequest in C# and capture the cookies returned by the server for use in later requests - 黃聰 - cnblogs (cnblogs.com)
The various HttpContent formats for posting data with HttpClient in C# - 齊建偉 - cnblogs (cnblogs.com)
This article is excerpted from: https://www.cnblogs.com/