C# crawler: using ChromeDriver to get what you see

Submitted by a user, 2022-11-19 07:30:43


Problem

Recently, while writing crawlers, I found that many web pages show content in the browser that is absent from the page source, which is what is usually called asynchronous loading. If we need that asynchronously loaded content, we either have to work out its rules and combine the conditions to then once again perform …

    options = new ChromeOptions();
    options.AddArguments("--test-type", "--ignore-certificate-errors");
    options.AddArguments("user-agent=mozilla/5.0 (linux; u; android 2.3.3; en-us; sdk build/ gri34) applewebkit/533.1 (khtml, like gecko) version/4.0 mobile safari/533.1");
    options.AddArgument("enable-automation");
    // options.AddArgument("headless");
    // options.AddArguments("--proxy-server=…   (the rest of this line is missing in the source)
    // IWebDriver driver = new ChromeDriver(System.Environment.CurrentDirectory, options);
    // chromeDriverService System.Environment.CurrentDirectory System.Environment.CurrentDirectory
    using (IWebDriver driver = new OpenQA.Selenium.Chrome.ChromeDriver(@"C:\Users\Administrator\Downloads\chromedriver_win32", options, TimeSpan.FromSeconds(120)))
    {
        // trylogin(driver);
        driver.Url = "…"; // tenvideo_video_player_0 (the URL is missing in the source)
        SetText(driver.PageSource);
        Thread.Sleep(200);
        try
        {
            for (int a = 1; a < 2; a++)
            {
                SetText("\r\n第" + a.ToString() + "个");
                driver.Navigate().GoToUrl("…"); // URL missing in the source
                // Login
                if (driver.Url.Contains("login.1688.com"))
                {
                    SetText("\r\n需要登录,开始尝试...");
                    trylogin(driver);
                    // Login attempt finished; try the page again
                    driver.Navigate().GoToUrl("…"); // URL missing in the source
                    if (driver.Url.Contains("login.1688.com"))
                    {
                        // Nothing more we can do: exit and retry with another IP
                        SetText("\r\n退出,换ip重试...");
                        return;
                    }
                }
                // The page only ever shows one hover panel at a time, so we cannot
                // display them all and then download; they have to be fetched another way.
                IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
                // getElementsByClassName returns a live collection; copy it first,
                // because changing the class inside the loop would otherwise skip elements.
                var sss = js.ExecuteScript(
                    "var elements = Array.prototype.slice.call(document.getElementsByClassName('hover-container'));" +
                    "elements.forEach(function(element) {" +
                    "  console.log(element);" +
                    "  element.setAttribute(\"class\", \"测试title\");" +
                    "  element.style.display = \"block\";" +
                    "  console.log(element);" +
                    "});");
                Thread.Sleep(500);
                var responseModel = Write(driver.PageSource, Pagetypeenum.列表);
                Thread.Sleep(500);
                int i = 1;
                // The generic argument of the original fallback list is missing in the source,
                // so a null check is used instead of "?? new List()".
                var offers = responseModel?.data?.offerList;
                if (offers != null)
                {
                    foreach (var offer in offers)
                    {
                        driver.Navigate().GoToUrl(offer.information.detailUrl);
                        string responseDatadetail = driver.PageSource;
                        Write(responseDatadetail, Pagetypeenum.详情);
                        SetText("\r\n第" + a.ToString() + "-" + i.ToString() + "个");
                        Thread.Sleep(500);
                        i++;
                    }
                }
            }
        }
        catch (Exception ex)
        {
            CloseChromeDriver(driver);
            throw;
        }
    }
    // Thread thread = new Thread(go);
    // thread.Start();
    } // closes the enclosing method, whose header is missing in the source

Get the rendered page content: SetText(driver.PageSource);
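The code above reads driver.PageSource after a fixed Thread.Sleep, which can return too early on a slow page. As a minimal sketch, assuming the Selenium.WebDriver and Selenium.Support NuGet packages and a chromedriver that Selenium can locate, an explicit wait can block until the element we care about exists. The class name RenderedPageFetcher, the method names, and the parameters are hypothetical and not part of the original project.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

// Hypothetical helper, not from the original article.
public static class RenderedPageFetcher
{
    // Builds options similar to the ones used in the article.
    public static ChromeOptions BuildOptions(bool headless)
    {
        var options = new ChromeOptions();
        options.AddArguments("--test-type", "--ignore-certificate-errors");
        if (headless)
        {
            options.AddArgument("--headless");
        }
        return options;
    }

    // Opens the URL, waits until an element with the given id is present,
    // then returns the DOM as rendered by the browser.
    // Throws WebDriverTimeoutException if the element never appears.
    public static string FetchRenderedHtml(string url, string elementId, TimeSpan timeout)
    {
        // Assumption: chromedriver is discoverable (on PATH or via Selenium Manager).
        using (IWebDriver driver = new ChromeDriver(BuildOptions(headless: true)))
        {
            driver.Navigate().GoToUrl(url);
            var wait = new WebDriverWait(driver, timeout);
            wait.Until(d => d.FindElements(By.Id(elementId)).Count > 0);
            return driver.PageSource;
        }
    }
}
```

A call might look like RenderedPageFetcher.FetchRenderedHtml(pageUrl, "tenvideo_video_player_0", TimeSpan.FromSeconds(30)), where pageUrl is whichever page you are crawling. Disposing the driver at the end of the using block also closes the browser.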

    private void button2_Click(object sender, EventArgs e)
    {
        // File path of the saved page source
        string filePath = @"G:\conan\reptiles1688\bin\Debug\test.txt";
        using (FileStream fsRead = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            int fsLen = (int)fsRead.Length;
            byte[] heByte = new byte[fsLen];
            fsRead.Read(heByte, 0, heByte.Length);
            string myStr = System.Text.Encoding.Default.GetString(heByte);
            this.textBox1.Text = myStr; // show what was read
        }
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(this.textBox1.Text);
        HtmlNode node = doc.GetElementbyId("tenvideo_video_player_0");
        // Guard against a missing element or attribute instead of throwing
        textBox1.Text = node?.Attributes["src"]?.Value ?? string.Empty;
        // var node = doc.DocumentNode.SelectNodes("//video[@id='tenvideo_video_player_0']//video");
        // textBox1.Text = (node[3].InnerHtml);
    }
    } // closes the enclosing form class

Parsing the page gives us the video address we want.
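The parsing step can also be pulled out of the button handler so that it works on any HTML string, for example directly on driver.PageSource. The following is only a sketch, assuming the HtmlAgilityPack NuGet package; the class name VideoSrcExtractor and the method ExtractSrc are hypothetical names introduced for illustration.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not from the original article.
public static class VideoSrcExtractor
{
    // Returns the src attribute of the element with the given id,
    // or null when the element or the attribute is missing.
    public static string ExtractSrc(string html, string elementId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode node = doc.GetElementbyId(elementId);
        if (node == null)
        {
            return null;
        }

        string src = node.GetAttributeValue("src", string.Empty);
        return string.IsNullOrEmpty(src) ? null : src;
    }
}
```

Returning null instead of throwing lets the caller decide whether a missing player element means the page has not finished rendering or the layout has changed. In a WinForms project the type should be written as HtmlAgilityPack.HtmlDocument, as the article does, to avoid a clash with System.Windows.Forms.HtmlDocument.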
