Methods for Scraping Web Pages in C#
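There are roughly three ways to get a web page's HTML: 1. download the page with WebClient; 2. get the page's HTML through HttpWebRequest and HttpWebResponse; 3. get the page's Document Tree through the WebBrowser control provided by Microsoft. There are two main ways to parse the HTML: regular expressions and the Document Tree. Each is briefly introduced below.

The following example uses method 2. It POSTs form data to the UPS tracking page with HttpWebRequest and reads the response using the GB2312 encoding.

    using System.IO;
    using System.Net;
    using System.Text;

    string url = "http://www.ups.com/WebTracking/track";
    // The form data in the original is truncated after "AgreeToTermsAndCondition"
    string postData = "loc=zh_cn&HTMLVersion=5.0&saveNumbers=null&trackNums=1ZX580116610381498&AgreeToTermsAndCondition";
    string html = "";
    // On .NET Core / .NET 5+, GB2312 is only available after registering the code-page provider:
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    Encoding encode = Encoding.GetEncoding("GB2312");
    byte[] data = encode.GetBytes(postData);
    HttpWebRequest req = WebRequest.Create(url) as HttpWebRequest;
    req.AllowAutoRedirect = true;
    req.Method = "POST";
    req.ContentType = "application/x-www-form-urlencoded";
    req.ContentLength = data.Length; // length of the data to POST

    // Write the POST data into the request stream so the request carries it as its body
    using (Stream outStream = req.GetRequestStream())
    {
        outStream.Write(data, 0, data.Length);
    }

    // Send the request and get the response
    using (HttpWebResponse response = req.GetResponse() as HttpWebResponse)
    // Get the response stream and read it with the same encoding
    using (Stream responseStream = response.GetResponseStream())
    using (StreamReader sr = new StreamReader(responseStream, encode))
    {
        html = sr.ReadToEnd(); // the page's HTML
    }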
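Method 1 (WebClient) wraps the request and response streams into single calls. The following is a minimal sketch, not the author's code: the names PageFetcher, DownloadHtml and PostForm are hypothetical, and the encoding is passed in rather than fixed to GB2312. Like HttpWebRequest, WebClient is marked obsolete in .NET 6 and later in favor of HttpClient, but it still works there.

```csharp
using System.Net;
using System.Text;

// Hypothetical helper class illustrating method 1 (WebClient).
static class PageFetcher
{
    // Download a page as a string, decoded with the given encoding (e.g. GB2312).
    public static string DownloadHtml(string url, Encoding encoding)
    {
        using (var client = new WebClient())
        {
            client.Encoding = encoding; // encoding WebClient uses to convert strings to and from bytes
            return client.DownloadString(url);
        }
    }

    // POST url-encoded form data and return the response body as a string,
    // the WebClient counterpart of the HttpWebRequest listing.
    public static string PostForm(string url, string postData, Encoding encoding)
    {
        using (var client = new WebClient())
        {
            client.Encoding = encoding;
            client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
            return client.UploadString(url, "POST", postData);
        }
    }
}
```

With this helper, the POST from the HttpWebRequest listing becomes a single call, `PageFetcher.PostForm(url, postData, encode)`, at the cost of less control over the request (for example, ContentLength is computed for you).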
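Of the two parsing approaches, the regular-expression approach can be sketched as follows. It assumes the HTML is already in a string, such as `html` from the listing above. ExtractTitle and ExtractLinks are hypothetical helpers. The patterns only handle simple markup (for example, quoted href values), which is the usual limitation of parsing HTML with regular expressions compared with walking a Document Tree.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helpers illustrating regex-based HTML parsing.
static class HtmlRegex
{
    // Return the trimmed text inside the first <title> element, or null if there is none.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
                              RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    // Collect the href values of <a> tags; only single- or double-quoted values are matched.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in Regex.Matches(html, @"<a\s[^>]*?href\s*=\s*[""']([^""']*)[""']",
                                          RegexOptions.IgnoreCase))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}
```

For example, `HtmlRegex.ExtractTitle("<TITLE> Hello </TITLE>")` returns `"Hello"`, because IgnoreCase lets the pattern match upper-case tags.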