This article explains how to use a Java crawler to find out which Bilibili followers have unfollowed you. The explanation is kept short and simple, so follow along step by step.
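Development tools: Eclipse/IDEA
Browser: Google Chrome
Browser driver: Selenium 3.5 with ChromeDriver
Jar packages: see the imports in the code below (Selenium, Apache Commons HttpClient, log4j, fastjson, JavaCSV, Apache Commons Lang 3)
// The ChromeDriver version must match the installed Chrome browser version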
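Obtain the Cookie (type it into the terminal, or let Selenium open the browser and log in by scanning the QR code)
Request the https://api.bilibili.com/x/relation/followers endpoint
Parse the returned data
Write the result to a CSV file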
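The request and parse steps above repeat once per page, so the core of the program is a paged loop that has to know when to stop. The following is only a minimal sketch of that loop, not the article's implementation: `PagedFetch`, `fetchAll`, and the fake page source are hypothetical names, and the assumption is that an empty page signals the end of the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

public class PagedFetch {
    /**
     * Collects items page by page, starting at page 1, until a page comes back
     * empty (assumed end-of-data signal) or maxPages pages have been read.
     */
    public static <T> List<T> fetchAll(IntFunction<List<T>> fetchPage, int maxPages) {
        List<T> all = new ArrayList<>();
        for (int pn = 1; pn <= maxPages; pn++) {
            List<T> page = fetchPage.apply(pn);
            if (page == null || page.isEmpty()) {
                break; // nothing left: stop instead of requesting forever
            }
            all.addAll(page);
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake page source standing in for the followers endpoint:
        // two pages of data, then empty pages.
        IntFunction<List<String>> fake = pn -> {
            if (pn == 1) return Arrays.asList("mid-1", "mid-2");
            if (pn == 2) return Arrays.asList("mid-3");
            return Collections.emptyList();
        };
        System.out.println(fetchAll(fake, 50)); // prints [mid-1, mid-2, mid-3]
    }
}
```

The `maxPages` cap is a design choice for politeness toward the server: even if the end-of-data check never triggers, the loop still terminates.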
package com.mm.rep;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Scanner;
import java.util.Set;
import org.openqa.selenium.Cookie;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.URI;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;
import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.csvreader.CsvWriter;
import org.apache.commons.lang3.StringUtils;
public class Main {
    private static final Logger logger = LogManager.getLogger(Main.class);
    private static WebDriver driver = null;
    private static GetMethod getMethod = null;
    private static Set<Cookie> bcookies = null;
    private final static String BLOGINURL = "https://passport.bilibili.com/login";
    private final static String BMAINPAGE = "https://www.bilibili.com/";

    Main() {
        BasicConfigurator.configure();
        // Initialize the shared GetMethod and set the request headers that never change.
        // The HTTP/2 pseudo-headers (":authority", ":method", ":scheme") copied from the
        // browser are not valid HTTP/1.1 header names, so they are not sent here.
        getMethod = new GetMethod();
        getMethod.getParams().setParameter(HttpMethodParams.HTTP_CONTENT_CHARSET, "UTF-8");
        getMethod.addRequestHeader("accept", "*/*");
        getMethod.addRequestHeader("accept-language", "zh-CN,zh;q=0.9");
        getMethod.addRequestHeader("sec-fetch-dest", "script");
        getMethod.addRequestHeader("sec-fetch-mode", "no-cors");
        getMethod.addRequestHeader("sec-fetch-site", "same-site");
        getMethod.addRequestHeader("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36");
    }
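
    public static String getCookie() throws InterruptedException {
        Scanner ip = new Scanner(System.in);
        logger.info("Enter the Cookie, or press Enter if you do not have one:");
        String scCookie = ip.nextLine();
        if (scCookie.length() != 0) {
            return scCookie;
        }
        logger.info("Starting QR-code login");
        // Path to the ChromeDriver executable (adjust for your machine)
        System.setProperty("webdriver.chrome.driver", "H:/chromedriver/chromedriver.exe");
        // Startup options
        ChromeOptions options = new ChromeOptions();
        // Create the ChromeDriver
        driver = new ChromeDriver(options);
        // Open the Bilibili login page
        driver.get(BLOGINURL);
        // Wait for the QR-code login: the browser is redirected to the main page afterwards
        while (!driver.getCurrentUrl().startsWith(BMAINPAGE)) {
            Thread.sleep(100);
        }
        logger.info("QR-code login succeeded");
        // Read the cookies and build a "name=value; name=value" header value.
        // Joining the Cookie objects directly would also include path, domain, and expiry.
        bcookies = driver.manage().getCookies();
        StringBuilder cookie = new StringBuilder();
        for (Cookie c : bcookies) {
            if (cookie.length() > 0) {
                cookie.append("; ");
            }
            cookie.append(c.getName()).append('=').append(c.getValue());
        }
        return cookie.toString();
    }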

    public static List<JSONObject> getFanS(String cookie, String vmid, int pn, int ps) throws InterruptedException, HttpException, IOException {
        HttpClient client = new HttpClient();
        // Build the request URL
        StringBuilder url = new StringBuilder();
        url.append("https://api.bilibili.com/x/relation/followers?vmid=");
        url.append(vmid);
        url.append("&pn=");
        url.append(pn);
        url.append("&ps=");
        url.append(ps);
        url.append("&order=desc&jsonp=jsonp");
        getMethod.setURI(new URI(url.toString(), true));
        getMethod.getParams().setParameter(HttpMethodParams.HTTP_CONTENT_CHARSET, "UTF-8");
        // Set (rather than add) the cookie header: getMethod is reused for every page,
        // and addRequestHeader would append one more cookie header on each call.
        getMethod.setRequestHeader("cookie", cookie);
        // Send the request
        client.executeMethod(getMethod);
        // Read the response
        String info = new String(getMethod.getResponseBody(), "UTF-8");
        JSONObject fans = JSONObject.parseObject(info).getJSONObject("data");
        if (fans == null) {
            // No "data" object, for example because the request was rejected or blocked
            throw new IOException("Unexpected response: " + info);
        }
        JSONArray fArray = fans.getJSONArray("list");
        if (fArray == null) {
            fArray = new JSONArray();
        }
        return JSON.parseArray(fArray.toJSONString(), JSONObject.class);
    }

    public static void main(String[] args) throws InterruptedException {
        logger.info("Program starting...");
        new Main();
        // Obtain the Cookie
        String cookie = Main.getCookie();
        CsvWriter csvWriter = new CsvWriter("C:\\Users\\computer\\Desktop\\aaa.csv", ',', Charset.forName("UTF-8"));
        String[] csvHeaders = { "mid", "Follower name", "Follower signature", "Follower avatar" };
        try {
            csvWriter.writeRecord(csvHeaders);
            int pn = 1;
            while (true) {
                List<JSONObject> fans = Main.getFanS(cookie, "309103931", pn, 20);
                // An empty page means there is nothing left to fetch
                if (fans == null || fans.isEmpty()) {
                    break;
                }
                for (JSONObject f : fans) {
                    String[] csvContent1 = { f.getString("mid"), f.getString("uname"), f.getString("sign"), f.getString("face") };
                    System.out.println(String.join(",", csvContent1));
                    csvWriter.writeRecord(csvContent1);
                }
                pn++;
                Thread.sleep(100);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            csvWriter.close();
            // The driver only exists when the QR-code login path was used
            if (driver != null) {
                driver.quit();
            }
        }
        logger.info("Program finished");
    }
}
Note: these requests are easily blocked, and at most fewer than 1,000 followers can be retrieved.
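The program above only exports the current follower list. To see who unfollowed, two exports taken at different times have to be compared by `mid`. The following is a minimal sketch of that comparison, not part of the original program: `FollowerDiff` and `unfollowed` are hypothetical names, and the mids are assumed to have been read from the first column of each CSV export.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FollowerDiff {
    /** Returns the mids present in the older snapshot but missing from the newer one. */
    public static Set<String> unfollowed(Collection<String> oldMids, Collection<String> newMids) {
        // LinkedHashSet keeps the order of the older snapshot in the result
        Set<String> gone = new LinkedHashSet<>(oldMids);
        gone.removeAll(new HashSet<>(newMids));
        return gone;
    }

    public static void main(String[] args) {
        // Made-up mids for illustration only
        List<String> yesterday = Arrays.asList("1001", "1002", "1003");
        List<String> today = Arrays.asList("1001", "1003", "1004");
        System.out.println(unfollowed(yesterday, today)); // prints [1002]
    }
}
```

Because fewer than 1,000 followers can be retrieved, a mid missing from the newer export may simply have fallen outside the retrievable range, so the result is best treated as a list of candidates.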

Thanks for reading. That covers how to use a Java crawler to find out which Bilibili followers have unfollowed you. How well it works in practice still needs to be verified by trying it yourself.