#2
yiyanxiyin 2025-11-20 11:44
Why not just ask ds:
Program code:
# Define the novel page URL
$novelUrl = "http://www./105790648/"

# Set request headers to mimic a browser
$headers = @{
    "User-Agent"      = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    "Accept"          = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
    "Accept-Language" = "zh-CN,zh;q=0.9,en;q=0.8"
}

try {
    Write-Host "Downloading page content..." -ForegroundColor Yellow

    # Download the page with Invoke-WebRequest
    $response = Invoke-WebRequest -Uri $novelUrl -Headers $headers -UseBasicParsing

    # Get the HTML content
    $htmlContent = $response.Content
    Write-Host "Page downloaded successfully!" -ForegroundColor Green

    # Create an HTML document object
    $html = New-Object -ComObject "HTMLFile"
    $html.IHTMLDocument2_write($htmlContent)

    # Find all chapter links; @() keeps .Count reliable even for a single match
    $chapterLinks = @($html.getElementsByTagName("a") | Where-Object {
        $_.href -like "*/105790648/*.html" -and $_.title -match "^第\d+章"
    })
    Write-Host "Found $($chapterLinks.Count) chapter links" -ForegroundColor Cyan

    # Extract chapter information
    $chapters = @()
    foreach ($link in $chapterLinks) {
        $chapters += [PSCustomObject]@{
            ChapterNumber = if ($link.title -match "第(\d+)章") { [int]$Matches[1] } else { 0 }
            Title         = $link.title
            URL           = $link.href
        }
    }

    # Remove duplicates and sort by chapter number
    $uniqueChapters = @($chapters | Sort-Object ChapterNumber -Unique)

    # Print the results
    Write-Host "《吃喝玩乐之重生1997》 chapter list ($($uniqueChapters.Count) chapters in total)" -ForegroundColor Green
    # Parentheses are required; without them Write-Host prints "= * 80" literally
    Write-Host ("=" * 80)

    # Show all chapters
    $uniqueChapters | Format-Table -Property ChapterNumber, Title, URL -AutoSize

    # Export to a CSV file
    $csvPath = "吃喝玩乐之重生1997_章节列表.csv"
    $uniqueChapters | Export-Csv -Path $csvPath -NoTypeInformation -Encoding UTF8
    Write-Host "Chapter information exported to: $csvPath" -ForegroundColor Yellow

    # Show summary statistics
    Write-Host "`nStatistics:" -ForegroundColor Cyan
    Write-Host "Total chapters: $($uniqueChapters.Count)" -ForegroundColor White
    Write-Host "First chapter: $(($uniqueChapters | Sort-Object ChapterNumber | Select-Object -First 1).Title)" -ForegroundColor White
    Write-Host "Latest chapter: $(($uniqueChapters | Sort-Object ChapterNumber -Descending | Select-Object -First 1).Title)" -ForegroundColor White
}
catch {
    Write-Host "Error: $($_.Exception.Message)" -ForegroundColor Red
    Write-Host "Possible causes:" -ForegroundColor Red
    Write-Host "1. Network connection problem" -ForegroundColor Red
    Write-Host "2. The site is restricting access" -ForegroundColor Red
    Write-Host "3. The page structure has changed" -ForegroundColor Red
}

The above is PowerShell code; it ran successfully.
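As a possible next step, the exported chapter list could be read back and each chapter page saved to disk. The following is only a rough sketch under several assumptions: the CSV written by the script above is in the current directory, its URL column contains absolute http(s) links, and the `chapters` output folder and the shortened User-Agent string are hypothetical choices of mine. It saves the raw HTML only; pulling the chapter text out of each page depends on the site's markup, which is not shown in this thread.

```powershell
# Sketch only: download every chapter listed in the exported CSV.
# Assumption: the CSV has the ChapterNumber, Title and URL columns
# produced by the script above, and URL holds absolute links.
$csvPath   = "吃喝玩乐之重生1997_章节列表.csv"
$outputDir = "chapters"   # hypothetical output folder
$headers   = @{ "User-Agent" = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" }

# Create the output folder if it does not exist yet
New-Item -ItemType Directory -Path $outputDir -Force | Out-Null

foreach ($chapter in Import-Csv -Path $csvPath -Encoding UTF8) {
    # Zero-pad the chapter number so the files sort in reading order
    $fileName = "{0:D4}.html" -f [int]$chapter.ChapterNumber
    $filePath = Join-Path $outputDir $fileName

    try {
        # Save the raw page; text extraction is left out on purpose
        Invoke-WebRequest -Uri $chapter.URL -Headers $headers -UseBasicParsing -OutFile $filePath
        Write-Host "Saved $($chapter.Title) -> $filePath" -ForegroundColor Green
    }
    catch {
        Write-Host "Failed $($chapter.URL): $($_.Exception.Message)" -ForegroundColor Red
    }

    # Pause between requests so the site is not hit too quickly
    Start-Sleep -Milliseconds 500
}
```

Wrapping each download in its own try/catch means one failed chapter does not stop the rest, and the failed URLs are printed so they can be retried.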
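The page URL is: http://www.
I would like to store the URL of each chapter in the table file URL.dbf, with the fields 章节 C(10) (chapter), 标题 C(50) (title), and 网址 C(60) (URL), and then crawl every chapter of the novel using the 网址 field in URL.dbf. Any expert guidance would be much appreciated, many thanks! (I posted a similar thread earlier, but my computer could not open that page; this time the novel opens on my computer, so I am asking again.)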
Program code: