• A Haskell crawler that sends requests through an HTTP proxy


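    Below is a simple example of a crawler written in Haskell. It sends its requests through an HTTP proxy to fetch a Baidu image-search results page. Note that this is only a basic example; a real crawler has to handle more details, such as error handling and data cleaning.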


    {-# LANGUAGE OverloadedStrings #-}
    -- Requires the http-client and http-types packages.
    import Network.HTTP.Client
    import Network.HTTP.Types.Header (RequestHeaders, hCookie, hReferer, hUserAgent)
    import Network.HTTP.Types.Status (statusCode)
    import Text.Read (readMaybe)
    import qualified Data.ByteString.Char8 as BS
    import qualified Data.ByteString.Lazy.Char8 as L8
    
    -- Start URL; parseRequest percent-encodes the non-ASCII query value.
    startUrl :: String
    startUrl = "http://www.baidu.com/s?wd=图片"
    
    -- Browser-like request headers. The cookie values are placeholders.
    -- http-client sets the Host header itself, so it is not listed here.
    browserHeaders :: RequestHeaders
    browserHeaders =
      [ (hUserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3")
      , (hReferer, "http://www.baidu.com/")
      , ("Upgrade-Insecure-Requests", "1")
      , (hCookie, "BDUSS=12345678901234567890123456789012; BIDUPSID=12345678901234567890123456789012")
      ]
    
    -- Prompt until the user enters a valid TCP port.
    readPort :: IO Int
    readPort = do
      putStrLn "Enter the proxy port:"
      input <- getLine
      case readMaybe input :: Maybe Integer of
        Just p | p > 0 && p <= 65535 -> return (fromIntegral p)
        _ -> putStrLn "Invalid port, please try again." >> readPort
    
    main :: IO ()
    main = do
      -- Proxy settings
      port <- readPort
      let proxy = Proxy { proxyHost = BS.pack "www.duoip.cn", proxyPort = port }
      manager <- newManager (managerSetProxy (useProxy proxy) defaultManagerSettings)
    
      -- Build the request with the browser-like headers
      initialRequest <- parseRequest startUrl
      let request = initialRequest { requestHeaders = browserHeaders }
    
      -- Send the request through the proxy and show the start of the response
      response <- httpLbs request manager
      putStrLn ("Status: " ++ show (statusCode (responseStatus response)))
      L8.putStrLn (L8.take 500 (responseBody response))
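
    The introduction mentions that a real crawler also needs error handling and data cleaning. The following is a minimal sketch of both, assuming only the base library. The names retrying, srcAttributes and cleanImageUrls are hypothetical helpers introduced for illustration and are not part of the original program or of any HTTP library. The string scan stands in for a proper HTML parser, which a production crawler would more likely use.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, try)
import Data.List (isPrefixOf, nub)

-- | Hypothetical helper: run an action up to `attempts` times, sleeping
-- `delayMicros` microseconds after each failure. If every attempt fails,
-- the last exception is returned. Catching SomeException keeps the sketch
-- short; real code would usually catch a narrower exception type.
retrying :: Int -> Int -> IO a -> IO (Either SomeException a)
retrying attempts delayMicros action = do
  result <- try action
  case result of
    Right value -> return (Right value)
    Left err
      | attempts > 1 -> do
          threadDelay delayMicros
          retrying (attempts - 1) delayMicros action
      | otherwise -> return (Left err)

-- | Hypothetical helper: collect the value that follows every src=" in an
-- HTML string. This is a plain substring scan, so it also picks up
-- attributes whose names end in "src", such as data-src.
srcAttributes :: String -> [String]
srcAttributes [] = []
srcAttributes html@(_ : rest)
  | marker `isPrefixOf` html =
      let (url, remainder) = break (== '"') (drop (length marker) html)
      in url : srcAttributes remainder
  | otherwise = srcAttributes rest
  where
    marker = "src=\""

-- | Hypothetical helper: keep only absolute http(s) URLs and drop duplicates.
cleanImageUrls :: String -> [String]
cleanImageUrls = nub . filter isAbsolute . srcAttributes
  where
    isAbsolute u = "http://" `isPrefixOf` u || "https://" `isPrefixOf` u
```

    A fetch could then be wrapped as retrying 3 1000000 someFetchAction, where someFetchAction is whatever IO action performs the request, and the decoded response body could be passed to cleanImageUrls. Relative paths such as /local.png are dropped by the filter, so a crawler that needs them would have to resolve them against the page URL first.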
    
  • Original article: https://blog.csdn.net/weixin_44617651/article/details/134372470