Python Crawler

Python Tutorial

Packages

Requests

Python
import requests

BeautifulSoup

Iteration

Minor Errors

Information Extraction

Find_all

Targeted Crawling

Chinese Characters

chr(12288)

tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
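The template above can be tried out with a small sketch. chr(12288) is the full-width (ideographic) space U+3000; passing it as argument 3 makes field 1 pad with a character as wide as a Chinese glyph, so columns of Chinese text line up. The helper name format_row and the sample rows are hypothetical and only serve the illustration.

```python
# Template from the text: argument 3 supplies the fill character for field 1.
tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"


def format_row(rank, name, score):
    """Center each field in 10 columns; pad the name with full-width spaces."""
    return tplt.format(rank, name, score, chr(12288))


# Hypothetical sample rows, not real data.
print(format_row("Rank", "名称", "Score"))
print(format_row("1", "示例大学", "90.0"))
```

With an ordinary ASCII space as the fill character, the middle column would drift, because each Chinese character is rendered wider than the space that pads it.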

Scrapy

  1. Start a scrapy project
    Shell
    scrapy startproject python123demo
  • python123demo/: root directory
    • scrapy.cfg: configuration file of scrapy
    • python123demo/: custom (user) code
      • __init__.py: initialization script
      • items.py: code template for Items (to be subclassed)
      • middlewares.py: code template for Middlewares (to be subclassed)
      • pipelines.py: code template for Pipelines (to be subclassed)
      • settings.py: project settings
      • spiders/: directory of spider code templates (to be subclassed)
  1. Generate a scrapy spider
    Shell
    cd python123demo
    scrapy genspider demo python123.io

  2. Review demo.py
    python123demo/spiders/demo.py
    # -*- coding: utf-8 -*-
    import scrapy


    class DemoSpider(scrapy.Spider):
        name = 'demo'
        allowed_domains = ['python123.io']
        start_urls = ['http://python123.io/']

        def parse(self, response):
            pass
  3. Configure the spider
    python123demo/spiders/demo.py
    # -*- coding: utf-8 -*-
    import scrapy


    class DemoSpider(scrapy.Spider):
        name = 'demo'
        # allowed_domains = ['python123.io']
        start_urls = ['http://python123.io/ws/demo.html']

        def parse(self, response):
            # Use the last URL segment as the local file name (demo.html).
            fname = response.url.split('/')[-1]
            with open(fname, 'wb') as f:
                f.write(response.body)
            # Log fname; a bare `name` is undefined inside the method.
            self.log('Saved file {0}.'.format(fname))
  4. Run the spider
    Shell
    pwd
    scrapy crawl demo
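The spider's parse method derives the output file name from the last segment of the URL. A minimal standalone sketch of that step follows, assuming a fallback name is wanted when the URL ends in a slash (in which case the last segment is empty and open() would fail). The helper filename_from_url and the default "index.html" are hypothetical, not part of Scrapy.

```python
from urllib.parse import urlsplit


def filename_from_url(url, default="index.html"):
    """Return the last path segment of url, or default if it is empty."""
    path = urlsplit(url).path          # drops scheme, host and query string
    fname = path.split("/")[-1]
    return fname or default


print(filename_from_url("http://python123.io/ws/demo.html"))  # demo.html
print(filename_from_url("http://python123.io/"))              # index.html
```

Inside the spider, such a helper could replace response.url.split('/')[-1] if pages without a file name in the URL also need to be saved.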


python123demo/spiders/demo.py
# -*- coding: utf-8 -*-
# Version 1: list the pages to crawl in start_urls.
import scrapy


class DemoSpider(scrapy.Spider):
    name = 'demo'
    # allowed_domains = ['python123.io']
    start_urls = ['http://python123.io/ws/demo.html']

    def parse(self, response):
        fname = response.url.split('/')[-1]
        with open(fname, 'wb') as f:
            f.write(response.body)
        self.log('Saved file {0}.'.format(fname))

# -*- coding: utf-8 -*-
# Version 2: generate the initial requests explicitly in start_requests().
import scrapy


class DemoSpider(scrapy.Spider):
    name = 'demo'
    # allowed_domains = ['python123.io']

    def start_requests(self):
        urls = [
            'http://python123.io/ws/demo.html'
        ]
        for url in urls:
            # Yield one request at a time; parse() handles each response.
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        fname = response.url.split('/')[-1]
        with open(fname, 'wb') as f:
            f.write(response.body)
        self.log('Saved file {0}.'.format(fname))


return sends a single value back to the caller and ends the function, whereas yield can produce a sequence of values, pausing the function after each one. Use yield when you want to iterate over a sequence without storing the entire sequence in memory.
yield is used in Python generators. A generator function is defined like a normal function, but whenever it needs to produce a value, it does so with the yield keyword rather than return. If the body of a def contains yield, the function automatically becomes a generator function.

Yield vs. Return
def gen(n):
    # Generator: produces one square at a time, on demand.
    for i in range(n):
        yield i**2

for i in gen(5):
    print(i)



def square(n):
    # Regular function: builds the whole list before returning it.
    ls = [i**2 for i in range(n)]
    return ls

for i in square(5):
    print(i)
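To make the "don't store the entire sequence" point concrete, here is a small sketch of a generator that never terminates, which a list-returning function could not do. The names squares, first, second and next_three are illustrative choices, not from the original.

```python
from itertools import islice


def squares():
    """Yield 0, 1, 4, 9, ... one value at a time, without building a list."""
    i = 0
    while True:
        yield i ** 2
        i += 1


g = squares()
first = next(g)                   # runs the body up to the first yield
second = next(g)                  # resumes right after that yield
next_three = list(islice(g, 3))   # take three more values, then stop
print(first, second, next_three)  # 0 1 [4, 9, 16]
```

Each call to next() resumes the function where it paused, so only the current value of i is held in memory at any time.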

Previous Close: 208.08
Open: 211.70
Bid: 225.60 x 800
Ask: 225.70 x 900
Day's Range: 202.32 - 224.46
52 Week Range: 60.97 - 224.46
Volume: 32,390,121
Avg. Volume: 14,473,820
Market Cap: 63.121B
Beta (5Y Monthly): N/A
PE Ratio (TTM): 6,396.29
EPS (TTM): 0.04
Earnings Date: Jun 02, 2020
Forward Dividend & Yield: N/A (N/A)
Ex-Dividend Date: N/A
1y Target Est: 132.07
