# Web Service
Microwler ships with a JSON API built with Quart. It provides a simple way to run your crawlers and retrieve their scraped data via HTTP.
## Usage
To start the web service, activate your workspace and run the following command:

```
serve
```

By default, this will start a production-ready ASGI application on `localhost:5000`, using Quart with Hypercorn.

You can customize the hostname and port:

```
serve [-p|--port PORT]
```
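Once the service is running, any HTTP client can talk to it. The following is a minimal sketch (not part of Microwler itself), assuming the default `localhost:5000` and that the third-party `requests` package is installed:

```python
import requests

# Base URL of the running Microwler web service (default host/port).
BASE_URL = "http://localhost:5000"

# Hit the service-level status endpoint to confirm the API is reachable.
response = requests.get(f"{BASE_URL}/status")
response.raise_for_status()
print(response.json())  # e.g. {"app": {...}, "projects": ["quotes"]}
```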
## API
### crawl(project_name) (async)

Run the project's crawler and return the results.

- Route: `/crawl/<str:project_name>`
- Method: `GET`
- Response example:

```json
{
  "data": [
    {
      "url": "https://quotes.toscrape.com/",
      "status_code": 200,
      "depth": 0,
      "discovered": "2021-02-05",
      "links": [
        "https://quotes.toscrape.com/tag/inspirational/",
        "https://quotes.toscrape.com/author/Jane-Austen",
        "https://quotes.toscrape.com/tag/obvious/page/1/",
        "https://quotes.toscrape.com/tag/friends/",
        "https://quotes.toscrape.com/tag/misattributed-eleanor-roosevelt/page/1/",
        ...
      ],
      "data": {
        "title": "Quotes to Scrape",
        "headings": {
          "h1": ["Quotes to Scrape"],
          "h2": ["Top Ten tags"],
          "h3": [""]
        }
      }
    },
    ...
  ]
}
```
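As a sketch of how a client might consume this endpoint (assuming a project named `quotes`, as in the examples, and the `requests` package), you could trigger a crawl and iterate over the returned pages like this:

```python
import requests

BASE_URL = "http://localhost:5000"

# Run the "quotes" project's crawler and fetch the scraped results.
response = requests.get(f"{BASE_URL}/crawl/quotes")
response.raise_for_status()
pages = response.json()["data"]

# Each entry describes one crawled page; print its depth, URL and extracted title.
for page in pages:
    title = page["data"].get("title")
    print(page["depth"], page["url"], title)
```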
### data(project_name) (async)

Return the project's cached data.

- Route: `/data/<str:project_name>`
- Method: `GET`
- Response is in the same format as above
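Because the response format matches `/crawl`, the same client code applies. A short sketch (same assumptions as above) that reads the cache without starting a new crawl:

```python
import requests

# Read the cached results for the "quotes" project; no new crawl is started.
cached = requests.get("http://localhost:5000/data/quotes").json()
print(f"{len(cached['data'])} pages in cache")
```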
### project(project_name) (async)

Return the project status.

- Route: `/status/<str:project_name>`
- Method: `GET`
- Response example:

```json
{
  "name": "quotes",
  "start_url": "https://quotes.toscrape.com/",
  "last_run": {
    "state": "finished successfully",
    "timestamp": "2021-02-05 17:47"
  }
}
```
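A hedged sketch of how a client might use the project status, for example to decide whether a fresh crawl is needed (the "stale after one day" rule is purely illustrative; the timestamp format is taken from the example above):

```python
from datetime import datetime, timedelta

import requests

BASE_URL = "http://localhost:5000"

# Check when the "quotes" project was last crawled.
status = requests.get(f"{BASE_URL}/status/quotes").json()
last_run = datetime.strptime(status["last_run"]["timestamp"], "%Y-%m-%d %H:%M")

# Illustrative policy: re-crawl if the last run is older than one day.
if datetime.now() - last_run > timedelta(days=1):
    requests.get(f"{BASE_URL}/crawl/quotes")
else:
    print(f"Using cached data from {last_run} ({status['last_run']['state']})")
```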
### status() (async)

Return the service status.

- Route: `/status`
- Method: `GET`
- Response example:

```json
{
  "app": {
    "up_since": "2021-02-05 17:42:13",
    "version": "0.1.7"
  },
  "projects": [
    "quotes"
  ]
}
```
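For instance, a small monitoring sketch could list all registered projects and pull their cached data in one pass (again assuming `requests` and the default host/port):

```python
import requests

BASE_URL = "http://localhost:5000"

# Discover which projects the service knows about...
service = requests.get(f"{BASE_URL}/status").json()
print("Up since", service["app"]["up_since"], "- version", service["app"]["version"])

# ...and fetch the cached data for each of them.
for name in service["projects"]:
    data = requests.get(f"{BASE_URL}/data/{name}").json()["data"]
    print(f"{name}: {len(data)} pages")
```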
## Microwler UI

Once the web service is started, it will also serve a NuxtJS application at `localhost:<PORT>/`.
The application can be used as a convenient way to run crawlers and retrieve/monitor their data. It consumes the API described above.