
Web Service

Microwler ships with a JSON API built with Quart. It provides a simple way to run your crawlers and retrieve their scraped data via HTTP.

Usage

To start the web service, activate your workspace and run the following command:

serve

By default, this starts a production-ready ASGI application on localhost:5000, running Quart on the Hypercorn server.

You can customize the port:

serve [-p|--port PORT]

API

crawl(project_name) async

Run the project's crawler and return the results

  • Route: /crawl/<str:project_name>
  • Method: GET
  • Response example:
{
    data: [
        {
            url: "https://quotes.toscrape.com/",
            status_code: 200,
            depth: 0,
            discovered: "2021-02-05",
            links: [
                "https://quotes.toscrape.com/tag/inspirational/",
                "https://quotes.toscrape.com/author/Jane-Austen",
                "https://quotes.toscrape.com/tag/obvious/page/1/",
                "https://quotes.toscrape.com/tag/friends/",
                "https://quotes.toscrape.com/tag/misattributed-eleanor-roosevelt/page/1/",
                ...
            ],
            data: {
                title: "Quotes to Scrape",
                headings: {
                    h1: ["Quotes to Scrape"],
                    h2: ["Top Ten tags"],
                    h3: [""]
                },
            },
        },
        ...
    ]
}
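A minimal Python client for this endpoint might look like the sketch below. It uses only the standard library and assumes the service runs on the default localhost:5000. The helper names api_url, get_json and crawl are hypothetical and not part of Microwler. The response example above is abbreviated (unquoted keys, "..." placeholders), so the sketch assumes the live service returns standard JSON with a top-level "data" list.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the service was started with `serve` on the default port.
BASE_URL = "http://localhost:5000"


def api_url(route: str, project_name: str = "", base_url: str = BASE_URL) -> str:
    """Build an endpoint URL such as /crawl/<project_name> (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/{route.strip('/')}"
    if project_name:
        # Percent-encode the project name so it is safe inside one path segment.
        url += "/" + urllib.parse.quote(project_name, safe="")
    return url


def get_json(url: str, timeout: float = 300.0) -> dict:
    """Send a GET request and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def crawl(project_name: str, base_url: str = BASE_URL) -> list:
    """Run the crawler via GET /crawl/<project_name> and return the page list.

    A crawl can take a while, hence the generous default timeout in get_json.
    """
    return get_json(api_url("crawl", project_name, base_url))["data"]
```

With the service running, crawl("quotes") would return the list of page objects shown in the example, one per crawled URL.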

data(project_name) async

Return the project's cached data

  • Route: /data/<str:project_name>
  • Method: GET
  • Response: same format as the /crawl response above
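Because /crawl and /data return the same format, one set of helpers can post-process either response. The sketch below assumes each entry in "data" carries the url, status_code, depth and links fields shown in the example; the function names are hypothetical, and whether the service ever reports non-2xx pages is not stated here.

```python
from collections import defaultdict


def pages_by_depth(pages: list) -> dict:
    """Group crawled page URLs by their crawl depth (0 = start URL)."""
    grouped = defaultdict(list)
    for page in pages:
        grouped[page["depth"]].append(page["url"])
    return dict(grouped)


def failed_pages(pages: list) -> list:
    """Return the URLs whose HTTP status code is outside the 2xx range."""
    return [p["url"] for p in pages if not 200 <= p["status_code"] < 300]


def unique_links(pages: list) -> set:
    """Collect every distinct link discovered across all pages."""
    return {link for page in pages for link in page.get("links", [])}
```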

project(project_name) async

Return the project status

  • Route: /status/<str:project_name>
  • Method: GET
  • Response example:
{
    name: "quotes",
    start_url: "https://quotes.toscrape.com/",
    last_run: {
        state: "finished successfully",
        timestamp: "2021-02-05 17:47"
    },
}
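The last_run field can tell a client whether the cached data from /data is fresh enough or whether it should trigger /crawl again. This is a sketch under assumptions: the timestamp layout is copied from the example ("2021-02-05 17:47") and treated as local time, "finished successfully" is assumed to be the state string for a good run, and a project that has never run is assumed to have a missing or empty last_run. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: timestamps always follow the layout of the example response.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"


def last_run_time(status: dict) -> Optional[datetime]:
    """Parse last_run.timestamp, or return None if no run is recorded."""
    last_run = status.get("last_run")
    if not last_run or not last_run.get("timestamp"):
        return None
    return datetime.strptime(last_run["timestamp"], TIMESTAMP_FORMAT)


def needs_recrawl(status: dict, max_age: timedelta, now: Optional[datetime] = None) -> bool:
    """Return True if the project should be crawled again rather than read from /data."""
    ran_at = last_run_time(status)
    if ran_at is None:
        return True  # never crawled
    # Assumption: any state other than the successful one warrants a new crawl.
    if status["last_run"].get("state") != "finished successfully":
        return True
    now = now or datetime.now()
    return now - ran_at > max_age
```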

status() async

Return the service status

  • Route: /status
  • Method: GET
  • Response example:
{
    app: {
        up_since: "2021-02-05 17:42:13",
        version: "0.1.7"
    },
    projects: [
        "quotes"
    ]
}
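Since /status takes no arguments, it also works as a readiness check, for example in a script that launches serve in the background and must wait until the service answers. The sketch below polls the endpoint until it responds or a deadline passes. The names fetch_status and wait_for_service are hypothetical; the fetch parameter exists only so the polling logic can be exercised without a live server.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_status(base_url: str) -> dict:
    """GET /status and decode its JSON body."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/status", timeout=5) as resp:
        return json.load(resp)


def wait_for_service(
    base_url: str = "http://localhost:5000",
    timeout: float = 30.0,
    interval: float = 0.5,
    fetch: Callable[[str], dict] = fetch_status,
) -> Optional[dict]:
    """Poll /status until the service answers; return its payload, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fetch(base_url)
        except OSError:
            # URLError, connection refused and socket timeouts are all OSError
            # subclasses: the service is not reachable yet.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Once it returns a payload, payload["projects"] lists the project names that can be passed to /crawl, /data and /status/<project_name>.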

Microwler UI

Once the web service is running, it also serves a NuxtJS application at localhost:<PORT>/.

The application offers a convenient way to run crawlers and to retrieve and monitor their data. It consumes the API described above.
