Exporting Data

Microwler allows you to export scraped data to various formats. You can build custom export plugins or use one of its pre-defined exporters:

  • microwler.export.JSONExporter
  • microwler.export.CSVExporter
  • microwler.export.HTMLExporter

Use the export_to and exporters settings to configure the export behaviour.

microwler.export.BaseExporter

Use this class to build your custom export functionality, e.g. to send data via HTTP or SMTP. The crawler instance will call export() once it's done with everything else. You can pass your plugin into the crawler by adding the class to settings['exporters'].

__init__(self, domain, data, settings) special

Create a new BaseExporter

Parameters:

Name      Type      Description                                   Default
domain    str       the domain of this project/crawler            required
data      list      list of processed Page objects                required
settings  Settings  the current settings of this project/crawler  required

Source code in microwler/export.py
def __init__(self, domain: str, data: list, settings: Settings):
    """
    Create a new BaseExporter

    Arguments:
        domain: the domain of this project/crawler
        data: list of processed Page objects
        settings: the current settings of this project/crawler
    """
    self.domain = domain
    self.data = [page.__dict__ for page in data]
    self.settings = settings

export(self)

Export data to target destination

Source code in microwler/export.py
def export(self):
    """
    Export data to target destination
    """
    raise NotImplementedError()
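
As a rough illustration of the plugin idea described above, the following sketch subclasses a stand-in BaseExporter (copied from the source shown above so the example runs without Microwler installed) and sends the data via HTTP. The class name WebhookExporter, the build_request() helper, the endpoint URL, and the page attributes are all assumptions made for this example; none of them are part of Microwler.

```python
import json
import urllib.request
from types import SimpleNamespace


class BaseExporter:
    """Stand-in that mirrors the BaseExporter source shown above."""

    def __init__(self, domain, data, settings):
        self.domain = domain
        self.data = [page.__dict__ for page in data]
        self.settings = settings

    def export(self):
        raise NotImplementedError()


class WebhookExporter(BaseExporter):
    """Hypothetical plugin that POSTs the scraped pages as JSON."""

    # Assumed endpoint for illustration; not part of Microwler
    url = 'https://example.com/hooks/crawl'

    def build_request(self):
        # default=str keeps values that are not JSON-serializable from raising
        payload = {'domain': self.domain, 'pages': self.data}
        body = json.dumps(payload, default=str).encode('utf-8')
        return urllib.request.Request(
            self.url,
            data=body,
            headers={'Content-Type': 'application/json'},
            method='POST',
        )

    def export(self):
        # Called by the crawler once it is done with everything else
        with urllib.request.urlopen(self.build_request(), timeout=10) as response:
            return response.status


# Register the class (not an instance) next to the other export settings
settings = {
    'export_to': './exports',
    'exporters': [WebhookExporter],
}

# Dry run without network access: inspect the request export() would send.
# The page attributes used here are made up for the example.
pages = [SimpleNamespace(url='https://example.com/', data={'title': 'Home'})]
request = WebhookExporter('example.com', pages, settings).build_request()
print(request.get_method(), request.full_url)
```

Keeping the request construction separate from export() is a choice made for this sketch only: it lets the payload be checked without sending anything.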

microwler.export.FileExporter

This exporter saves data to your local filesystem. It currently supports exports to JSON, CSV, or HTML tables. Take a look at the following exporters and their implementation to understand how to use it.

convert(self)

Converts self.data to the output format specified by FileExporter.extension.

Must return the converted data as a string.

Source code in microwler/export.py
def convert(self):
    """
    Converts `self.data` to output format specified by `FileExporter.extension`.
    > Must return converted data as `string`
    """
    raise NotImplementedError()
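
To make the convert() contract concrete, here is a minimal sketch of a file-based exporter that renders the pages as tab-separated values. FileExporter below is a reduced stand-in (constructor and convert() only) so the example runs on its own. TSVExporter, the extension class attribute as written here, and the page attributes are assumptions for illustration, not Microwler's own code.

```python
import csv
import io
from types import SimpleNamespace


class FileExporter:
    """Reduced stand-in for microwler.export.FileExporter."""

    # Assumed to be a class attribute that subclasses override
    extension = None

    def __init__(self, domain, data, settings):
        self.domain = domain
        self.data = [page.__dict__ for page in data]
        self.settings = settings

    def convert(self):
        raise NotImplementedError()


class TSVExporter(FileExporter):
    """Hypothetical exporter producing tab-separated values."""

    extension = 'tsv'

    def convert(self):
        if not self.data:
            return ''
        # Columns are the union of all keys, in first-seen order
        columns = list(dict.fromkeys(key for row in self.data for key in row))
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=columns, delimiter='\t', lineterminator='\n'
        )
        writer.writeheader()
        writer.writerows(self.data)
        # convert() must hand back a string, not a file object
        return buffer.getvalue()


# The page attributes used here are made up for the example
pages = [
    SimpleNamespace(url='https://example.com/', status_code=200),
    SimpleNamespace(url='https://example.com/about', status_code=404),
]
output = TSVExporter('example.com', pages, settings=None).convert()
print(output)
```

The subclass only supplies the format; writing the returned string to disk is left to the inherited export().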

export(self)

Writes data to file

Source code in microwler/export.py
def export(self):
    """ Writes data to file """
    data = self.convert()
    timestamp = datetime.now().strftime('%Y-%m-%d-%H:%M')
    path = os.path.join(self.settings.export_to, f'{self.domain}_{timestamp}.{self.extension}')
    try:
        os.makedirs(self.settings.export_to, exist_ok=True)
        with open(path, 'w') as file:
            file.write(data)
        LOG.info(f'Exported data as {self.extension.upper()} to: {path} [{self.domain}]')
    except Exception as e:
        LOG.error(f'Error during export: {e} [{self.domain}]')
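
The file name produced by export() above follows the pattern domain_timestamp.extension inside the export_to folder. The helper below rebuilds that path outside of Microwler to show what to expect on disk. The function export_path and its now parameter are hypothetical and exist only for this sketch.

```python
import os
from datetime import datetime


def export_path(export_to, domain, extension, now=None):
    """Rebuild the target path the same way export() above does."""
    now = now or datetime.now()
    # Same format string as in the source: minute resolution, colon included
    timestamp = now.strftime('%Y-%m-%d-%H:%M')
    return os.path.join(export_to, f'{domain}_{timestamp}.{extension}')


# A fixed datetime keeps the example reproducible
path = export_path('exports', 'example.com', 'json', now=datetime(2021, 3, 14, 9, 26))
print(path)
```

Two consequences follow from the source as shown. The timestamp has minute resolution and the file is opened with mode 'w', so a second export of the same domain and format within the same minute overwrites the first. The timestamp also contains a colon, which Windows does not allow in file names, so the write may fail there; in that case the except branch would log the error rather than raise.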