Exporting Data
Microwler allows you to export scraped data to various formats. You can build custom export plugins or use one of its pre-defined exporters:
microwler.export.JSONExporter
microwler.export.CSVExporter
microwler.export.HTMLExporter
Use the export_to and exporters settings to configure the export behaviour.
microwler.export.BaseExporter
Use this class to build your custom export functionality, e.g. to send data via HTTP or SMTP.
The crawler instance will call export() once it's done with everything else.
You can pass your plugin into the crawler by adding the class to settings['exporters'].
__init__(self, domain, data, settings)
Create a new BaseExporter
Parameters:
Name | Type | Description | Default |
---|---|---|---|
domain | str | the domain of this project/crawler | required |
data | list | list of processed Page objects | required |
settings | Settings | the current settings of this project/crawler | required |
Source code in microwler/export.py
def __init__(self, domain: str, data: list, settings: Settings):
    """
    Create a new BaseExporter
    Arguments:
        domain: the domain of this project/crawler
        data: list of processed Page objects
        settings: the current settings of this project/crawler
    """
    self.domain = domain
    self.data = [page.__dict__ for page in data]
    self.settings = settings
export(self)
Export data to target destination
Source code in microwler/export.py
def export(self):
    """
    Export data to target destination
    """
    raise NotImplementedError()
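As a rough sketch of a custom plugin, the block below subclasses a local stand-in for BaseExporter that copies the source shown above, so that it runs without Microwler installed. The Page dataclass, the InMemoryExporter class and the final loop are hypothetical illustrations; in particular, the loop only approximates how the crawler is described as calling export(), and it is not taken from the library.

```python
from dataclasses import dataclass


class BaseExporter:
    """Local stand-in that mirrors the documented microwler.export.BaseExporter."""

    def __init__(self, domain: str, data: list, settings):
        self.domain = domain
        self.data = [page.__dict__ for page in data]
        self.settings = settings

    def export(self):
        raise NotImplementedError()


@dataclass
class Page:
    """Hypothetical minimal page object; the real Page class has its own fields."""
    url: str
    title: str


class InMemoryExporter(BaseExporter):
    """Hypothetical plugin: stores records in a list instead of sending them anywhere."""

    received = []

    def export(self):
        # self.data is already a list of plain dicts (see __init__ above)
        for record in self.data:
            InMemoryExporter.received.append({"domain": self.domain, **record})


# The plugin class itself (not an instance) goes into settings['exporters']
settings = {"exporters": [InMemoryExporter]}

pages = [
    Page("https://example.com/", "Home"),
    Page("https://example.com/about", "About"),
]

# Assumption: roughly what happens once the crawler is done with everything else
for exporter_cls in settings["exporters"]:
    exporter_cls("example.com", pages, settings).export()
```

A real plugin would replace the body of export() with the actual delivery step, such as an HTTP request or an SMTP message, while still reading its input from self.data.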
microwler.export.FileExporter
This exporter will save data to your local filesystem. It currently provides exports to JSON, CSV or HTML tables. Take a look at the following exporters and their implementation to understand its usage.
convert(self)
Converts self.data to output format specified by FileExporter.extension.
Must return converted data as string.
Source code in microwler/export.py
def convert(self):
    """
    Converts `self.data` to output format specified by `FileExporter.extension`.
    > Must return converted data as `string`
    """
    raise NotImplementedError()
export(self)
Writes data to file
Source code in microwler/export.py
def export(self):
    """ Writes data to file """
    data = self.convert()
    timestamp = datetime.now().strftime('%Y-%m-%d-%H:%M')
    path = os.path.join(self.settings.export_to, f'{self.domain}_{timestamp}.{self.extension}')
    try:
        os.makedirs(self.settings.export_to, exist_ok=True)
        with open(path, 'w') as file:
            file.write(data)
        LOG.info(f'Exported data as {self.extension.upper()} to: {path} [{self.domain}]')
    except Exception as e:
        LOG.error(f'Error during export: {e} [{self.domain}]')
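A file-based exporter only needs an extension and a convert() method that returns a string. The sketch below illustrates this with a hypothetical JSONLinesExporter built on a local stand-in for FileExporter that copies the source shown above. The constructor is assumed to match BaseExporter, and the SimpleNamespace objects are placeholders for real Page and Settings instances.

```python
import json
import logging
import os
import tempfile
from datetime import datetime
from types import SimpleNamespace

LOG = logging.getLogger("export-sketch")


class FileExporter:
    """Local stand-in that mirrors the documented FileExporter."""

    extension = ""

    def __init__(self, domain, data, settings):
        # Assumption: same constructor as BaseExporter
        self.domain = domain
        self.data = [page.__dict__ for page in data]
        self.settings = settings

    def convert(self):
        raise NotImplementedError()

    def export(self):
        data = self.convert()
        timestamp = datetime.now().strftime('%Y-%m-%d-%H:%M')
        path = os.path.join(self.settings.export_to, f'{self.domain}_{timestamp}.{self.extension}')
        try:
            os.makedirs(self.settings.export_to, exist_ok=True)
            with open(path, 'w') as file:
                file.write(data)
            LOG.info(f'Exported data as {self.extension.upper()} to: {path} [{self.domain}]')
        except Exception as e:
            LOG.error(f'Error during export: {e} [{self.domain}]')


class JSONLinesExporter(FileExporter):
    """Hypothetical exporter: one JSON object per line."""

    extension = "jsonl"

    def convert(self):
        # Must return a string; keys are sorted to keep the output stable
        return "\n".join(json.dumps(record, sort_keys=True) for record in self.data)


# Placeholders for Page objects and for a Settings object with an export_to attribute
pages = [
    SimpleNamespace(url="https://example.com/", title="Home"),
    SimpleNamespace(url="https://example.com/about", title="About"),
]
settings = SimpleNamespace(export_to=tempfile.mkdtemp())

exporter = JSONLinesExporter("example.com", pages, settings)
exporter.export()  # writes <domain>_<timestamp>.jsonl into settings.export_to
```

Note that export() catches and logs any exception rather than raising it, so a failed write is only visible in the log output. The timestamp also contains a colon, which is not a valid filename character on Windows.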