Processor

class burdoc.processors.processor.Processor(name: str, log_level: int = 20, max_threads: int | None = None)

Abstract base class for a general Processor. Processors receive data in a single blob, extract any needed data, then write new or updated fields back to the data store.

abstract add_generated_items_to_fig(page_number: int, fig: Figure, data: Dict[str, Any])

Draws any items generated by this processor onto a page image.

check_requirements(data: Any) → bool

Checks that required data fields are present in the data.

Parameters:

data (Any) – Primary data store

Returns:

True if all required fields are present, otherwise False

Return type:

bool
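A minimal sketch of the kind of check described above, assuming the data store behaves like a dictionary keyed by field name. The function name `has_required_fields` and the field names are hypothetical and are not part of burdoc.

```python
from typing import Any, Dict, List


def has_required_fields(data: Dict[str, Any], required: List[str]) -> bool:
    """Hypothetical stand-in: True only if every required field name is a key in data."""
    return all(field in data for field in required)


# Illustrative data store: field name -> {page number -> value}
store = {"text": {0: "page zero"}, "images": {0: []}}

both_present = has_required_fields(store, ["text", "images"])
one_missing = has_required_fields(store, ["text", "tables"])
```

Here `both_present` is True and `one_missing` is False, because "tables" is not a key in the store.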

abstract generates() → List[str]

Returns the list of fields added by this processor.

get_data(data: Any) → List[Dict[int, Any]]

Returns the data for each required field as a list. Optional fields are returned as None if not present.

Parameters:

data (Any) – Primary data store

Returns:

List containing the data for each field

Return type:

List[Dict[int, Any]]
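The behaviour described for get_data can be sketched as follows. This is an illustration under stated assumptions, not burdoc's implementation: the helper `collect_fields` is hypothetical, the data store is assumed to be a dictionary of field name to per-page values, and the ordering (required fields first, then optional ones) is a guess.

```python
from typing import Any, Dict, List, Optional


def collect_fields(
    data: Dict[str, Dict[int, Any]],
    required: List[str],
    optional: List[str],
) -> List[Optional[Dict[int, Any]]]:
    """Hypothetical helper: one entry per field, None for absent optional fields."""
    fields: List[Optional[Dict[int, Any]]] = [data[name] for name in required]
    # dict.get returns None when the optional field was never written to the store
    fields.extend(data.get(name) for name in optional)
    return fields


store = {"text": {0: "first page", 1: "second page"}}
text, tables = collect_fields(store, required=["text"], optional=["tables"])
```

After this call, `text` is the per-page dictionary for the "text" field and `tables` is None.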

get_page_data(data: Dict[str, Dict[int, Any]], page_number: int | None = None) → Iterator[List[Any]]

Returns an iterator over the passed data, segmented by page number. Optional requirements are returned as None if not present.

Parameters:
  • data (Dict[str, Dict[int, Any]]) – Primary data store

  • page_number (Optional[int], optional) – Return a specific page’s data. Defaults to None.

Yields:

Iterator[List[Any]] – An iterator over the page-grouped fields
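Page-wise segmentation of this kind can be sketched as a generator. The function `iter_page_data` below is hypothetical: it assumes page numbers are taken from the first required field, that each yielded list holds the required values followed by the optional ones, and that a missing optional field becomes None. Whether burdoc also includes the page number in each yielded list is not stated here, so the sketch leaves it out.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_page_data(
    data: Dict[str, Dict[int, Any]],
    required: List[str],
    optional: List[str],
    page_number: Optional[int] = None,
) -> Iterator[List[Any]]:
    """Hypothetical helper: yield one list of field values per page."""
    if page_number is None:
        # Assumption: the first required field defines which pages exist
        pages = sorted(data[required[0]].keys())
    else:
        pages = [page_number]
    for page in pages:
        row = [data[name][page] for name in required]
        # Optional fields fall back to None when the field (or the page) is absent
        row += [data[name].get(page) if name in data else None for name in optional]
        yield row


store = {
    "text": {0: "a", 1: "b"},
    "images": {0: ["img"], 1: []},
}
all_pages = list(iter_page_data(store, ["text"], ["images", "tables"]))
single_page = list(iter_page_data(store, ["text"], ["images", "tables"], page_number=1))
```

Comparing against `None` explicitly, rather than testing truthiness, keeps page 0 selectable as a specific page in this sketch.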

initialise()

Performs any expensive operations required to create a processor.

process(data: Any) → Any

Transforms the processed data

abstract requirements() → Tuple[List[str], List[str]]

Returns a tuple of two lists: the required data fields and the optional data fields.
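To show how the requirements/generates contract might fit together in a subclass, here is a self-contained sketch. `ProcessorSketch` is a hypothetical stand-in that mirrors only a few of the documented method names so the example runs without burdoc installed; it is not the library's class. `WordCountProcessor`, the field names, and the choice to have process return the updated store are likewise assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class ProcessorSketch(ABC):
    """Hypothetical stand-in for the documented base class; not burdoc's Processor."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def requirements(self) -> Tuple[List[str], List[str]]:
        """Return (required field names, optional field names)."""

    @abstractmethod
    def generates(self) -> List[str]:
        """Return the field names this processor adds."""


class WordCountProcessor(ProcessorSketch):
    """Illustrative processor that counts words on each page."""

    def requirements(self) -> Tuple[List[str], List[str]]:
        return (["text"], ["tables"])

    def generates(self) -> List[str]:
        return ["word_count"]

    def process(self, data: Dict[str, Dict[int, Any]]) -> Dict[str, Dict[int, Any]]:
        # Write the new field back to the store, keyed by page number
        data["word_count"] = {
            page: len(text.split()) for page, text in data["text"].items()
        }
        return data


proc = WordCountProcessor("word-count")
result = proc.process({"text": {0: "hello world", 1: "one"}})
```

In this sketch `result["word_count"]` maps each page number to its word count, and the field name matches the single entry returned by `generates()`.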