@plust/datasleuth - v0.2.0

    Interface ExtractContentOptions

    Options for the content extraction step

    interface ExtractContentOptions {
        selectors?: string;
        selector?: string;
        maxUrls?: number;
        maxContentLength?: number;
        includeInResults?: boolean;
        timeout?: number;
        retry?: { maxRetries?: number; baseDelay?: number };
        minContentLength?: number;
        continueOnError?: boolean;
        requireSuccessful?: boolean;
        [key: string]: any;
    }
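    As a rough illustration of the interface above, an options object might look like the sketch below. The interface is re-declared locally without the StepOptions base so the snippet compiles on its own. How the object is handed to the extraction step is not shown on this page, and every value is an arbitrary placeholder, not a library default.

```typescript
// Local copy of the documented shape. StepOptions is omitted because
// its members are not documented on this page.
interface ExtractContentOptions {
  selectors?: string;
  selector?: string;
  maxUrls?: number;
  maxContentLength?: number;
  includeInResults?: boolean;
  timeout?: number;
  retry?: { maxRetries?: number; baseDelay?: number };
  minContentLength?: number;
  continueOnError?: boolean;
  requireSuccessful?: boolean;
  [key: string]: any;
}

// Placeholder values chosen for illustration only (not library defaults).
const options: ExtractContentOptions = {
  selectors: "article, main",
  maxUrls: 5,
  maxContentLength: 10_000,
  includeInResults: true,
  timeout: 15_000,
  retry: { maxRetries: 2, baseDelay: 500 },
  minContentLength: 200,
  continueOnError: true,
  requireSuccessful: true,
};

console.log(options.retry?.maxRetries); // prints 2
```

    Every property is optional, so an empty object also satisfies the type; the nested retry object likewise accepts either, both, or neither of its fields.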

    Hierarchy

    • StepOptions
      • ExtractContentOptions

    Indexable

    • [key: string]: any

    Properties

    selectors?: string

    CSS selectors identifying the elements to extract content from

    selector?: string

    Alias for selectors, kept for backward compatibility
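    One plausible way to honor the selector alias is to prefer selectors and fall back to selector. The precedence rule and the resolveSelectors helper are hypothetical; this page does not say which field wins when both are set.

```typescript
// Only the two fields relevant to the alias are modeled here.
type SelectorOptions = { selectors?: string; selector?: string };

// Hypothetical helper: prefers `selectors`, falls back to the legacy
// `selector` alias, and returns undefined when neither is set.
function resolveSelectors(opts: SelectorOptions): string | undefined {
  return opts.selectors ?? opts.selector;
}

console.log(resolveSelectors({ selector: "article" })); // prints article
console.log(resolveSelectors({ selectors: "main", selector: "article" })); // prints main
```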

    maxUrls?: number

    Maximum number of URLs to process

    maxContentLength?: number

    Maximum length of the extracted content per URL, in characters
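    A minimal sketch of how maxUrls and maxContentLength could be applied: keep the first maxUrls URLs and cut each extracted text to maxContentLength characters. The helper names, and treating an unset limit as "no limit", are assumptions rather than documented behavior.

```typescript
// Hypothetical helpers illustrating maxUrls and maxContentLength.
// An undefined limit is treated as "no limit" (an assumption).

function selectUrls(urls: string[], maxUrls?: number): string[] {
  return maxUrls === undefined ? urls : urls.slice(0, maxUrls);
}

// String.prototype.slice counts UTF-16 code units, which only
// approximates "characters" for text outside the Basic Multilingual Plane.
function truncateContent(content: string, maxContentLength?: number): string {
  return maxContentLength === undefined
    ? content
    : content.slice(0, maxContentLength);
}

const urls = [
  "https://example.com/a",
  "https://example.com/b",
  "https://example.com/c",
];
console.log(selectUrls(urls, 2).length); // prints 2
console.log(truncateContent("Hello, world", 5)); // prints Hello
```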

    includeInResults?: boolean

    Whether to include the extracted content in the final results

    timeout?: number

    Timeout for each URL fetch in milliseconds
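    For context, a per-URL timeout is commonly implemented with AbortController and fetch, as in the sketch below. The fetchWithTimeout name is hypothetical, and nothing on this page says the library fetches pages this way.

```typescript
// Hypothetical helper: aborts the request (including reading the body)
// if it takes longer than `timeoutMs` milliseconds. Requires a runtime
// with global fetch and AbortController, e.g. Node 18+ or a modern browser.
async function fetchWithTimeout(url: string, timeoutMs: number): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(url, { signal: controller.signal });
    if (!response.ok) {
      throw new Error(`HTTP ${response.status} for ${url}`);
    }
    // `return await` keeps the timer active until the body has been read.
    return await response.text();
  } finally {
    // Clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

    Clearing the timer in finally means it is released whether the request succeeds, fails, or is aborted.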

    retry?: { maxRetries?: number; baseDelay?: number }

    Fetch retry configuration

    Type declaration

    • maxRetries?: number

      Maximum number of retries

    • baseDelay?: number

      Base delay between retries, in milliseconds
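    The term "base delay" suggests a backoff scheme. The sketch below shows retry with exponential backoff, assuming (this page does not confirm it) that the delay before retry n is baseDelay * 2^(n-1); the library may instead use a fixed or jittered delay. backoffDelay and retryWithBackoff are hypothetical names.

```typescript
// Delay before the n-th retry (n starts at 1), doubling each time.
// The doubling rule is an assumption for illustration.
function backoffDelay(attempt: number, baseDelay: number): number {
  return baseDelay * 2 ** (attempt - 1);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs `task`, retrying up to `maxRetries` times
// (so at most maxRetries + 1 attempts) before rethrowing the last error.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxRetries: number,
  baseDelay: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxRetries) {
        await sleep(backoffDelay(attempt + 1, baseDelay));
      }
    }
  }
  throw lastError;
}

console.log([1, 2, 3].map((n) => backoffDelay(n, 500))); // prints [ 500, 1000, 2000 ]
```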

    minContentLength?: number

    Minimum content length required for an extraction to count as successful

    continueOnError?: boolean

    Whether to continue processing the remaining URLs when extraction fails for some of them

    requireSuccessful?: boolean

    Whether to require at least one successful extraction
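    The last three options interact. The sketch below encodes one plausible reading, which is an assumption and not documented behavior: a result shorter than minContentLength counts as a failure, continueOnError set to false aborts on the first failure, and requireSuccessful set to true fails the step when no URL succeeded. summarizeExtractions and ExtractionResult are hypothetical.

```typescript
// Hypothetical per-URL result: either extracted text or an error message.
type ExtractionResult = { url: string; content?: string; error?: string };

interface OutcomeOptions {
  minContentLength?: number;
  continueOnError?: boolean;
  requireSuccessful?: boolean;
}

// Splits results into successes and failures under the assumed rules.
// Unset booleans are treated as false here (an assumption).
function summarizeExtractions(results: ExtractionResult[], opts: OutcomeOptions) {
  const min = opts.minContentLength ?? 0;
  const successes: ExtractionResult[] = [];
  const failures: ExtractionResult[] = [];

  for (const r of results) {
    const ok = r.error === undefined && (r.content ?? "").length >= min;
    if (ok) {
      successes.push(r);
    } else {
      if (!opts.continueOnError) {
        throw new Error(`Extraction failed for ${r.url}`);
      }
      failures.push(r);
    }
  }

  if (opts.requireSuccessful && successes.length === 0) {
    throw new Error("No URL produced a successful extraction");
  }
  return { successes, failures };
}

const summary = summarizeExtractions(
  [
    { url: "https://example.com/a", content: "long enough text" },
    { url: "https://example.com/b", content: "short" },
    { url: "https://example.com/c", error: "timeout" },
  ],
  { minContentLength: 10, continueOnError: true, requireSuccessful: true },
);
console.log(summary.successes.length, summary.failures.length); // prints 1 2
```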