Scraping, ETL, notebooks. Production-grade where we'd ship it, dev-only where we wouldn't.
Managed scraping infra + open-source SDK.
— Hosted or self-run. Their SDK alone is worth a look.
jQuery-style server-side HTML parsing.
— We use it for every non-SPA crawl in the KB pipeline.
Spec-compliant DOM for worker/Node environments.
— When you need a real DOM and cheerio's jQuery style isn't enough.
Headless browser for when fetch+parse isn't enough.
— Ships its own browsers; no driver drama. Our default for SPA scraping.
Playwright's older sibling. Still fine, still maintained.
— Reach for this when the team already knows it; Playwright when starting fresh.
Python-native crawling framework with built-in queue + middlewares.
— The right tool at 10K+ pages. Scales without becoming your project.
Type-first data pipelines with asset lineage.
— More opinionated than Prefect. Pick it when asset thinking fits your data.
Local analytical SQL on files.
— We use it for CSV wrangling in consulting engagements. One binary, no server.
Python workflow orchestration with UI.
— What VORLUX's orchestrator would look like if we used off-the-shelf scheduling.