Portuguese Legal Document PDF Metadata Extractor

Created By
geek2geeks, a year ago
MCP server for extracting metadata from Portuguese legal documents using advanced PDF processing and database architecture
Overview

The Portuguese Legal Document PDF Metadata Extractor is a robust Python tool designed to extract structured metadata from Portuguese legal document PDFs, specifically those formatted according to the European Case Law Identifier (ECLI).

To use the extractor, clone the project repository, install the required dependencies, and place your PDF files in the designated directory. You can then use the PortugueseLegalPDFExtractor class to extract metadata from a single PDF or to batch process multiple documents.
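The extract-then-batch workflow described above might look roughly like the sketch below. The signature of the project's PortugueseLegalPDFExtractor class is not documented on this page, so the example does not call it. Instead it uses pdfplumber directly, with a regular expression for the ECLI layout (country, court, year, ordinal). The names `find_ecli`, `extract_pdf_metadata`, and `batch_extract`, and the `pdfs` directory, are hypothetical and chosen only for illustration.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

# ECLI layout: "ECLI", country code, court code, year, ordinal, colon-separated.
# Assumption: identifiers appear in upper case in the extracted text.
ECLI_RE = re.compile(
    r"ECLI:(?P<country>[A-Z]{2}):(?P<court>[A-Z][A-Z0-9]{0,6}):"
    r"(?P<year>\d{4}):(?P<ordinal>[A-Z0-9.]{1,25})"
)


def find_ecli(text: str) -> Optional[Dict[str, str]]:
    """Return the parts of the first ECLI found in text, or None."""
    match = ECLI_RE.search(text)
    if match is None:
        return None
    parts = match.groupdict()
    # A sentence-ending period can be swallowed by the ordinal; trim it.
    parts["ordinal"] = parts["ordinal"].rstrip(".")
    if not parts["ordinal"]:
        return None
    parts["ecli"] = "ECLI:{country}:{court}:{year}:{ordinal}".format(**parts)
    return parts


def extract_pdf_metadata(pdf_path: Path) -> Dict[str, object]:
    """Read the text layer of one PDF and look for an ECLI (hypothetical helper)."""
    import pdfplumber  # third-party; imported lazily so find_ecli works without it

    with pdfplumber.open(str(pdf_path)) as pdf:
        page_count = len(pdf.pages)
        # extract_text() can return None for pages without a text layer.
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    return {"file": pdf_path.name, "pages": page_count, "ecli": find_ecli(text)}


def batch_extract(directory: Path) -> List[Dict[str, object]]:
    """Process every *.pdf file in a directory, in name order."""
    return [extract_pdf_metadata(p) for p in sorted(directory.glob("*.pdf"))]


if __name__ == "__main__":
    # "pdfs" is a placeholder for the project's designated input directory.
    for record in batch_extract(Path("pdfs")):
        print(record)
```

Keeping the ECLI parsing in a pure function separates it from PDF reading, so it can be tested on plain strings without any PDF files on disk.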

Key Features

  • High accuracy, with a reported 100% confidence score and 96.84% exact match rate.
  • Production-ready with two extractor variants for different use cases.
  • Robust error handling and comprehensive validation.
  • Flexible confidence scoring options.
  • User-friendly interface with clear progress reporting.
Use Cases

  1. Extracting metadata from legal documents for research purposes.
  2. Automating the processing of large volumes of legal PDFs.
  3. Validating the accuracy of extracted data against ground truth.
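
Validation against ground truth can be expressed as an exact match rate: the share of reference field values that the extractor reproduces exactly. The sketch below is an assumption about how such a score might be computed, not the project's own evaluation code. The function name `exact_match_rate` and the field names in the usage example are hypothetical.

```python
from typing import Dict, List


def exact_match_rate(
    extracted: List[Dict[str, str]], truth: List[Dict[str, str]]
) -> float:
    """Fraction of ground-truth fields whose extracted value matches exactly."""
    if len(extracted) != len(truth):
        raise ValueError("extracted and truth must have the same number of records")
    total = 0
    matched = 0
    for got, expected in zip(extracted, truth):
        for field, value in expected.items():
            total += 1
            # A missing field counts as a mismatch.
            if got.get(field) == value:
                matched += 1
    return matched / total if total else 0.0


if __name__ == "__main__":
    extracted = [{"court": "STJ", "year": "2019"}, {"court": "TRL", "year": "2020"}]
    truth = [{"court": "STJ", "year": "2019"}, {"court": "TRP", "year": "2020"}]
    print(f"{exact_match_rate(extracted, truth):.2%}")
```

Counting per field rather than per document gives partial credit when only one field of a record is wrong. A stricter per-document variant would count a record as matched only if all of its fields agree.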
FAQ

  • What types of documents can be processed?

    The extractor is designed for Portuguese legal documents formatted according to the European Case Law Identifier (ECLI).

  • Is there a command line interface available?

    Yes, the production extractor includes a full CLI for easy usage.

  • What are the prerequisites for installation?

    You need Python 3.8+ and the pdfplumber package installed.

Project Info
Created At: a year ago
Updated At: a year ago
Author Name: geek2geeks
Star: 0
Language: Python
License: -
