Spring AI in Action: PDF and GitHub

Below is a structured, actionable walkthrough of the topic "Spring AI in Action: Leveraging PDF Data via GitHub Repositories."

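PDF parsing service, which turns raw PDF bytes into Spring AI Document objects:

    import java.util.List;
    import java.util.stream.Collectors;

    import org.springframework.ai.document.Document;
    import org.springframework.ai.reader.tika.TikaDocumentReader;
    import org.springframework.core.io.ByteArrayResource;
    import org.springframework.stereotype.Service;

    @Service
    public class PdfDocumentService {

        public List<Document> parsePdfs(List<byte[]> pdfBytesList) {
            return pdfBytesList.stream()
                .flatMap(bytes -> {
                    // TikaDocumentReader is constructed from a Spring Resource,
                    // so the raw bytes are wrapped in a ByteArrayResource.
                    TikaDocumentReader reader = new TikaDocumentReader(new ByteArrayResource(bytes));
                    return reader.get().stream(); // get() returns List<Document>
                })
                .collect(Collectors.toList());
        }
    }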

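Ingestion method, which fetches, parses, chunks, and stores the PDFs. It uses the splitter and vectorStore fields of the IngestionPipeline class:

    public void indexPdfsFromGitHub(String repo, String pdfPath) throws IOException {
        // fetchPdfsFromRepo declares IOException, so it is propagated here
        List<byte[]> pdfs = gitHubPdfFetcher.fetchPdfsFromRepo(repo, pdfPath);
        List<Document> rawDocs = pdfDocumentService.parsePdfs(pdfs);
        List<Document> chunkedDocs = splitter.apply(rawDocs);
        // Store in vector DB
        vectorStore.add(chunkedDocs);
    }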
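The fetch, parse, split, and store flow of the method above can be exercised without GitHub or a vector database. The following is a minimal sketch under stated assumptions: `PipelineSketch` and its three function parameters are hypothetical stand-ins, not Spring AI or GitHub API types, and a plain list plays the role of the vector store.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the ingestion pipeline; none of these types are Spring AI classes.
public class PipelineSketch {

    private final Function<String, List<byte[]>> fetcher;  // stands in for the GitHub fetcher
    private final Function<byte[], String> parser;         // stands in for PDF parsing
    private final Function<String, List<String>> splitter; // stands in for the text splitter
    private final List<String> store = new ArrayList<>();  // stands in for the vector store

    public PipelineSketch(Function<String, List<byte[]>> fetcher,
                          Function<byte[], String> parser,
                          Function<String, List<String>> splitter) {
        this.fetcher = fetcher;
        this.parser = parser;
        this.splitter = splitter;
    }

    /** Runs fetch, parse, split, store and returns the number of chunks added. */
    public int index(String location) {
        int before = store.size();
        for (byte[] file : fetcher.apply(location)) {
            String text = parser.apply(file);
            store.addAll(splitter.apply(text));
        }
        return store.size() - before;
    }

    public List<String> stored() {
        return List.copyOf(store);
    }

    public static void main(String[] args) {
        PipelineSketch pipeline = new PipelineSketch(
            location -> List.of("alpha beta".getBytes(StandardCharsets.UTF_8),
                                "gamma".getBytes(StandardCharsets.UTF_8)),
            bytes -> new String(bytes, StandardCharsets.UTF_8),
            text -> List.of(text.split(" ")));
        int added = pipeline.index("owner/repo");
        System.out.println(added + " chunks stored: " + pipeline.stored());
    }
}
```

Passing each stage in as a function keeps the orchestration logic separate from the I/O, which makes it straightforward to unit test before wiring in the real services.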

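GitHub fetcher, which downloads every PDF found in a repository directory:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.List;
    import java.util.stream.Collectors;

    import org.kohsuke.github.GHContent;
    import org.kohsuke.github.GHRepository;
    import org.kohsuke.github.GitHub;
    import org.kohsuke.github.GitHubBuilder;
    import org.springframework.stereotype.Service;

    @Service
    public class GitHubPdfFetcher {

        private final GitHub github;

        public GitHubPdfFetcher() throws IOException {
            // build() declares IOException, so the client is created in the
            // constructor rather than in a field initializer.
            this.github = new GitHubBuilder()
                .withOAuthToken(System.getenv("GITHUB_TOKEN"))
                .build();
        }

        public List<byte[]> fetchPdfsFromRepo(String repoName, String path) throws IOException {
            GHRepository repo = github.getRepository(repoName);
            // Keep only the entries whose name ends in .pdf
            List<GHContent> pdfs = repo.getDirectoryContent(path).stream()
                .filter(c -> c.getName().endsWith(".pdf"))
                .toList();
            return pdfs.stream().map(content -> {
                try (InputStream is = content.read()) {
                    return is.readAllBytes();
                } catch (IOException e) {
                    // Lambdas cannot throw checked exceptions, so wrap it
                    throw new RuntimeException(e);
                }
            }).collect(Collectors.toList());
        }
    }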
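The `endsWith(".pdf")` filter in the fetcher is case-sensitive, so a file named `SLIDES.PDF` would be skipped. A small sketch of a case-insensitive variant follows; `PdfNameFilter` is a hypothetical helper operating on plain file names, not part of the GitHub API library.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: case-insensitive check for the .pdf extension.
public class PdfNameFilter {

    /** Returns true when the file name ends with ".pdf", ignoring case. */
    public static boolean isPdf(String fileName) {
        return fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    /** Keeps only the names that look like PDF files, preserving order. */
    public static List<String> onlyPdfs(List<String> fileNames) {
        return fileNames.stream().filter(PdfNameFilter::isPdf).toList();
    }

    public static void main(String[] args) {
        List<String> names = List.of("guide.pdf", "README.md", "SLIDES.PDF", "pdf-notes.txt");
        System.out.println(onlyPdfs(names)); // prints [guide.pdf, SLIDES.PDF]
    }
}
```

In the fetcher, the same check could be applied to `c.getName()` inside the existing `filter` call.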

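Ingestion pipeline, which holds the splitter, the vector store, and the collaborating services:

    import org.springframework.ai.embedding.EmbeddingClient;
    import org.springframework.ai.transformer.splitter.TokenTextSplitter;
    import org.springframework.ai.vectorstore.VectorStore;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;

    @Service
    public class IngestionPipeline {

        // 500 tokens per chunk. TokenTextSplitter constructor signatures differ
        // between Spring AI versions, so check this call against the version in use.
        private final TokenTextSplitter splitter = new TokenTextSplitter(500, 100);
        private final VectorStore vectorStore;
        // Named EmbeddingModel in newer Spring AI releases
        private final EmbeddingClient embeddingClient;
        // Collaborators referenced by indexPdfsFromGitHub
        private final GitHubPdfFetcher gitHubPdfFetcher;
        private final PdfDocumentService pdfDocumentService;

        @Autowired
        public IngestionPipeline(VectorStore vectorStore,
                                 EmbeddingClient embeddingClient,
                                 GitHubPdfFetcher gitHubPdfFetcher,
                                 PdfDocumentService pdfDocumentService) {
            this.vectorStore = vectorStore;
            this.embeddingClient = embeddingClient;
            this.gitHubPdfFetcher = gitHubPdfFetcher;
            this.pdfDocumentService = pdfDocumentService;
        }
    }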
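To make the idea of fixed-size chunking concrete, here is a deliberately simplified sketch that splits on whitespace-separated words rather than model tokens. `WordChunker` is a hypothetical class for illustration only; it does not reproduce the algorithm used by `TokenTextSplitter`, which counts tokens with a tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of fixed-size chunking; words stand in for tokens.
public class WordChunker {

    /** Splits text into chunks of at most chunkSize whitespace-separated words. */
    public static List<String> chunk(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to split
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints three lines: "one two three", "four five six", "seven"
        chunk("one two three four five six seven", 3).forEach(System.out::println);
    }
}
```

Smaller chunks tend to give more precise retrieval hits, while larger ones carry more context per hit; the 500-token setting in the pipeline is one point on that trade-off.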