Indexing PDFs and other binary files #1026

Hello,

Have you ever thought about adding content from PDFs and other binary files to the Lucene search index?
I think using a library like [Apache Tika](http://tika.apache.org/) could make this [not too difficult](http://stackoverflow.com/questions/18098400/how-to-get-raw-text-from-pdf-file-using-java).

BTW, is there any reason why the file names themselves are not indexed?
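
To make both ideas concrete, here is a rough, dependency-free sketch of the flow I have in mind; it is not how Gitblit's indexer actually works. The class `BinaryIndexSketch`, its `extractor` hook, and the in-memory map standing in for the Lucene index are all hypothetical names of mine. In a real setup I would expect the extractor to delegate to Tika (for example its `Tika.parseToString` facade method) and the terms to go into Lucene fields instead of a map.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical sketch: index the file path AND the text extracted from a blob.
class BinaryIndexSketch {
    // Assumed hook that turns raw blob bytes into plain text.
    // A Tika-based extractor would be plugged in here.
    private final Function<byte[], String> extractor;

    // Stand-in for the search index: term -> paths of files containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    BinaryIndexSketch(Function<byte[], String> extractor) {
        this.extractor = extractor;
    }

    // Index one file: first its path/name tokens, then its extracted text.
    void add(String path, byte[] blob) {
        for (String term : tokenize(path)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(path);
        }
        for (String term : tokenize(extractor.apply(blob))) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(path);
        }
    }

    // Look up a single term, case-insensitively.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Lowercase the input and split it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Plain UTF-8 decoding stands in for real PDF text extraction.
        BinaryIndexSketch index =
                new BinaryIndexSketch(b -> new String(b, StandardCharsets.UTF_8));
        index.add("archive/Invoice_2014.pdf",
                "Total amount due".getBytes(StandardCharsets.UTF_8));
        System.out.println(index.search("invoice")); // matched via the file name
        System.out.println(index.search("amount"));  // matched via the content
    }
}
```

Keeping the extractor behind a small interface would mean the indexer does not need to know about PDF specifics, and file-name matching works even for files whose content cannot be extracted.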

Background: I am thinking about using git/Gitblit as a document archive for PDFs, and a full-text search index would be great.

Any thoughts?

