After a database migration, we have a data dump in a PostgreSQL database.
The task is to write a script in Java or Groovy that reads the files out and saves them to the server in the correct format. For some files, the MIME type is specified (e.g. application/pdf, image/png) in a column 'mime_type'. In these cases, I was able to save them in the correct format
(-> read them in as a ByteArrayInputStream, save them as a file in the corresponding format).
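For reference, the saving step described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the class name BlobSaver, the method names, the ".bin" fallback, and the two-entry extension map are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

final class BlobSaver {
    // Hypothetical mapping; it only covers the MIME types mentioned in this post.
    private static final Map<String, String> EXTENSIONS = Map.of(
            "application/pdf", ".pdf",
            "image/png", ".png");

    // Unknown or missing MIME types fall back to a generic ".bin" extension.
    static String extensionFor(String mimeType) {
        if (mimeType == null) {
            return ".bin";
        }
        return EXTENSIONS.getOrDefault(mimeType, ".bin");
    }

    // Writes the raw bytes unchanged; only the file name depends on the MIME type.
    static Path save(byte[] data, String mimeType, Path dir, String baseName) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(baseName + extensionFor(mimeType));
        return Files.write(target, data);
    }
}
```

Note that nothing is converted here: the bytes are written as they are, and the MIME type only decides the extension. That is why rows marked application/octet-stream end up without a usable extension.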
But 90% of the files have mime_type "application/octet-stream". In these cases, I suspect the type or format of the file is simply unknown. In order to save them in the correct format (e.g. pdf or png), I would first need to find out what type each file actually is, and then convert them accordingly.
I already tried the URLConnection.guessContentTypeFromStream(inputStream) method, but it did not recognise the MIME type:
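// Fetch the row; the 'data' column holds the file content as BYTEA
GroovyRowResult[] fileInBinary = sql.rows("""
SELECT * FROM table_name WHERE id = 123456
""")
def temp = []
fileInBinary.each { GroovyRowResult row ->
    temp << row.data
}
// ByteArrayInputStream has no (byte[], int) constructor, so pass only the array
byte[] bytes = temp.flatten() as byte[]
InputStream byteArrayInputStream = new ByteArrayInputStream(bytes)
// guessContentTypeFromStream returns null when it cannot determine a type
def mimeType = java.net.URLConnection.guessContentTypeFromStream(byteArrayInputStream)
println "guessed mime Type is: " + mimeType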
..followed by code to save the file, which works when mime_type specifies the file format, e.g. pdf, word, or png.
This code works, except for the guessContentType part.
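One possible direction, sketched here and not tested against the real data: many formats begin with a fixed signature ("magic bytes"), so the first bytes of each BYTEA value could be compared against known signatures. The class name MagicBytes, the method names, and the choice of just three signatures are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MagicBytes {
    // PDF files normally start with the ASCII text "%PDF".
    private static final byte[] PDF = "%PDF".getBytes(StandardCharsets.US_ASCII);
    // Fixed 8-byte PNG signature.
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    // ZIP local file header; docx, xlsx and odt files are ZIP containers.
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    // True if data begins with exactly the bytes in signature.
    static boolean startsWith(byte[] data, byte[] signature) {
        return data.length >= signature.length
                && Arrays.equals(data, 0, signature.length, signature, 0, signature.length);
    }

    // Returns a MIME type for the few signatures above, else the generic type.
    static String sniff(byte[] data) {
        if (startsWith(data, PDF)) return "application/pdf";
        if (startsWith(data, PNG)) return "image/png";
        if (startsWith(data, ZIP)) return "application/zip";
        return "application/octet-stream";
    }
}
```

A ZIP signature alone does not distinguish docx from xlsx or a plain archive; telling those apart would require looking at the entries inside the container, which this sketch does not attempt.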
The files are stored as binary data (BYTEA, the PostgreSQL equivalent of BLOB).
Does anyone know how to find out which format these application/octet-stream files originally had, extract them from the binary (BYTEA) columns in the database, and save them to disk as e.g. pdf, docx, or png?
Many thanks in advance!