on 04-14-2012 9:16 PM
Is HANA Capable of Handling Unstructured Data?
Hi Neelesh,
Current HANA releases expose no capability for handling unstructured data. We are integrating text search and analysis capabilities, including back-end functionality and a UI component, which are expected with the next support package.
Find a short demo video in this community at https://www.experiencesaphana.com/videos/1046
Kind regards,
Richard
--
Dr. Richard Bremer
Customer Solution Adoption (CSA), SAP AG
Hi Richard,
Is this the UI tool kit functionality now available with SPS 4?
http://help.sap.com/hana/ui_toolkit/index.html
Is it possible to upload PDF files to HANA and search through the content of the files using this option?
Thanks.
Deepu
Hi, I was able to upload PDF files to a BLOB column in a column table in the HANA database, with help from Juergen Schmerder. All you need is a simple script in any programming language that can establish a connection through ODBC or JDBC, such as .NET or Java. Here's a sample of the script that I used to upload the files. As you can see, it is quite simple:
from hdbcli import dbapi  # Assumption: SAP's hdbcli Python client; the original post omitted the import
con = dbapi.connect('hanahost', 30015, 'SYSTEM', '********')  # Open connection to SAP HANA
cur = con.cursor()  # Open a cursor
with open('doc.pdf', 'rb') as f:  # Open the file read-only in binary mode
    content = f.read()  # Read the whole file into a variable
cur.execute("INSERT INTO BLOBTEST VALUES(?,?)", (2, content))  # Save the ID and content to a table
cur.close()  # Close the cursor
con.close()  # Close the connection
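For anyone who wants to reuse the script for several files, here is a minimal sketch of the same insert wrapped in a function. It assumes a two-column table named BLOBTEST (ID, content) as in the script above; the function name upload_file and the column names in the demo are made up for illustration. The demo runs against Python's built-in sqlite3 module purely as a stand-in, because it accepts the same `?` parameter style; with HANA you would pass the connection returned by dbapi.connect instead.

```python
import os
import sqlite3   # stand-in DB-API driver, used only for the demo below
import tempfile


def upload_file(con, doc_id, path):
    """Insert the raw bytes of the file at `path` into BLOBTEST.

    `con` is any DB-API connection whose driver uses '?' placeholders.
    Returns the number of bytes read from the file.
    """
    with open(path, "rb") as f:
        content = f.read()
    cur = con.cursor()
    try:
        cur.execute("INSERT INTO BLOBTEST VALUES(?,?)", (doc_id, content))
    finally:
        cur.close()
    con.commit()
    return len(content)


# Demo: an in-memory SQLite database stands in for HANA (assumption for testing only).
demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE BLOBTEST (ID INTEGER, FILE_CONTENT BLOB)")

# Write a tiny fake "PDF" to a temporary file so the demo needs no external data.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4 demo bytes")
    demo_path = tmp.name

uploaded = upload_file(demo_con, 1, demo_path)
os.remove(demo_path)

stored = demo_con.execute(
    "SELECT FILE_CONTENT FROM BLOBTEST WHERE ID = 1"
).fetchone()[0]
```

Passing the connection in, rather than opening it inside the function, lets one connection serve many uploads.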
Now, to search within the content of the files, you will need to use Fuzzy Search. Here's an example of a query that looks for the word "march" in the content of the files. The score that you get back is a TF/IDF score (Term Frequency/Inverse Document Frequency): it rises with the number of times the word "march" is found in a file and is weighted by how rare the word is across all files, so the files with the most matches generally get the highest scores.
SELECT TO_DECIMAL(SCORE(),3,2) AS score, *
FROM BLOBTEST
WHERE CONTAINS("File_Content", 'march',
FUZZY(0.5, 'textSearch=fulltext'))
ORDER BY "Year", "Month";
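To make the TF/IDF idea concrete, here is a small stand-alone sketch in Python. It is only a textbook-style illustration, not the formula HANA uses internally; the function name, the smoothing of the IDF term, and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_scores(term, documents):
    """Return one TF-IDF score per document for a single search term."""
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: how many documents contain the term at least once.
    df = sum(1 for tokens in tokenized if term in tokens)
    if df == 0:
        return [0.0] * len(documents)
    # Smoothed inverse document frequency: rarer terms get a larger weight,
    # and a term found in every document still keeps a positive weight.
    idf = math.log((1 + len(documents)) / (1 + df)) + 1
    scores = []
    for tokens in tokenized:
        # Term frequency: share of the document's words that match the term.
        tf = Counter(tokens)[term] / len(tokens) if tokens else 0.0
        scores.append(tf * idf)
    return scores


docs = [
    "the march report covers march sales",
    "sales in april were flat",
    "march was a busy month",
]
scores = tf_idf_scores("march", docs)
```

In this toy data the first document mentions "march" twice and scores highest, the third mentions it once and scores lower, and the second does not mention it and scores zero.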
Thanks, Lucas.
Hi Lucas,
I am now trying to build a text analysis app.
One feature is that when the user selects a PDF file and clicks the upload button on the web page, the program uploads the content of the PDF into a column of a HANA table.
I have read your blog before, and I am now trying to follow the steps described in your blog and comments.
http://scn.sap.com/community/developer-center/hana/blog/2013/01/03/sap-hana-text-analysis
I am just wondering how a new developer without any scripting knowledge could implement this method.
Is there any mechanism built into HANA studio now that does this?
Thanks!