Welcome to week 1 of the May Developer Challenge on AI at SAP! This month’s challenge covers two SAP AI Services: Document Information Extraction and Data Attribute Recommendation. To participate, just post a screenshot of your solution as a reply in the discussion for the corresponding week.

SAP AI Services help you implement custom use cases by providing powerful algorithms specifically tailored to business problems.

Document Information Extraction:

The Document Information Extraction service is available in two editions: the original Base Edition and the new genAI-based Premium Edition. The Premium Edition uses a large language model via the generative AI hub in SAP AI Core to extract information from all kinds of documents.
With Document Information Extraction you can extract information from PDF files and from single-page JPEG, PNG and TIFF files.
Supported document types are: invoice, paymentAdvice, purchaseOrder, businessCard, deliveryNote, resume and birthCertificate. You can also create your own schema to process other document types.
You can also extract OCR results directly to process the raw text of your document files, and use the classification capabilities to classify your documents into three classes: invoice, purchase order and payment advice.
You can also enrich your extracted data with your metadata.
You can access the Document Information Extraction service via the UI, via REST API calls (for example with Swagger UI or another HTTP client) and via the Python SDK.
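Before uploading a file through the UI, the API or the SDK, it can help to check that it is one of the supported input formats listed above. The following is a minimal sketch, not part of the service or the SDK: the helper name `check_input_file` and the extension-to-format mapping are assumptions, and it only looks at the file extension, so it cannot tell whether an image (for example a TIFF) really has a single page.

```python
from pathlib import Path

# Supported inputs as listed above: PDF, plus single-page JPEG, PNG and TIFF.
# Mapping these formats to file extensions is an assumption of this sketch.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def check_input_file(path: str) -> str:
    """Return 'pdf' or 'image' for a supported file, raise ValueError otherwise.

    Only the extension is inspected; the page count of images is not checked.
    """
    suffix = Path(path).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")


if __name__ == "__main__":
    for name in ["lasagne.pdf", "cookies.PNG", "notes.docx"]:
        try:
            print(name, "->", check_input_file(name))
        except ValueError as err:
            print(name, "->", err)
```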

Data Attribute Recommendation:

With Data Attribute Recommendation you can train your own model to classify data records. You can also tackle more complex classification problems, such as the hierarchical classification of products, and predict missing values in data records.
Data Attribute Recommendation can be used via REST API calls (for example with Swagger UI or another HTTP client) as well as via the AI API Python SDK and SAP AI Launchpad.
If you want to access Data Attribute Recommendation via Postman, you can download this Postman Collection.

Weekly Challenges

Week 1 Challenge – DOX UI

This week you will use the UI of the Document Information Extraction service to extract information from your favorite recipe. The UI is a great way to try out your use case and get a feel for the capabilities of the service. For productive use cases you would call the APIs or implement a workflow using the Python SDK. For example, you could implement a workflow that processes documents straight from your mailbox, saves the extracted information in the system and structure you need, and triggers any other necessary workflows.
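The mailbox workflow described above starts by pulling documents out of incoming emails. As a minimal sketch of just that first step, the code below uses only Python's standard `email` package to collect the PDF attachments of a raw message. The function name `pdf_attachments` is hypothetical, and sending the files to Document Information Extraction, storing the results and triggering follow-up workflows are deliberately left out, because those depend on your API client or SDK setup.

```python
import email
from email import policy
from email.message import EmailMessage


def pdf_attachments(raw_message: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, content) for every PDF attached to a raw email message."""
    # policy.default makes the parser return an EmailMessage with the modern API.
    message = email.message_from_bytes(raw_message, policy=policy.default)
    attachments = []
    for part in message.iter_attachments():
        if part.get_content_type() == "application/pdf":
            # For non-text parts, get_content() returns the decoded bytes.
            attachments.append((part.get_filename() or "unnamed.pdf", part.get_content()))
    return attachments


if __name__ == "__main__":
    # Build a sample message in memory instead of connecting to a real mailbox.
    msg = EmailMessage()
    msg["Subject"] = "My favorite recipe"
    msg.set_content("Recipe attached.")
    msg.add_attachment(b"%PDF-1.4 ...", maintype="application", subtype="pdf",
                       filename="recipe.pdf")
    for filename, content in pdf_attachments(msg.as_bytes()):
        # Hypothetical next step: send `content` to the extraction API and store the result.
        print(filename, len(content), "bytes")
```

In a real setup the raw bytes would come from your mail server (for example via `imaplib`), and each PDF would then be handed to the API or the Python SDK.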

For this week’s challenge, use the UI to extract the header fields “recipe name” and “portions” and the line-item fields “quantity” and “ingredient” from your chosen recipe. To do this, you need to create a custom schema. Make sure the recipe is in one of the supported languages.

When creating the custom schema, choose the setup type auto to use the LLM/genAI-based Premium Edition. In the description of each field, give the large language model the information it needs to understand what you are referring to, e.g. “the name of the recipe”.
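To make the structure of the custom schema concrete, here is a sketch of the recipe schema as plain Python data. This is only an illustration of what you enter in the UI, not the service's own schema format: the class names, the camelCase field names and the `data_type` values are assumptions. The `validate` method reflects the advice above that every field needs a description the large language model can work with.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaField:
    name: str                 # hypothetical field name; the UI may have its own naming rules
    description: str          # hint for the large language model (Premium Edition)
    data_type: str = "string"  # assumed type label, for illustration only


@dataclass
class CustomSchema:
    name: str
    setup_type: str = "auto"  # "auto" selects the LLM/genAI-based Premium Edition
    header_fields: list[SchemaField] = field(default_factory=list)
    line_item_fields: list[SchemaField] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if any field lacks a description for the LLM."""
        for schema_field in self.header_fields + self.line_item_fields:
            if not schema_field.description.strip():
                raise ValueError(f"Field '{schema_field.name}' needs a description")


# The fields requested in this week's challenge.
recipe_schema = CustomSchema(
    name="recipe_schema",
    header_fields=[
        SchemaField("recipeName", "the name of the recipe"),
        SchemaField("portions", "the number of portions the recipe makes", "number"),
    ],
    line_item_fields=[
        SchemaField("quantity", "the amount of an ingredient, including its unit"),
        SchemaField("ingredient", "the name of the ingredient"),
    ],
)
recipe_schema.validate()
```

Header fields occur once per document (one recipe name, one number of portions), while line-item fields repeat once per row, which is why quantity and ingredient sit in a separate list.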

Get a free trial account and run the DOX booster: https://developers.sap.com/tutorials/cp-aibus-dox-booster-key.html
Get the Document Information Extraction UI: https://developers.sap.com/tutorials/cp-aibus-dox-ui-sub.html
Create a custom schema: https://developers.sap.com/tutorials/cp-aibus-dox-ui-gen-ai.html
OPTIONAL: Create a template and add your document to the template (improves performance for future recipes)
Upload your favorite recipe to extract the name, portions, quantities and ingredients. Make sure your recipe PDF is only one or two pages long; otherwise you will quickly reach the 50-page limit of the trial plan. And try not to use up the entire 50-page quota, because we will need it next week as well!
Submission: share a screenshot of the extraction results and of the document, and write a comment about your experience using the UI in the discussion below.
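Because the trial plan is limited to 50 pages and the quota is needed again next week, a quick page budget can help. The helper below is a hypothetical sketch that does this arithmetic locally; it does not query your actual usage from the service.

```python
TRIAL_PAGE_LIMIT = 50  # page limit of the trial plan, as stated in the steps above


def remaining_pages(pages_used: list[int], limit: int = TRIAL_PAGE_LIMIT) -> int:
    """Return the pages left after processing documents with the given page counts."""
    if any(pages < 1 for pages in pages_used):
        raise ValueError("every document has at least one page")
    remaining = limit - sum(pages_used)
    if remaining < 0:
        raise ValueError(f"quota exceeded by {-remaining} pages")
    return remaining


if __name__ == "__main__":
    # Example: a two-page recipe uploaded twice (e.g. once more after creating a template).
    print(remaining_pages([2, 2]), "pages left for next week")
```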

Additional information:

Processing a ©Pokémon Card in 90 seconds with Document Information Extraction powered by generative AI: https://community.sap.com/t5/technology-blogs-by-sap/processing-a-pok%C3%A9mon-card-in-90-seconds-with-document-information/ba-p/13571759

Be aware of the limits that apply to free tier and trial accounts: https://help.sap.com/docs/document-information-extraction/document-information-extraction/free-tier-option-and-trial-account-technical-constraints

How to improve your results: https://help.sap.com/docs/document-information-extraction/document-information-extraction/best-practices-298a9a0936d5436494c644ec51bbdcea

If you do not want to run the booster for Document Information Extraction, make sure to set up the service using the blocks_of_100 service plan and assign the necessary role collections to your user.

In this “2-min of” video, I describe the technical aspects behind the scenes of the Base Edition (which does not use an LLM).