This is a step-by-step walkthrough with the goal of providing enough detail that you can build this yourself, and maybe even improve upon it.
Why build this?
I’ve always had a love and fascination for Wikipedia. Many of the articles include corresponding location data. I thought it would be cool to be able to walk around and hold up your phone to something and have someone there to tell you more about it – even cooler when AR glasses are available and I don’t have to hold up my phone. This might encourage people to learn more about the world around them, and keep them more in the moment by not having to look down at their phones to get that information.
It would also be nice if I could look at a 3D model from the comfort of my home and get related info about it from Wikipedia.
Essentially, this is envisioning what Wikipedia could look like in augmented reality.
Also, I’m going to use an article about some artwork, because I’m interested in the experience of placing art on my wall and getting information about it. Artsy and Saatchi Art have apps that allow me to place art on my wall, but I can’t get information about the art while I’m looking at it in AR. Maybe this could even help artists sell more art, which would be nice.
Tools and Tech
Here is what I plan to use to accomplish this:
- Dialogflow – used to build out the conversation.
- Wikipedia – for the content of the conversation.
- Blender – to build the 3D art model.
- Hootsy – to create the voice interactive assistant.
- Unity – to deploy the experience to an iOS and Android app for the AR experience.
Creating the Conversation
Log in to Dialogflow (https://dialogflow.com/) and create a new agent. It should look like this to start.
If you’re new to building conversations (aka chatbots), it may help to read through some of the Dialogflow documentation or watch a few YouTube videos, but I’ll try to walk through it here. Select the Default Welcome Intent. This is the response to someone saying hello or a similar variation of it. Change the Response of this intent. Delete what is there and add something new. Here is mine:
Save and return to the list of intents. If you hover over the welcome intent, you’ll see an option to add a follow-up intent. Select that and then ‘yes’. Open up that intent and scroll to the bottom again to change the response. This time, I’m going to create a response that displays buttons. Select the ‘+’ next to the default tab and then select Facebook Messenger. Hootsy is set up to handle a message structure similar to Facebook’s, so this will work fine. Select Add response and then Quick Replies. Define the text field and buttons. The benefit of displaying buttons is that they guide the conversation, so the user has a better idea of what they can ask. In this case, I want the assistant to indicate what it knows, with buttons that map directly to sections of the Wikipedia article. Here is the Wikipedia article I’ll be using https://en.wikipedia.org/wiki/Drowning_Girl and here is the Dialogflow response.
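If it helps to see the data behind those buttons, here is a minimal sketch of a quick-reply message in the Facebook Messenger format. The button titles below are placeholders I picked for illustration (use the section names from your own article), and the exact JSON that Dialogflow passes to Hootsy may differ from this.

```python
import json


def quick_reply_message(text, titles):
    """Build a Messenger-style quick-reply message as a plain dict.

    Each button sends its own title back as the payload, so the title
    can double as a training phrase for the matching intent.
    """
    return {
        "text": text,
        "quick_replies": [
            {"content_type": "text", "title": title, "payload": title}
            for title in titles
        ],
    }


if __name__ == "__main__":
    # Placeholder button titles, not taken from the original agent.
    message = quick_reply_message(
        "Here is what I can tell you about. What would you like to know?",
        ["Summary", "Background", "Description"],
    )
    print(json.dumps(message, indent=2))
```

Keeping the payload equal to the title means tapping a button behaves the same as the user saying that word out loud.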
You may also want to create a follow-up intent for the user saying ‘no’ when asked if they want to know more about the art.
Next, we’re going to create intents that correspond to each of these quick replies. I’ll show the process for one.
Create a new intent and call it Summary. Define the training phrases, which are what the user could say to trigger the assistant to respond with the Wikipedia summary. Dialogflow handles the Natural Language Processing (NLP), so it’ll determine if something the user says closely corresponds to one of these training phrases. Dialogflow provides analytics to help you understand how people are interacting and what training phrases you should add or update to improve the experience. For now, we’ll start with the following:
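To get a feel for what "closely corresponds" means, here is a toy matcher. It is not how Dialogflow's NLP works; it is a plain string-similarity stand-in built on Python's `difflib`, and the training phrases and the 0.6 threshold are made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical training phrases; replace these with the ones in your agent.
TRAINING_PHRASES = {
    "Summary": ["give me a summary", "tell me about this painting", "what is this"],
    "Background": ["what is the background", "tell me the history"],
}


def match_intent(utterance, training_phrases=TRAINING_PHRASES, threshold=0.6):
    """Return the intent whose training phrase is most similar to the utterance.

    Returns None when no phrase reaches the similarity threshold, which is
    roughly the role a fallback intent plays in a real agent.
    """
    text = utterance.lower().strip()
    best_intent, best_score = None, 0.0
    for intent, phrases in training_phrases.items():
        for phrase in phrases:
            score = SequenceMatcher(None, text, phrase).ratio()
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None
```

Character-level similarity breaks down quickly on paraphrases, which is exactly why it is worth adding more training phrases as the analytics show you what people actually say.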
I generally try to keep the responses very short, but since this is intended to be more educational, I’ll experiment with providing the entire first paragraph of the summary followed by a question, ‘Do you want to know more?’.
If they say yes, then the assistant responds with the second paragraph of text. I can do this by adding a follow-up intent similar to what we did with the welcome intent.
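The paragraph-by-paragraph flow can be sketched like this. It assumes the summary is plain text with paragraphs separated by blank lines. The first response would go in the Summary intent and the next one in its ‘yes’ follow-up intent.

```python
def split_paragraphs(text):
    """Split plain text into non-empty paragraphs on blank lines."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def build_responses(summary, prompt="Do you want to know more?"):
    """Turn a multi-paragraph summary into a chain of responses.

    Every response except the last ends with the follow-up question,
    so the assistant only offers more when there is more to say.
    """
    paragraphs = split_paragraphs(summary)
    last_index = len(paragraphs) - 1
    return [
        paragraph if index == last_index else f"{paragraph} {prompt}"
        for index, paragraph in enumerate(paragraphs)
    ]
```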
Continue this process for all quick replies.
To make the assistant feel smarter, I’m going to do a few more things. First, I’m going to turn on Dialogflow’s ‘Small Talk’.
Next, I’m going to create a few more intents to handle common questions about the artwork like size, materials, location, etc. I’m restricting myself to only the information available on Wikipedia.
You can test out the conversation on the right-hand side of the Dialogflow site before integrating with Hootsy.
Creating the 3D Model
These are the steps to create the 3D artwork. I’m going to use Blender v2.8 (blender.org) since it’s free and I like the v2.8 interface as much as any of the paid 3D modeling programs.
For those new to Blender or 3D modeling in general, the tools can be challenging to get used to, so here is the final model if you want to skip this section. I do recommend trying it out though, as it’s a lot of fun once you get the hang of it.
I’m going to adjust the default cube to be 1m x 1m x 0.03m. Wikipedia provides the artwork dimensions and this size will allow me to easily scale height and width to match any artwork.
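Because the base model is 1m x 1m, the scale factors are just the artwork's dimensions in meters. Here is a small sketch of that arithmetic. The 169.5 cm x 171.6 cm values are the Drowning Girl dimensions as I read them from the article, so double-check them against Wikipedia before relying on them.

```python
def scale_for_artwork(width_cm, height_cm, depth_m=0.03):
    """Return the size in meters for a 1m x 1m x 0.03m base model.

    Width and height scale directly from the artwork dimensions; the
    depth of the canvas is left at the base model's 0.03m.
    """
    if width_cm <= 0 or height_cm <= 0:
        raise ValueError("Artwork dimensions must be positive")
    return {
        "width_m": width_cm / 100.0,
        "height_m": height_cm / 100.0,
        "depth_m": depth_m,
    }


if __name__ == "__main__":
    # Dimensions assumed from the Wikipedia infobox; verify before use.
    print(scale_for_artwork(width_cm=169.5, height_cm=171.6))
```

Both factors come out close to 1.7, which is why a single uniform scale of about 1.7 works for this nearly square painting.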
Next, I’m going to select the edges of the square face that is facing us. With the edges selected, hit ctrl+E and then select Mark Seam in the menu that pops up. Next, hit ‘U’ to bring up the UV menu and then select Unwrap. Here is a view of the unwrapped UV.
For some reason, the texture ends up rotated 180 degrees when it’s added in a later step so we’ll fix that now. In the UV Editor view (bottom window in image above), hit the ‘a’ key to select all and then ‘r’ key to rotate. Type ‘180’ to rotate 180 degrees.
Now we’re going to add the materials and textures. First I’m going to change the base color of the cube to be black.
Then I’m going to create a new material, select the face of the cube that we marked seams for and then hit Assign (make sure you’re in Edit Mode if you don’t see the assign button).
Switch to the Shader Editor as seen in the image below. Download the artwork image from the Wikipedia article https://en.wikipedia.org/wiki/File:Roy_Lichtenstein_Drowning_Girl.jpg. Drop it into the Shader Editor window and connect the Color output of that node to the Base Color of the Principled BSDF node. You should now see the image displayed on the model. If not, make sure you’re in the LookDev view so the texture is shown.
We are now ready to export. We’re going to export in two formats: glTF and FBX. glTF is my preferred format for web, and FBX imports best in Unity (although .blend files sometimes work too). Click File in the top menu, then Export, then glTF 2.0. You can test the result by dragging the exported files here https://gltf-viewer.donmccurdy.com/. One thing you’ll notice is that the image quality of the artwork is less than ideal. We would likely want to improve this with higher quality images.
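If you want a quick sanity check before dragging files into the viewer, a text .gltf file is just JSON. This sketch assumes you exported the .gltf (text) variant rather than binary .glb, and it only reads fields defined by the glTF 2.0 format (`asset.version`, `meshes`, `images`). The file name in the usage comment is a placeholder.

```python
import json


def summarize_gltf(document):
    """Summarize a parsed .gltf document (a dict loaded from JSON)."""
    images = document.get("images", [])
    return {
        "version": document.get("asset", {}).get("version"),
        "mesh_count": len(document.get("meshes", [])),
        # Embedded images have no "uri", so they are skipped here.
        "image_uris": [image["uri"] for image in images if "uri" in image],
    }


def summarize_gltf_file(path):
    """Load a .gltf file from disk and summarize it."""
    with open(path, "r", encoding="utf-8") as handle:
        return summarize_gltf(json.load(handle))


# Usage (placeholder file name):
# print(summarize_gltf_file("drowning_girl.gltf"))
```

You would expect version "2.0", one mesh, and the artwork image in the list. A missing image URI is a hint that the texture was not connected to the material before export.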
For exporting to FBX, set Apply Scale to FBX All and select Mesh to only export the mesh.
Creating the Voice Interactive Assistant
This is where we pull everything together. In Hootsy (https://hootsy.com), create a new assistant. I’m going to use one of the pre-built characters for simplicity, but you could build your own using Blender or another tool if you wanted. I’m also going to change the appearance for fun so it looks more like the girl in the artwork.
Next, I’m going to expand Conversation and change source to Dialogflow. I’m going to open up Dialogflow again, go to settings and then switch API version to v1 and copy the client access token (note: v1 will be deprecated later this year, but we plan to update to v2 before then). Go back to Hootsy, and paste this token into the Dialogflow access token field. You should now be able to interact with the assistant using that conversation.
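For the curious, this is roughly the kind of request a client makes with that token. How Hootsy talks to Dialogflow internally is not something I’m documenting here; the sketch below assumes the Dialogflow V1 `/query` endpoint and its response shape (`result.fulfillment.speech`) as I understand them from the V1 docs. It only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed V1 endpoint; the "v" parameter is the V1 protocol version date.
DIALOGFLOW_V1_URL = "https://api.dialogflow.com/v1/query?v=20150910"


def build_query_request(client_access_token, text, session_id, lang="en"):
    """Build (but do not send) a V1 text query for the given session."""
    body = json.dumps(
        {"query": text, "lang": lang, "sessionId": session_id}
    ).encode("utf-8")
    return urllib.request.Request(
        DIALOGFLOW_V1_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {client_access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def extract_speech(response_json):
    """Pull the assistant's text reply out of a parsed V1 response."""
    return response_json.get("result", {}).get("fulfillment", {}).get("speech", "")
```

The session ID is what lets follow-up intents work: reusing the same ID across requests is how ‘yes’ gets interpreted in the context of the previous question.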
Click the add button next to Conversation Models and drag in the recently exported Blender model. Select Display on Scene Load and then adjust the position/rotation as desired.
If you want to share this with others, select the Share button at the top of the left menu and copy the URL. Note that the Safari in-app browser does not currently provide mic access (see bug), so we provide a note directing people to view it in Safari proper in those cases. Here is the experience I created: Drowning Girl Scene.
When you say ‘hello’, you’ll notice she waves to you. This was triggered through a custom message in Dialogflow. In Hootsy, you can see an animation defined for the assistant called ‘Waving’, which is triggered on request. In Dialogflow, I can trigger this through a custom payload as part of the welcome response. You can find more info on custom payloads in the Hootsy dev docs.
This web experience was built with Three.js. If you want to further customize this experience or deploy it within another Three.js experience, let us know. We plan to release the web SDK soon.
Deploying to Native App for AR
Quality AR experiences are not possible today on the web. There is progress towards making this possible, which you can learn about here, but for the near term, we’ll need to deploy to a native app to take full advantage of the AR capabilities with ARKit (for iOS) and ARCore (for Android).
Create a new folder under Assets and call it ‘Resources’. Create a new folder under Resources and call it ‘Art’. Drag the FBX model and image into the Art folder. Select the model, and Extract Materials.
Select Material.002 and change Rendering Mode to Opaque. We will be focused on the AR scene, but we’ll open the VR example scene to more easily set up the art model. In the VR scene, create an empty object and give it the same name as the glTF model, which is likely the same name you used for the FBX model. Now drag the FBX model so it’s a child of the new object. For the child object, set Y rotation to 90, scale to 1.7 and cast shadows to off. Drag the parent object into the Art folder to create a prefab.
Open the AR example scene, select AR Session Origin and change Detection flags to Everything. Make sure you’ve defined the Access Token and Assistant ID as detailed in the SDK readme. You should now be ready to build.
Go to File -> Build Settings. I’m going to build for iOS at this point, but you could also build for Android as we used Unity’s AR Foundation for multi-platform support.
Select iOS, select Switch Platform, and then select Build. Open the project in Xcode, define your Team, connect your iPhone or iPad, and then run. When you run the AR scene, look around the floor until you find a ground plane and tap on it to place the assistant. Then find a plane on the wall and tap it to place the art. You could improve the experience of placing the assistant and artwork by using some of the assets in Unity’s AR Foundation UX example. You could also consider attaching the assistant to the screen so the only object that needs to be placed is the artwork.
Here are some of the ways this could be improved further:
- Scale to all articles: Extract Wikipedia data using the MediaWiki API. Example query. You could add a fulfillment to your conversation to retrieve this information before responding to a user. As part of this you could:
  - Location aware: Request user’s location and use it to retrieve relevant content.
  - Image Recognition: Allow user to take an image of something and use image recognition to classify and find relevant content.
- Improve Artwork Placement: Currently using vertical plane detection in AR, but if the wall is empty it’s sometimes difficult to find a plane. The touch control script could also be updated to allow movement along a vertical plane after placement.
- Shorter, better responses: Determine better ways to extract relevant content from Wikipedia articles and provide responses to users.
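As a starting point for the first two ideas, here is a minimal sketch of the MediaWiki API queries a fulfillment could make. It assumes the `extracts` property (from the TextExtracts extension) and the `geosearch` list, both of which English Wikipedia exposes as far as I know. The function names are my own, and the sketch only builds URLs and parses an already-fetched response, so fetching is left to whichever HTTP client you prefer.

```python
from urllib.parse import urlencode

WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"


def intro_extract_url(title):
    """URL for the plain-text intro section of one article."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "exintro": 1,
        "explaintext": 1,
        "redirects": 1,
        "titles": title,
    }
    return f"{WIKIPEDIA_API}?{urlencode(params)}"


def nearby_articles_url(lat, lon, radius_m=500, limit=10):
    """URL for articles with coordinates near the user's location."""
    params = {
        "action": "query",
        "format": "json",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",
        "gsradius": radius_m,
        "gslimit": limit,
    }
    return f"{WIKIPEDIA_API}?{urlencode(params)}"


def first_extract(response_json):
    """Return the first page extract from a parsed query response, if any."""
    pages = response_json.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None
```

The intro extract maps naturally onto the Summary intent, and the geosearch results could drive the location-aware variant by picking the nearest article title and then requesting its extract.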
If you made it this far, you rock! I’d love to know what you think. Please leave a comment or share with anyone who might find this interesting.