
Urania Berlin, June 6-7, 2011

First 5 Videos

We will be releasing 5 videos of #bbuzz 2011 talks every Monday over the coming weeks. The first 5 are:

Daniel Trümper ZeroMQ, Felix Geisendörfer Node JS, Otis Gospodnetić SEARCH ANALYTICS WHAT WHY HOW, Mathias Meyer NoSQL Past Present Future, and Matthias Wessendorf Web Sockets.

Daniel Trümper ZeroMQ from ntc GmbH on Vimeo.

Felix Geisendörfer Node JS from ntc GmbH on Vimeo.

Otis Gospodnetić SEARCH ANALYTICS WHAT WHY HOW from ntc GmbH on Vimeo.

Interview: Otis Gospodnetić
by Sebastian Arnold, TU Berlin

Q: Hello Otis, you just gave a talk about Search Analytics. What is your main work behind that and why do you think this is important?

A: I am the founder of Sematext, where we focus on the development of scalable search and analytics. We advise companies on improving their search services by analyzing user behaviour on their sites. This is related to web analytics, but the reports are based on much more data and include knowledge about the site and its content. It's interesting to see how few people actually use search analytics. It is important to know whether people are really finding what they need and whether they are happy with the search results. You can't tell that from web server log files alone.

Q: So, I've seen that you collect a large amount of logs about clicks and queries on the site; you basically try to monitor "everything" that happens there. You analyze that data and generate reports about the usage of different site functions, rates of search failure and so on. But all of this happens at the level of single transactions. Is it possible to combine the longer click paths behind a user's search intention and analyze the whole navigation sequence?

A: That's fuzzy. The problem is what I referred to as "search sessions" in my talk. You can try to group the log lines by user and then sort by time to get a sequential click log. But how can you tell if the user is still following the same search trail? Maybe he already gave up on one topic and now tries something completely different. You could try to cluster by similarity, for example to find different spellings of the same search. But to really find sessions you have to know more about how his search inputs relate to each other.
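The grouping and similarity steps Otis describes can be sketched roughly as follows. This is only an illustration, not Sematext's method: the `(user, timestamp, query)` event shape, the function names and the 0.8 threshold are all assumptions, and string similarity from Python's standard `difflib` stands in for whatever clustering a real system would use. As Otis points out, similarity alone catches respellings but cannot tell whether two different queries belong to the same search intention.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def click_log_by_user(events):
    """Group (user, timestamp, query) events by user, sorted by time.

    The tuple layout is a hypothetical log format chosen for this sketch.
    """
    by_user = defaultdict(list)
    for user, timestamp, query in events:
        by_user[user].append((timestamp, query))
    return {user: sorted(rows) for user, rows in by_user.items()}


def similar(a, b, threshold=0.8):
    """True if two queries look like variants of each other (e.g. respellings)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def split_sessions(rows, threshold=0.8):
    """Start a new session whenever a query is not similar to the previous one."""
    sessions = []
    for timestamp, query in rows:
        if sessions and similar(sessions[-1][-1][1], query, threshold):
            sessions[-1].append((timestamp, query))
        else:
            sessions.append([(timestamp, query)])
    return sessions
```

Calling `split_sessions(click_log_by_user(events)["some_user"])` would return that user's queries as a list of sessions, each a time-ordered list of `(timestamp, query)` pairs.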

Q: I'm working on something like this at the moment. Our database is very structured, so we can tell if two results are somehow related. On top of that, we're writing more detailed metadata about the origin of an event into the logs. We instantly see if the user stays on the same object, object type or general topic. This helps us to find the boundaries of a user's search intention in the clickstream.

A: That's interesting. But I think you wouldn't find many matches for the same actions of different users. And it's still hard on a site with a lot of traffic.
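The metadata-based boundary detection from the question above might look something like this minimal sketch. The event fields `object_id`, `object_type` and `topic` are hypothetical names invented for the example; the interview does not say how the actual logs are structured.

```python
from itertools import groupby


def intention_segments(events, level="topic"):
    """Split a time-ordered list of event dicts into consecutive runs
    that share the same value at the given metadata level.

    `level` is one of the assumed keys: "object_id", "object_type", "topic".
    """
    return [list(run) for _, run in groupby(events, key=lambda e: e[level])]
```

Choosing a coarser level such as `"topic"` yields fewer, longer segments, while `"object_id"` splits the clickstream every time the user moves to a different object.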

Q: You're right. What about doing this on a higher level, e.g. on the navigational structure of a site? We can then try to find typical paths like "search" -> "not found" -> "re-search".

A: Yeah, of course it somehow is possible. Especially if you have more information about the data itself. I just haven't done this yet.
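One simple way to look for typical navigation paths such as "search" -> "not found" -> "re-search" is to count short sub-sequences across all recorded paths. The sketch below assumes each path is already available as a list of page-type labels; the labels and function names are illustrative only, and neither speaker describes an implementation.

```python
from collections import Counter


def path_ngrams(path, n=3):
    """All consecutive sub-sequences of length n in one navigation path."""
    return [tuple(path[i:i + n]) for i in range(len(path) - n + 1)]


def typical_paths(paths, n=3, top=5):
    """Most frequent length-n sub-paths across many users' navigation paths."""
    counts = Counter()
    for path in paths:
        counts.update(path_ngrams(path, n))
    return counts.most_common(top)
```

Because the labels describe the site's navigational structure rather than individual queries, different users can produce the same sub-path, which addresses the concern about finding few matches between users.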

Q: Alright. Now we're finally at the end of the lunch queue. Thank you for your time, Otis.

A: You're welcome - thanks for the nice talk. See you later!

Mathias Meyer NoSQL Past Present Future from ntc GmbH on Vimeo.

Matthias Wessendorf Web Sockets from ntc GmbH on Vimeo.


Berlin Buzzwords 2011 is a conference for developers and users of open source software projects, focusing on scalable search, data analysis in the cloud and NoSQL databases. Berlin Buzzwords presents more than 30 talks and presentations by international speakers on the three tags "search", "store" and "scale".