Knowing: A Generic Data Analysis Application

We got another demo accepted:

Knowing: A Generic Data Analysis Application

Thomas Bernecker, Franz Graf, Hans-Peter Kriegel, Nepomuk Seiler, Christoph Türmer, Dieter Dill
To appear at the 15th International Conference on Extending Database Technology (EDBT 2012)
March 27-30, 2012, Berlin, Germany

Abstract:

Extracting knowledge from data is, in most cases, not restricted to the analysis itself but accompanied by preparation and post-processing steps. Handling data coming directly from the source, e.g. a sensor, often requires preconditioning like parsing and removing irrelevant information before data mining algorithms can be applied to analyze the data. Stand-alone data mining frameworks in general do not provide such components since they require a specified input data format. Furthermore, they are often restricted to the available algorithms or a rapid integration of new algorithms for the purpose of quick testing is not possible. To address this shortcoming, we present the data analysis framework Knowing, which is easily extendible with additional algorithms by using an OSGi compliant architecture. In this demonstration, we apply the Knowing framework to a medical monitoring system recording physical activity. We use the data of 3D accelerometers to detect activities and perform data mining techniques and motion detection to classify and evaluate the quality and amount of physical activities. In the presented use case, patients and physicians can analyze the daily activity processes and perform long term data analysis by using an aggregated view of the results of the data mining process. Developers can integrate and evaluate newly developed algorithms and methods for data mining on the recorded database.
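Roughly, the OSGi-based extensibility works like the following minimal sketch: a new algorithm is packaged as a bundle and registered as a service that the framework can discover at runtime, without recompiling the host application. Note that the DataMiningAlgorithm interface and the activator below are hypothetical illustrations (Knowing's actual API will differ); only the OSGi calls themselves are standard.

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

// Hypothetical service interface; Knowing's real API will differ.
interface DataMiningAlgorithm {
    String getName();
    double[][] process(double[][] data);
}

// A new algorithm is packaged as an OSGi bundle and registered as a service,
// so the framework can discover it at runtime without recompilation.
public class MyAlgorithmActivator implements BundleActivator {

    @Override
    public void start(BundleContext context) {
        DataMiningAlgorithm algorithm = new DataMiningAlgorithm() {
            @Override
            public String getName() { return "my-motion-classifier"; }

            @Override
            public double[][] process(double[][] data) {
                // ... actual analysis of e.g. 3D accelerometer data ...
                return data;
            }
        };
        context.registerService(DataMiningAlgorithm.class, algorithm, null);
    }

    @Override
    public void stop(BundleContext context) {
        // Services registered via this context are unregistered automatically.
    }
}

The OSGi registry then takes care of the wiring: when the bundle is started, the algorithm becomes available to the framework; when it is stopped, it disappears again.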

BibTeX

@INPROCEEDINGS{BerGraKriSeietal12,
  AUTHOR     = {T. Bernecker and F. Graf and H.-P. Kriegel and N. Seiler and C. Tuermer and D. Dill},
  TITLE      = {Knowing: A Generic Data Analysis Application},
  BOOKTITLE  = {Proceedings of the 15th International Conference on Extending Database Technology (EDBT), Berlin, Germany},
  YEAR       = {2012}
}

More information will be published on the official publication site at the LMU.

Finished my Posters for ICIP and MICCAI

Finally finished the posters for my publications:

F. Graf, H.-P. Kriegel, M. Schubert, S. Poelsterl, A. Cavallaro
2D Image Registration in CT Images using Radial Image Descriptors
In Medical Image Computing and Computer-Assisted Intervention (MICCAI), Toronto, Canada, 2011.

and

F. Graf, H.-P. Kriegel, M. Weiler
Robust Segmentation of Relevant Regions in Low Depth of Field Images
In Proceedings of the IEEE International Conference on Image Processing (ICIP), Brussels, Belgium, 2011.

Maximum Gain Round Trips with Cost Constraints

The idea is the following: finding the shortest/fastest path from A to B is a rather well-explored problem. But suppose you start a hike knowing that you want to spend 4 hours and then come back to the starting point. Suddenly the problem becomes quite complex (NP-hard, to be honest, if you do not add any constraints).

We propose a solution that makes this kind of search a bit more efficient, but don’t expect linear search time 😉 And, in contrast to quite a bit of other research, we are operating on REAL data obtained from OpenStreetMap.

Abstract:

Searching for optimal ways in a network is an important task in multiple application areas such as social networks, co-citation graphs or road networks. In the majority of applications, each edge in a network is associated with a certain cost and an optimal way minimizes the cost while fulfilling a certain property, e.g. connecting a start and a destination node. In this paper, we want to extend pure cost networks to so-called cost-gain networks. In this type of network, each edge is additionally associated with a certain gain. Thus, a way having a certain cost additionally provides a certain gain. In the following, we will discuss the problem of finding ways providing maximal gain while costing less than a certain budget. An application for this type of problem is the round trip problem of a traveler: Given a certain amount of time, which is the best round trip traversing the most scenic landscape or visiting the most important sights? In the following, we distinguish two cases of the problem. The first does not control any redundant edges and the second allows a more sophisticated handling of edges occurring more than once. To answer the maximum round trip queries on a given graph data set, we propose unidirectional and bidirectional search algorithms. Both types of algorithms are tested for the use case named above on real world spatial networks.
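To make the problem concrete, here is a deliberately naive sketch (Java 16+) of a budget-constrained round-trip search via exhaustive depth-first search. It illustrates why the problem is hard; it is not the unidirectional/bidirectional algorithms from the paper, and the graph representation and the no-repeated-edges rule are my own simplifying assumptions.

import java.util.*;

// Minimal cost-gain graph: edges carry a cost (e.g. travel time)
// and a gain (e.g. scenic value).
public class RoundTripSearch {

    record Edge(int to, double cost, double gain) {}

    static Map<Integer, List<Edge>> graph = new HashMap<>();

    static void addEdge(int u, int v, double cost, double gain) {
        graph.computeIfAbsent(u, k -> new ArrayList<>()).add(new Edge(v, cost, gain));
        graph.computeIfAbsent(v, k -> new ArrayList<>()).add(new Edge(u, cost, gain));
    }

    static double bestGain = Double.NEGATIVE_INFINITY;

    // Exhaustive DFS: extend the current path while the budget allows, and
    // record the gain whenever we are back at the start node. Exponential in
    // the worst case -- this is exactly what makes the problem hard.
    static void dfs(int start, int node, double costLeft, double gain, Set<Long> usedEdges) {
        if (node == start && gain > bestGain) bestGain = gain;
        for (Edge e : graph.getOrDefault(node, List.of())) {
            // Encode the undirected edge as a single long so each edge is used once.
            long key = (long) Math.min(node, e.to) << 32 | Math.max(node, e.to);
            if (usedEdges.contains(key) || e.cost > costLeft) continue; // prune
            usedEdges.add(key);
            dfs(start, e.to, costLeft - e.cost, gain + e.gain, usedEdges);
            usedEdges.remove(key);
        }
    }

    public static void main(String[] args) {
        addEdge(0, 1, 1.0, 2.0);
        addEdge(1, 2, 1.0, 5.0);
        addEdge(2, 0, 1.0, 1.0);
        addEdge(0, 2, 2.0, 0.5);
        dfs(0, 0, 4.0, 0.0, new HashSet<>());
        System.out.println("best gain within budget: " + bestGain); // 8.0
    }
}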

Documents

You can find the documents at our project site.

BibTeX

@TECHREPORT{GraKriSchu11,
  AUTHOR      = {F. Graf and H.-P. Kriegel and M. Schubert},
  TITLE       = {Maximum Gain Round Trips with Cost Constraints},
  INSTITUTION = {Institute for Informatics, Ludwig-Maximilians-University, Munich, Germany},
  YEAR        = {2011},
  LINK        = {http://arxiv.org/abs/1105.0830v1}
}

Robust Segmentation of Relevant Regions in Low Depth of Field Images

Great, we got accepted (as a poster) at ICIP 2011 with the paper “Robust Segmentation of Relevant Regions in Low Depth of Field Images”:

Low depth of field (DOF) is an important technique to emphasize the object of interest (OOI) within an image. When viewing a low depth of field image, the viewer implicitly segments the image into regions of interest and non-regions of interest, which has a major impact on the perception of the image. Thus, robust algorithms for the detection of the OOI in low DOF images provide valuable information for subsequent image processing and image retrieval. In this paper, we propose a robust and parameterless algorithm for the fully automatic segmentation of low depth of field images. We compare our method with three similar methods and show the superior robustness even though our algorithm does not require any parameters to be set by hand. The experiments are conducted on a real world data set with high and low depth of field images. (Abstract from the paper)

The work is the result of a collaboration with Michael Weiler. We extended his diploma thesis and produced an improved segmentation algorithm for low depth of field images. Compared to the three competing algorithms, ours is a bit slower, but at least it works: the other algorithms turned out to be extremely unstable and/or sensitive to parameters.
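For intuition only: a generic textbook heuristic for low-DOF images is to threshold a local sharpness measure such as the Laplacian response, since in-focus regions carry more high-frequency content than blurred background. The sketch below illustrates that idea and is explicitly not the algorithm from the paper.

// Generic sketch of a focus map for a grayscale image: in-focus (OOI) pixels
// tend to have strong local high-frequency content, out-of-focus pixels do not.
// A textbook heuristic for illustration, NOT the algorithm from the paper.
public class FocusMap {

    // 4-neighbour Laplacian magnitude as a simple per-pixel sharpness measure.
    static double[][] laplacian(double[][] img) {
        int h = img.length, w = img[0].length;
        double[][] out = new double[h][w];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++)
                out[y][x] = Math.abs(4 * img[y][x]
                        - img[y - 1][x] - img[y + 1][x]
                        - img[y][x - 1] - img[y][x + 1]);
        return out;
    }

    // Mark a pixel as "in focus" if its Laplacian response exceeds the mean
    // response of the whole image (a crude, parameterless threshold).
    static boolean[][] segment(double[][] img) {
        double[][] lap = laplacian(img);
        double mean = 0;
        for (double[] row : lap) for (double v : row) mean += v;
        mean /= (img.length * img[0].length);
        boolean[][] mask = new boolean[img.length][img[0].length];
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[0].length; x++)
                mask[y][x] = lap[y][x] > mean;
        return mask;
    }

    public static void main(String[] args) {
        double[][] img = new double[8][8];
        img[4][4] = 1.0; // a single sharp spot in an otherwise flat image
        System.out.println(segment(img)[4][4]); // true: high local contrast
    }
}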

On the project site you can find

  • an online demo,
  • the test images,
  • the masks,
  • the NetBeans project including the full Java source code for our algorithm and the reimplementations of the comparison partners (of course we had to re-implement, as we didn’t even get binaries, as usual).

So if you plan to do some image segmentation, just go there, download the stuff, and cite our work 😉

Fully automatic detection of the vertebrae in 2D CT images – the Talk

Yay, I finally gave the talk for my publication “Fully automatic detection of the vertebrae in 2D CT images” (Paper 7962-11) at SPIE Medical Imaging 2011, Conference 7962 Image Processing (see index), in front of about 200 people.

Everything went fine: just some nice questions right after the talk and some hints afterwards. Hey, some people even remembered the talk two days later! 🙂

Thanks, SPIE Medical Imaging.

Fully automatic detection of the vertebrae in 2D CT images

FINALLY submitted the paper to the SPIE Medical Imaging conference (and it got accepted!)

Abstract:

Knowledge about the vertebrae is a valuable source of information for several annotation tasks. In recent years, the research community spent a considerable effort for detecting, segmenting and analyzing the vertebrae and the spine in various image modalities like CT or MR. Most of these methods rely on prior knowledge like the location of the vertebrae or other initial information like the manual detection of the spine. Furthermore, the majority of these methods require a complete volume scan. With the existence of use cases where only a single slice is available, there arises a demand for methods allowing the detection of the vertebrae in 2D images. In this paper, we propose a fully automatic and parameterless algorithm for detecting the vertebrae in 2D CT images. Our algorithm starts with detecting candidate locations by taking the density of bone-like structures into account. Afterwards, the candidate locations are extended into candidate regions for which certain image features are extracted. The resulting feature vectors are compared to a sample set of previously annotated and processed images in order to determine the best candidate region. In a final step, the result region is readjusted until convergence to a locally optimal position. Our new method is validated on a real world data set of more than 9,329 images of 34 patients, annotated by a clinician in order to provide a realistic ground truth.
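The candidate-selection step can be pictured as a nearest-neighbour comparison against the annotated sample set. The following sketch shows a strongly simplified version of that step (Euclidean 1-NN over feature vectors); the actual features and matching used in the paper differ.

import java.util.List;

// Simplified illustration of the candidate-selection step: each candidate
// region is described by a feature vector, and the candidate whose features
// are closest to the annotated sample set wins. The features themselves and
// the distance function are placeholders, not the ones used in the paper.
public class VertebraCandidateSelection {

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    // Return the index of the candidate with the smallest distance to any
    // annotated sample (1-nearest-neighbour over the sample set).
    static int bestCandidate(List<double[]> candidates, List<double[]> samples) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.size(); i++) {
            for (double[] s : samples) {
                double d = distance(candidates.get(i), s);
                if (d < bestDist) { bestDist = d; best = i; }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> candidates = List.of(new double[]{0.9, 0.1}, new double[]{0.2, 0.8});
        List<double[]> samples = List.of(new double[]{0.25, 0.75});
        System.out.println("best candidate: " + bestCandidate(candidates, samples)); // 1
    }
}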

More information and the paper as a PDF can be found at my publication site.

Impact of Flash SSDs on Spatial Indexing

Besides Paros, I was also represented at SIGMOD 2010 with a publication at the DaMoN workshop (Data Management On New Hardware). The title of the publication was “On the Impact of Flash SSDs on Spatial Indexing”.

By now it is common knowledge that SSDs are fast and cool. Unfortunately, they are still a bit expensive; nevertheless, they have the enormous advantage that access times for random reads are significantly lower, since they exhibit virtually no seek times (the time a conventional hard disk needs to move its read head to the right position before it can read or write). These seek times are one of the limiting factors, especially for database indexes. Since our chair specializes in databases, data mining and indexing, we examined how the R*-tree (one of the standard indexes for multidimensional data) behaves on SSDs compared to HDDs; in particular, how the indexes scale, how performance develops when higher-dimensional data (> 10 dimensions or columns) are indexed with an R*-tree, and how the curse of dimensionality comes into play.
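To see why seek times matter for index performance, a micro-benchmark along the following lines makes the gap between sequential and random page reads visible. This is my own sketch, not the evaluation setup from the paper; for honest numbers the OS page cache would have to be bypassed or dropped between runs.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

// Tiny micro-benchmark sketch: read the same number of 4 KB pages once
// sequentially and once at random offsets. On an HDD the random run is
// dominated by seek time; on an SSD both runs take roughly the same time.
public class SeekBenchmark {

    static final int PAGE_SIZE = 4096;
    static final int NUM_READS = 10_000;

    static long timeReadsMs(RandomAccessFile file, long[] offsets) throws IOException {
        byte[] page = new byte[PAGE_SIZE];
        long start = System.nanoTime();
        for (long offset : offsets) {
            file.seek(offset);
            file.readFully(page);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of a large test file as the only argument.
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            long pages = file.length() / PAGE_SIZE;
            long[] sequential = new long[NUM_READS];
            long[] random = new long[NUM_READS];
            Random rnd = new Random(42);
            for (int i = 0; i < NUM_READS; i++) {
                sequential[i] = (i % pages) * PAGE_SIZE;
                random[i] = (long) (rnd.nextDouble() * pages) * PAGE_SIZE;
            }
            System.out.println("sequential: " + timeReadsMs(file, sequential) + " ms");
            System.out.println("random:     " + timeReadsMs(file, random) + " ms");
        }
    }
}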

Conclusion: It was quite an exciting and, above all, quite practical piece of work, since we did not just want to evaluate models on paper but really wanted to see and measure the accesses to the disk. This also made the tests very time-consuming, since RAM and disk do have a “small” speed difference after all.

Abstract:

Similarity queries are an important query type in multimedia databases. To implement these types of queries, database systems often use spatial index structures like the R*-Tree. However, the majority of performance evaluations for spatial index structures rely on a conventional background storage layer based on conventional hard drives. Since newer devices like solid-state disks (SSDs) have a completely different performance characteristic, it is an interesting question how far existing index structures profit from these modern storage devices. In this paper, we therefore examine the performance behaviour of the R*-Tree on an SSD compared to a conventional hard drive. Testing various influencing factors like system load, dimensionality and page size of the index, our evaluation leads to interesting insights into the performance of spatial index structures on modern background storage layers.

The publication can be downloaded from our website: On the Impact of Flash SSDs on Spatial Indexing