Detecting figures and part labels in patents: competition-based development of graphics recognition algorithms

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Most United States Patent and Trademark Office (USPTO) patent documents contain drawing pages which describe inventions graphically. By convention and by rule, these drawings contain figures and parts that are annotated with numbered labels but not with text. As a result, readers must scan the document to find the description of a given part label. To make progress toward automatic creation of ‘tool-tips’ and hyperlinks from part labels to their associated descriptions, the USPTO hosted a monthlong online competition in which participants developed algorithms to detect figures and diagram part labels. The challenge drew 232 teams of two, of which 70 teams (30 %) submitted solutions. An unusual feature was that each patent was represented by a 300-dpi page scan along with an HTML file containing patent text, allowing integration of text processing and graphics recognition in participant algorithms. The design and performance of the top-5 systems are presented along with a system developed after the competition, illustrating that the winning teams produced near state-of-the-art results under strict time and computation constraints. The first place system used the provided HTML text, obtaining a harmonic mean of recall and precision (F-measure) of 88.57 % for figure region detection, 78.81 % for figure regions with correctly recognized figure titles, and 70.98 % for part label detection and recognition. Data and source code for the top-5 systems are available through the online UCI Machine Learning Repository to support follow-on work by others in the document recognition community.
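
The F-measure quoted above is the harmonic mean of precision and recall. As a minimal sketch of that arithmetic, the function below computes it from raw counts. The function name, parameter names, and the example counts are illustrative assumptions, not taken from the paper, and the rule for deciding when a detected region or label counts as correct is defined by the competition, not here.

```python
def f_measure(true_positives: int, num_detected: int, num_ground_truth: int) -> float:
    """Return the harmonic mean of precision and recall as a fraction in [0, 1].

    All three arguments are hypothetical counts for one detection task:
    true_positives   -- detections judged correct
    num_detected     -- all detections the system produced
    num_ground_truth -- all items present in the reference annotation
    """
    # Precision and recall are undefined with an empty denominator;
    # this sketch reports 0.0 in that case (an assumption, not a standard).
    if num_detected == 0 or num_ground_truth == 0:
        return 0.0
    precision = true_positives / num_detected
    recall = true_positives / num_ground_truth
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up counts: 8 correct detections out of 10 produced, 16 expected.
    print(f"{100 * f_measure(8, 10, 16):.2f} %")
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot score well by maximizing only precision or only recall.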

Notes

  1. http://archive.ics.uci.edu/ml/datasets/USPTO+Algorithm+Challenge%2C+run+by+NASA-Harvard+Tournament+Lab+and+TopCoder++++Problem%3A+Pat.

  2. Embedded within this competition was a social science experiment investigating different team formation mechanisms. Two treatments were implemented. In the first (the ‘free-form’ treatment), teams were formed through bilateral agreement between participants after communicating through a public forum or private direct messaging. In the second (the ‘algorithm’ treatment), teams were formed by a stable-matching algorithm using participants’ stated preferences. We found no significant differences in algorithm performance between the two treatments. The exact details of the social science experiment are beyond the scope of this paper. Some preliminary results can be found in this working paper: http://goo.gl/NjoWce.

  3. http://labelme.csail.mit.edu/Release3.0/.

  4. The final ranking of all submissions is publicly available on the TopCoder Web site at https://community.topcoder.com/longcontest/stats/?&sr=1&nr=50&module=ViewOverview&rd=15027.

Acknowledgments

We are grateful for helpful comments provided by Ahmad Ahmad and the anonymous reviewers. This research was supported in part by the NASA Tournament Laboratory and the United States Patent and Trademark Office (USPTO).

Author information

Corresponding author

Correspondence to Christoph Riedl.

About this article

Cite this article

Riedl, C., Zanibbi, R., Hearst, M.A. et al. Detecting figures and part labels in patents: competition-based development of graphics recognition algorithms. IJDAR 19, 155–172 (2016). https://doi.org/10.1007/s10032-016-0260-8

  • DOI: https://doi.org/10.1007/s10032-016-0260-8
